Automated Quality Assessment of Medical Images in Echocardiography Using Neural Networks with Adaptive Ranking and Structure-Aware Learning
Abstract
The quality of medical images is crucial for accurately diagnosing and treating various diseases. However, current automated quality assessment methods based on neural networks often focus solely on pixel distortion and overlook the visibility of the complex anatomical structures within the images. This study introduces a neural network model designed explicitly for automated image quality assessment that addresses both pixel and semantic distortion. For pixel distortion assessment, the model introduces an adaptive ranking mechanism enhanced with contrast sensitivity weighting to refine the detection of minor differences between similar images. More significantly, the model integrates a structure-aware learning module employing graph neural networks, which captures the intricate relationship between an image’s semantic structure and its quality. Evaluated on a real-world cardiac ultrasound dataset and two public ultrasound datasets, the proposed method outperforms existing leading models. It also integrates seamlessly into clinical workflows, enabling real-time image quality assessment that supports precise disease diagnosis and treatment.
1. Introduction
Deep learning has significantly improved image processing, especially in computer vision tasks such as image recognition and classification. With its efficient feature extraction and pattern recognition capabilities, deep learning technology has found widespread applications in medical image processing. By utilizing deep learning algorithms, medical image processing can achieve precise lesion detection, organ segmentation, and diagnostic support, thereby greatly enhancing the accuracy and efficiency of medical diagnosis. However, the quality of medical images directly affects the performance of these algorithms, and evaluating medical image quality is a crucial step in ensuring diagnostic reliability.
High-quality medical images are indispensable for precise diagnoses and effective treatments across various diseases.1 To ensure the reliable performance of medical images in clinical applications, automated evaluation methods have been proposed for various image types, including those from echocardiography (Echo),2 skin ultrasound,3 craniomaxillofacial,4 and pelvic lymph node detection.5
Prevailing methodologies often focus on assessing image quality through pixel distortion while sidelining the evaluation of an image’s semantic structure visibility. An obvious example is the quality assessment of Echo. Echo is an extensively utilized and cost-effective diagnostic tool for detecting cardiac diseases.6 Among the various views in Echo, the apical four-chamber (A4C) view holds paramount importance in diagnosing fetal heart disease7 and congenital heart disease.8 However, obtaining an accurate A4C view for diagnostic purposes is complex, as shown in Fig. 1. In this illustration, (a) presents a high-resolution image with abundant pixels, but the cardiac structure is not fully visible, whereas (b) is a lower-resolution image that compromises on pixel count but offers a more comprehensive view of the cardiac structure. Sonographers typically prefer (b) for diagnosis, highlighting the importance of considering semantic structure visibility alongside pixel-level assessment when evaluating medical image quality.

Fig. 1. A typical A4C echocardiogram. (a) A high-resolution image in which the cardiac structure is not fully visible; (b) a lower-resolution image that nevertheless provides a complete view of the structure.
Obtaining the best diagnostic views requires extensive clinical experience and a deep understanding of anatomy, presenting a significant challenge, especially for novice doctors.9 In current clinical practice, evaluating medical imaging quality by novices is often subjective and time-consuming, necessitating the support of experienced physicians.10 Consequently, there is an urgent need for an objective and automated image quality assessment (IQA) method. Such a method should swiftly identify standard views, facilitate accurate diagnoses, and support sonographers and doctors in their clinical tasks, particularly in cardiac ultrasound examinations.
In pixel distortion evaluation methodologies, the task is commonly approached as either a regression or classification problem, with convolutional neural networks (CNNs) being the prevailing underlying architecture.11,12 In regression-based methods, the objective is to map image quality to a specific numerical score. For example, Zhang et al.13 employed transfer learning to assess ultrasound image quality by training a model on optical image quality data. Lin et al.14 concentrated on standard plane acquisition for fetal head ultrasound, while Czajkowska et al.3 explored quality assessment methods for high-frequency ultrasound datasets. Conversely, in classification-based approaches, images are classified into different quality grades based on clarity, and a classification model is trained for quality grade categorization. For instance, Abdi et al.15 and Chen et al.16 utilized CNNs to classify cardiac ultrasound images into different quality levels. Additionally, Dong et al.17 and Zhang et al.18 developed quality scoring models specifically for the A4C view by integrating multiple networks and employing data augmentation techniques such as image gain and scaling. This classification approach establishes a systematic framework for categorizing images, facilitating the identification of varying levels of image quality based on predefined criteria such as clarity and resolution. However, accurately assessing the quality differences between similar images remains a challenge for pixel-based evaluation methods, as such images often share similar visual features.19 This similarity hinders the capture of the distinguishing attributes that determine image quality.
Despite advancements in image evaluation methods focusing on pixel distortion, these approaches often neglect the critical assessment of the clarity of basic anatomical structures, which is vital for accurate diagnoses. Our study introduces a novel neural network model to comprehensively evaluate both pixel and semantic distortions. Pixel distortion measurement concentrates on the clarity and sharpness of images, while semantic distortion measurement assesses how accurately the basic anatomical structures are captured. To evaluate pixel distortion, the model employs an adaptive ranking mechanism that learns to assign quality scores to similar images based on contrast sensitivity weighting factors, offering a nuanced understanding of image clarity. To assess semantic distortion, the model incorporates a structure-aware learning approach that leverages a graph neural network (GNN) to capture the intricate relationship between image structures and image quality, ensuring a more holistic and accurate evaluation.
In summary, this paper has developed a comprehensive IQA model for ultrasound by combining multiple deep neural networks and IQA methods. Additionally, we have created an intelligent system based on this model. We have made our implementation publicly available to support reproducibility and further research. The code can be accessed at https://github.com/gaden168/MIQA_A4C.
Our main contributions are summarized as follows:
• Developed a method that integrates a dynamic ranking module with contrast sensitivity weighting to enhance the learning of distinguishable features in similar images, leading to a more accurate assessment of pixel distortions.
• Proposed using GNNs for structure-aware learning, demonstrating their effectiveness in understanding the relationship between image quality and semantic structures. This facilitates a precise assessment of semantic distortions in images.
• Introduced a multi-task learning framework that concurrently measures pixel and semantic distortions for IQA. Validation on a real-world cardiac ultrasound dataset and two public datasets demonstrated competitive performance compared to state-of-the-art methodologies.
• Utilized the proposed method to design and implement an intelligent automatic evaluation system for cardiac ultrasound. This novel approach shows excellent potential for future integration into clinical practice.
The structure of this paper is outlined as follows. Section 2 reviews related work, and Secs. 3 and 4 present the proposed method and experimental results in detail. The system design and implementation are discussed in Sec. 5. Finally, conclusions are summarized in Sec. 6.
2. Related Work
Over the past decade, deep learning has rapidly become one of the core technologies in artificial intelligence, demonstrating exceptional performance across various applications. Particularly in medical image processing, its efficient feature extraction and pattern recognition capabilities have significantly enhanced the accuracy and efficiency of medical diagnostics. As an essential aspect of ensuring diagnostic reliability, medical IQA has received extensive attention and research in recent years.
In the field of medical IQA, numerous methods have been proposed over the past decades to ensure the reliable performance of images in clinical tasks. These IQA methods can be broadly categorized as traditional and deep learning methods.
Traditional methods for IQA: Traditional IQA methods encompass techniques that rely on metrics like peak signal-to-noise ratio (PSNR)20 and mean square error (MSE).21 These techniques have been widely utilized in the initial phases for the automated assessment of the quality of medical images.22 While traditional methods yield reproducible IQA measurements, they necessitate manual feature engineering, which can be time-consuming and resource-intensive.23
Deep learning methods for IQA: With the advancements in deep learning techniques, CNNs have emerged as the main methods in IQA tasks.24 These approaches treat quality prediction as a regression or classification problem and aim to automatically learn features for quantifying IQA from labeled data, eliminating the need for manual feature design. Kang et al.25 pioneered this direction by integrating CNNs into the IQA task. Since then, several algorithms have followed: Yang et al.26 introduced SGDNet, an end-to-end saliency-guided deep neural network that enhances assessment accuracy without reference images; Chen et al.27 proposed hyperIQA, improving the evaluation of light field sub-aperture images; and Cheon et al.28 developed a perceptual IQA method using transformers, leveraging their feature extraction and attention mechanisms for robust IQA.
Improving the quality of medical images is crucial in maintaining the accuracy of clinical diagnoses.29 Recently, deep learning has also been applied to assess the quality of medical images. For instance, Abdi et al.15 developed a CNN-based A4C view quality assessment model to classify ultrasound images into different quality levels. Zhang et al.18 specifically designed a quality scoring model for A4C views, treating the quality assessment task as a regression problem; compared with conventional Lasso regression30 and Elastic Net31 methods, their approach achieved a lower absolute distance error (ABE). Hossain et al.32 proposed an automated fetal ultrasound IQA scheme using CNNs to aid ultrasound image quality control in clinical obstetric examinations. Additionally, several works have adapted general-purpose IQA techniques to ultrasound images: Saeed et al.23 used meta-reinforcement learning for adaptive quality assessment tailored to machine learning tasks; Huh et al.33 introduced tunable quality control for three-dimensional (3D) ultrasound using switchable CycleGAN, enabling dynamic adjustments; Saeed et al.34 focused on enhancing IQA through task-amenable data selection; and Golestaneh et al.11 developed a no-reference IQA method with transformers, relative ranking, and self-consistency.
Semantic structure analysis: Furthermore, in medical IQA, diagnosis must consider not only image clarity but also whether the semantic structure in the image is fully visible. Liu et al.35 delved into a multi-task-based pulmonary nodule analysis approach to detect anatomical structures within the image and obtain a standardized slice for diagnostic purposes. They employed a method called MF R-CNN to detect structures in ultrasound images, considering each anatomical structure as an individual target for detection and disregarding the correlation between anatomical structures. However, it is worth noting that multiple anatomical structures often appear simultaneously in a single ultrasound image. Consequently, accurately capturing the correlation between these different structures becomes crucial.
To evaluate the quality of medical images comprehensively, it is imperative to consider the image’s clarity and whether the basic anatomical structures are visible. This holistic assessment approach ensures a more comprehensive understanding of image quality in the context of medical image analysis.
3. Methodology
This paper presents a novel comprehensive IQA method focusing on pixel and semantic distortion measurement. The approach incorporates an adaptive ranking mechanism that utilizes contrast-sensitive weighting factors to assign quality scores to similar images, ensuring accurate pixel distortion measurement. Additionally, the method integrates a structure-aware learning model with a GNN to capture the relationship between image structure and quality, facilitating precise semantic distortion measurement. Furthermore, this study introduces a multi-task framework to handle both tasks concurrently. Each of these components is detailed in the following three subsections.
3.1. Multi-task learning
The proposed methodology adopts a multi-task learning framework to comprehensively evaluate image quality by simultaneously considering pixel and semantic distortion measurements. As depicted in Fig. 2, the framework consists of two branches. The first branch employs a shallow regression model that maps image features to quality scores. It incorporates an adaptive ranking mechanism, utilizing a contrast sensitivity weighting factor to assign quality scores to similar images. This mechanism enhances the exploitation of heterogeneity among similar images, thereby improving the accuracy of pixel distortion estimation. The second branch utilizes structure-aware learning with a GNN to perform semantic structure classification of images. In this branch, a graph convolutional network (GCN) captures the intricate relationship between image structures and image quality, enabling precise semantic distortion evaluation. The final IQA score is determined by aggregating the quality scores from both branches.

Fig. 2. Schematic diagram of the proposed network. The proposed network is a multi-task learning framework with three main modules: feature extraction, pixel distortion measurement, and semantic distortion measurement. The feature extraction module includes a residual network backbone and n conv-blocks. The pixel distortion measurement branch learns a shallow regression model and introduces adaptive ranking learning to assign quality scores to similar images according to a contrast sensitivity weighting factor. The semantic distortion measurement branch uses a structure-aware learning model that incorporates a GNN to capture the relationship between heart structure and image quality. The proposed method combines these two branches to comprehensively evaluate cardiac ultrasound A4C image quality.
This paper uses $S_{\mathrm{pix}}$ and $S_{\mathrm{sem}}$ to denote the quality scores of the pixel and semantic distortion measurement branches, respectively. The final image quality score $S_{\mathrm{total}}$ is obtained as a weighted combination of the two branch scores,
$$S_{\mathrm{total}} = \gamma\, S_{\mathrm{pix}} + (1-\gamma)\, S_{\mathrm{sem}},$$
which reduces to the arithmetic mean of the two scores for the default setting γ = 0.5 (see Sec. 4.2).
The feature extraction module in our model uses ResNet5036 as the backbone. ResNet50 is an efficient and commonly used network for feature extraction in computer vision.37 In this paper, we removed the last global pooling and fully connected layers and added conv-blocks to reduce the feature dimension. The exact number of conv-blocks to be added is uncertain, so we experiment with two different architectures: B1 and B2. B1 represents the model with one additional conv-block, while B2 represents the model with two additional conv-blocks.
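To make the architecture concrete, the following is a minimal PyTorch sketch of the feature extraction module described above: a ResNet50 backbone with the final global pooling and fully connected layers removed, followed by n dimension-reducing conv-blocks (n = 1 for the B1 variant, n = 2 for B2). The channel widths and kernel sizes are illustrative assumptions, as the paper does not specify them.

```python
import torch
import torch.nn as nn
from torchvision import models

class FeatureExtractor(nn.Module):
    """ResNet50 backbone (pooling/FC removed) plus n dimension-reducing conv-blocks.

    n = 1 corresponds to the B1 variant, n = 2 to B2. Channel widths are
    illustrative assumptions; the paper does not specify them.
    """
    def __init__(self, n_conv_blocks: int = 1):
        super().__init__()
        resnet = models.resnet50(weights=None)
        # Keep everything up to the last residual stage: output is 2048 x H x W.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])

        blocks, in_ch = [], 2048
        for _ in range(n_conv_blocks):
            out_ch = in_ch // 2
            blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ))
            in_ch = out_ch
        self.conv_blocks = nn.Sequential(*blocks)

    def forward(self, x):
        return self.conv_blocks(self.backbone(x))

# Example: B1 variant on a 256 x 256 input, matching the input size used in the experiments.
feats = FeatureExtractor(n_conv_blocks=1)(torch.randn(2, 3, 256, 256))
print(feats.shape)  # torch.Size([2, 1024, 8, 8])
```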
3.2. Pixel distortion measurement
Accurately assessing the quality of similar images presents a challenge due to their shared visual characteristics, hindering the capture of unique properties that define individual image quality. Learning to rank emerges as a promising approach to address this, enabling quality inference through partial order relations.35 However, traditional ranking-based methods often struggle to effectively prioritize similar images.
In response, this paper proposes an adaptive rank method that specifically targets similar images, aiming to enhance their contrast sensitivity and improve the accuracy of evaluating their quality. Our method begins by taking a two-dimensional (2D) image as input and extracting features using a dedicated feature extraction layer. These features traverse two fully connected layers, utilizing MSE loss to train a shallow regression model mapping image features to a specific quality score.
Simultaneously, the method incorporates an adaptive rank (a-rank) loss, depicted in the light blue section of Fig. 2. This loss function is designed to capture the heterogeneity between similar images, enhancing contrast sensitivity and improving quality prediction performance. The IQA based on pixel distortion measurement is achieved by combining MSE loss and a-rank loss into a joint loss. To simplify the model and avoid the need for additional hyperparameters, both losses are weighted equally.
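Written out, this equal-weighting choice amounts to summing the two terms (a sketch of the combination described above; the paper's exact formulation is not reproduced here):
$$\mathcal{L}_{\mathrm{pix}} = \mathcal{L}_{\mathrm{MSE}} + \mathcal{L}_{\text{a-rank}}.$$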
Adaptive ranking learning: Learning image quality ranking can capture and exploit the heterogeneity of similar images, thereby improving the accuracy of the shallow regression model. In particular, we are given a dataset of N training samples $\{x_i, t_i\}_{i=0}^{N-1}$, where $x_i$ represents the $i$th image and $t_i$ denotes the corresponding ground-truth quality score.
We employ the following loss function to learn the rank relationship between images:
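The original equation is not reproduced here. For illustration, a standard pairwise margin formulation of such a ranking loss (an assumed form, not necessarily the exact loss used in the paper) is
$$\mathcal{L}_{\mathrm{rank}} = \frac{1}{|\mathcal{P}|}\sum_{(i,j)\in\mathcal{P}} \max\Bigl(0,\; \varepsilon - \operatorname{sign}(t_i - t_j)\bigl(f(x_i) - f(x_j)\bigr)\Bigr),$$
where $\mathcal{P}$ is the set of image pairs in a mini-batch, $f(\cdot)$ is the quality score predicted by the regression head, and $\varepsilon$ is a small margin; the loss vanishes when every pair is ordered consistently with its ground-truth scores.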
The incorporation of learning to rank undoubtedly enhances model performance. However, to further guide the model’s attention toward similar images and enhance their contrast sensitivity, we propose adaptive ranking learning. This approach introduces a contrast-sensitive weighting factor into the rank learning, enabling the model to prioritize and emphasize the distinguishing characteristics of similar images. The contrast-sensitive weighting factors are defined as follows:
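The exact weighting definition is not reproduced here. As one plausible realization of the idea, the sketch below weights each image pair by how close the ground-truth scores are, so that near-identical images contribute more to the ranking loss; the exponential weighting form and the hyperparameters margin and sigma are illustrative assumptions rather than the paper's definitions.

```python
import torch

def adaptive_rank_loss(pred, target, margin: float = 0.1, sigma: float = 0.5):
    """Pairwise margin ranking loss with a contrast-sensitive weight (illustrative).

    Pairs whose ground-truth scores are close (i.e., visually similar images)
    receive a larger weight, pushing the model to separate them more clearly.
    """
    diff_t = target.unsqueeze(1) - target.unsqueeze(0)   # t_i - t_j for all pairs
    diff_p = pred.unsqueeze(1) - pred.unsqueeze(0)       # f(x_i) - f(x_j)
    mask = diff_t.abs() > 1e-6                           # ignore exact ties

    # Contrast-sensitive weight: largest when ground-truth scores are similar.
    w = torch.exp(-diff_t.abs() / sigma)

    # Hinge on the correctly signed predicted score difference.
    hinge = torch.clamp(margin - torch.sign(diff_t) * diff_p, min=0.0)
    return (w * hinge)[mask].mean()

# Example usage on a mini-batch of predicted and ground-truth quality scores.
pred = torch.tensor([2.1, 3.8, 2.0, 4.2])
target = torch.tensor([2.0, 4.0, 2.5, 4.5])
print(adaptive_rank_loss(pred, target))
```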
3.3. Semantic distortion measurement
In clinical settings, the precision of diagnoses relies on the perceptual clarity of medical images and the unambiguous visibility of semantic structures inherent to the visual data. Unfortunately, contemporary medical IQA methodologies often neglect the critical evaluation of semantic distortion within images. To address this issue, this paper introduces an innovative structure-aware learning model seamlessly incorporating a GNN. This advanced model facilitates exploring the relationships between image structures and quality, enabling a more accurate measurement of semantic distortion. More importantly, the ensemble of GCN models can capture the correlation between different structures in a single image, thereby improving the model’s ability to examine image structures and enabling a comprehensive assessment of image quality.
Specifically, the mutual relationships between structures within the image are modeled as a graph. Nodes in the graph correspond to the structural labels of the A4C images obtained through Echo, and the edges between nodes are constructed from a label co-occurrence matrix. The features of each node (label) are represented by word embeddings of the structural labels, generated using GloVe38 with a vector dimension of 300. GCNs are employed to propagate information across the labels and to learn classifiers that capture the interdependencies among the image labels. These classifiers are then fused with a shallow multi-label classifier to predict the semantic structural labels. Finally, the structural classification results are transformed into quality scores, providing an assessment of image quality based on semantic distortion.
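The following is a minimal PyTorch sketch of this structure-aware branch under the description above: a two-layer GCN over the nine-node label graph, with 300-d GloVe vectors as node features and a normalized co-occurrence matrix as the adjacency, whose output acts as a per-label classifier applied to the image feature vector. The layer sizes, the image feature dimension, and the omitted fusion with the shallow classifier are simplifying assumptions.

```python
import torch
import torch.nn as nn

class LabelGCN(nn.Module):
    """Two-layer GCN over the label co-occurrence graph (an illustrative sketch).

    Node features are 300-d GloVe embeddings of the 9 structural labels; the
    GCN output acts as a set of label classifiers that are applied to the
    image feature vector via an inner product.
    """
    def __init__(self, adj, word_emb, img_dim: int = 2048, hidden: int = 512):
        super().__init__()
        self.register_buffer("adj", adj)        # (9, 9) normalized co-occurrence matrix
        self.register_buffer("emb", word_emb)   # (9, 300) GloVe vectors
        self.fc1 = nn.Linear(300, hidden)
        self.fc2 = nn.Linear(hidden, img_dim)
        self.relu = nn.LeakyReLU(0.2)

    def forward(self, img_feat):                          # img_feat: (B, img_dim)
        h = self.relu(self.adj @ self.fc1(self.emb))      # graph propagation, layer 1
        classifiers = self.adj @ self.fc2(h)              # (9, img_dim), one classifier per label
        return img_feat @ classifiers.t()                 # (B, 9) structure logits

# Toy example with random adjacency/embeddings for the 9 cardiac structures.
adj = torch.softmax(torch.rand(9, 9), dim=1)
emb = torch.randn(9, 300)
logits = LabelGCN(adj, emb)(torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 9])
```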
Structure-aware learning: In this branch, we tackle a multi-label classification task. The loss function is formulated as follows:
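The equation is not reproduced here. Assuming the standard multi-label objective, the classification loss takes the form of a binary cross-entropy averaged over the K = 9 structural labels:
$$\mathcal{L}_{\mathrm{cls}} = -\frac{1}{K}\sum_{k=1}^{K}\Bigl[y_k \log \sigma(\hat{y}_k) + (1 - y_k)\log\bigl(1 - \sigma(\hat{y}_k)\bigr)\Bigr],$$
where $y_k \in \{0, 1\}$ indicates whether the $k$th structure is visible in the image, $\hat{y}_k$ is the corresponding logit, and $\sigma(\cdot)$ is the sigmoid function.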
The quality score for the semantic distortion measurement branch is defined as follows:
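The original definition is not reproduced here. One simple realization consistent with the description above, given purely as an illustrative assumption, maps the fraction of structures predicted as visible onto the score range:
$$S_{\mathrm{sem}} = \frac{S_{\max}}{K}\sum_{k=1}^{K}\mathbb{1}\bigl[\sigma(\hat{y}_k) > \tau\bigr],$$
where $K = 9$ is the number of annotated structures, $S_{\max} = 4.5$ is the maximum quality score in the Echo dataset, and $\tau$ (e.g., 0.5) is a visibility threshold; $S_{\max}$, $\tau$, and the counting rule are assumptions for illustration rather than the paper's exact formula.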
4. Experiments
4.1. Dataset
Echo dataset: In this study, we utilized Echo data collected from 91 patients by cardiologists at the Dazhou Central Hospital in Sichuan Province, China. The data were acquired using a specially developed video recording tool. The heart was imaged from at least seven standard views, including the parasternal long and short axes, apical two-, three-, and four-chamber, subcostal, and suprasternal views. This paper focuses on the A4C view, for which 1170 images were collected. To align our research with clinical practice, four professional ultrasound doctors were enlisted to annotate each image meticulously; their annotations include a quality score and information about the structural characteristics depicted in the image. An additional group of four experts then carefully reviewed these annotations. Each image was annotated with nine categories of structural information (left and right atria, left and right ventricles, tricuspid and mitral valves, interventricular and interatrial septa, and left ventricular posterior wall) and received a score between 0 and 4.5, with a higher score indicating better image quality and more apparent structural features. All data used in this research were anonymized, de-identified, and stored securely using encryption.
4.2. Experiment settings
We implement our method on two NVIDIA GTX 1080Ti GPUs using PyTorch as the backend. Pre-processed images of size 256×256 are fed into the proposed network. The Adam stochastic optimization algorithm39 is employed, starting with an initial learning rate of $1\times10^{-3}$ and a decay factor of 0.1 applied every 10 epochs. We set the mini-batch size to 32 and the number of epochs to 100. The default value of the parameter η is set to 0.5, and the parameter γ is set to 0.5.
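For reproducibility, the optimizer and schedule above translate directly into the following PyTorch setup; the model and data here are stand-in placeholders rather than the actual network or the Echo dataset.

```python
import torch
import torch.nn as nn

# Adam with initial lr 1e-3, decayed by 0.1 every 10 epochs; batch size 32,
# 100 epochs, 256x256 inputs -- matching the settings reported above.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 256 * 256, 1))   # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(100):
    images = torch.randn(32, 3, 256, 256)        # stand-in mini-batch
    scores = torch.rand(32, 1) * 4.5             # stand-in quality labels in [0, 4.5]
    loss = nn.functional.mse_loss(model(images), scores)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```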
To assess the overall performance of the proposed method, we report three commonly used evaluation metrics: the Pearson Linear Correlation Coefficient (PLCC), the Spearman Rank-Order Correlation Coefficient (SRCC), and the ABE. PLCC quantifies the linear correlation between predicted outcomes and ground truths and is computed as follows:
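Using the standard definition, with $s_i$ and $t_i$ the predicted and ground-truth quality scores of the $i$th test image, $\bar{s}$ and $\bar{t}$ their means, and $M$ the number of test images:
$$\mathrm{PLCC} = \frac{\sum_{i=1}^{M}(s_i - \bar{s})(t_i - \bar{t})}{\sqrt{\sum_{i=1}^{M}(s_i - \bar{s})^{2}}\,\sqrt{\sum_{i=1}^{M}(t_i - \bar{t})^{2}}}.$$
SRCC is the same correlation computed on the rank values of $s_i$ and $t_i$, and ABE is the mean absolute difference between predicted and ground-truth scores, $\frac{1}{M}\sum_{i=1}^{M}|s_i - t_i|$. Higher PLCC/SRCC and lower ABE indicate better performance.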
4.3. Ablation experiment
Within this section, we conduct ablative studies to meticulously examine the impact of crucial components in our proposed model.
Analysis of the pixel distortion measurement: To evaluate the performance of our method’s pixel distortion measurement branch, we conducted comparative experiments against two classic regression models, namely Lasso regression30 and Elastic Net.31 We also compared it with two state-of-the-art methods, a CNN-based ultrasound IQA model (CNN_MIQA)13 and a transformer-based medical image processing method (MedViT),43 to assess the effectiveness of our approach. For each model, we calculated PLCC, SRCC, and ABE on the test set of the Echo dataset.
Table 1 shows that our method achieves the highest PLCC, SRCC, and lowest ABE. Compared to the two classic regression methods, LASSO and Elastic Net, our approach reduces the ABE by 0.274 and 0.257, respectively. Additionally, compared to the state-of-the-art CNN_MIQA method and the MedViT method, our approach reduces the ABE by 0.124 and 0.148, respectively. Our findings underscore the efficacy of our method in the domain of pixel distortion assessment. We further conducted two ablation studies to compare the effectiveness of using MSE and a-rank losses individually and in combination. The results indicate that the joint loss, which combines MSE and a-rank, achieves superior results compared to using either loss alone. This demonstrates that introducing an adaptive ranking mechanism in the quality regression task can enhance the model’s ability to predict quality. However, it should be noted that the predictive performance of using only a-rank loss is lower than that of using only MSE loss. This further validates the notion that combining both losses can yield better results. These findings strongly support the feasibility and accuracy of our proposed model.
Methods | SRCC↑ | PLCC↑ | ABE↓ |
---|---|---|---|
LASSO30 | 0.403 | 0.452 | 0.663 |
EN31 | 0.393 | 0.384 | 0.655 |
CNN_MIQA13 | 0.432 | 0.563 | 0.522 |
MedViT43 | 0.412 | 0.534 | 0.546 |
Ours (ℒMSE) | 0.466 | 0.563 | 0.438 |
Ours (ℒa-rank) | 0.404 | 0.506 | 0.445 |
Ours (ℒMSE+a-rank) | 0.778 | 0.824 | 0.398 |
Analysis of the semantic distortion measurement: The essence of semantic distortion measurement is a multi-label classification task aimed at detecting the different underlying cardiac structures in ultrasound images. Therefore, our method is compared with several widely used baselines commonly employed in computer vision tasks: VGG-19,44 ResNet50,36 and MedViT.43 We also evaluate our method against two modern multi-label classification approaches: CNN–RNN40 and Multi-Evidence.41 Furthermore, we analyzed the performance of our proposed B1 and B2 architectures relative to the baseline methods, providing insights into the superiority of our architectures. Finally, we performed an ablation analysis to validate the effectiveness of introducing GCNs for capturing the correlation between heart structures and image quality in semantic distortion assessment, excluding the GCN module from the semantic distortion measurement branch to observe its effect on performance.
According to Table 2, our proposed method outperforms existing methods in overall F1 (OF1). Specifically, compared to two representative CNN backbones (VGG-19 and ResNet50) and a transformer backbone (MedViT), the model shows an increase in average OF1 of 1.1%, 0.9%, and 0.6%, respectively. Similarly, compared to the state-of-the-art multi-label classification methods CNN–RNN and Multi-Evidence, the model shows an increase in average OF1 of 13.1% and 4.4%, respectively. Furthermore, adding one or two conv-blocks helps further improve the model’s overall performance, with the best average OF1 reaching 93.9%. Additionally, we report the average precision (AP) for each category of our proposed framework and several other baselines, as shown in Fig. 3.

Fig. 3. The AP (in %) for each category is compared between our proposed framework and the several other baselines. “LV” and “RV” denote left and right ventricles, “LA” and “RA” denote left and right atria, “TV” and “MV” denote tricuspid and mitral valve, “IVS” and “IAS” denote interventricular and interatrial septum, and “LVPW” denotes left ventricular posterior wall.
Methods | OP↑ | OR↑ | OF1↑ |
---|---|---|---|
VGG-1944 | 0.935 | 0.922 | 0.928 |
ResNet5036 | 0.932 | 0.928 | 0.930 |
MedViT43 | 0.930 | 0.936 | 0.933 |
CNN–RNN40 | 0.882 | 0.867 | 0.808 |
Multi-Evidence41 | 0.889 | 0.892 | 0.895 |
Ours (with B1) | 0.938 | 0.940 | 0.939 |
Ours (with B2) | 0.941 | 0.925 | 0.933 |
Ours (ℒcls and without GCN) | 0.928 | 0.937 | 0.932 |
Ours (ℒcls+GCN) | 0.938 | 0.940 | 0.939 |
Moreover, the quantitative analysis presented in Table 2 demonstrates that our method achieves a 0.7% improvement in the average OF1 compared to the model without the GCN module. This result highlights the significant contribution of the GCN module in enabling the model to learn the correlation between anatomical structures within the image. By leveraging this learned correlation, our model performs better in structure classification tasks.
Analysis of the multi-task framework: The ultimate quality score for the A4C image is established through a multi-task framework, encompassing the assessment of two facets: pixel distortion and semantic distortion. To analyze the individual contributions of these branches to the overall quality assessment performance, we conducted ablation studies on the Echo dataset. For clarity and convenience, we refer to the pixel distortion measurement branch as “PIX” and the semantic distortion measurement branch as “SEM”.
As shown in Table 3, combining both branches clearly leads to the best performance of the model. Notably, the experimental results reveal that introducing the semantic distortion branch enhances the overall quality assessment of A4C images compared to relying solely on pixel distortion. Specifically, there is an improvement of 0.02 in SRCC, 0.111 in PLCC, and a reduction of 0.034 in ABE. These findings provide strong evidence that considering the semantic structure features of the image is beneficial in enhancing the overall performance of A4C IQA.
Methods | SRCC↑ | PLCC↑ | ABE↓ |
---|---|---|---|
PIX | 0.778 | 0.824 | 0.398 |
SEM | 0.793 | 0.783 | 0.403 |
PIX+SEM | 0.798 | 0.935 | 0.364 |
As shown in Table 4, we conducted a detailed analysis of the weight sensitivity between the two branches in the multi-task framework proposed in this paper. The results indicate that the model performs optimally when the γ value is set to 0.5. This finding suggests that both branches exhibit high robustness and stability within the range of weight adjustments, indicating that the model is not highly sensitive to changes in weight.
(γ) | (1−γ) | SRCC↑ | PLCC↑ | ABE↓ |
---|---|---|---|---|
0.1 | 0.9 | 0.787 | 0.802 | 0.455 |
0.2 | 0.8 | 0.789 | 0.852 | 0.425 |
0.3 | 0.7 | 0.792 | 0.893 | 0.416 |
0.4 | 0.6 | 0.799 | 0.906 | 0.396 |
0.5 | 0.5 | 0.798 | 0.935 | 0.364 |
4.4. Comparisons with state-of-the-art methods
Qualitative analysis: Figure 4 visually presents the qualitative analysis results, comparing our proposed model with three established state-of-the-art models for ultrasound IQA. Encouragingly, our method achieves the best results, especially for images of similar quality (second column), clearly outperforming the other models. This demonstrates the effectiveness of the introduced adaptive ranking learning in accurately predicting the quality scores of similar images. In addition, the third column shows that the prediction of quality scores for low-resolution images is better than that of the other models. This result highlights the importance of identifying image structures for predicting ultrasound image quality, which is consistent with the evaluation criteria used by clinicians but has received little attention in current research.

Fig. 4. (Color online) Quality assessment results of our proposed method on A4C images compared to state-of-the-art methods. The best results are highlighted in red, and the second-best results are highlighted in blue. The labels indicate the actual quality scores.
Quantitative analysis: Table 5 shows the quantitative analysis results of our proposed model and the existing state-of-the-art ultrasound IQA models. The outcomes indicate that our proposed model attained the highest SRCC and PLCC and the lowest ABE. These findings provide evidence of its effectiveness in assessing the quality of ultrasound images.
Methods | SRCC↑ | PLCC↑ | ABE↓ |
---|---|---|---|
MF R-CNN14 | 0.770 | 0.867 | 0.386 |
MUIQA16 | 0.786 | 0.878 | 0.367 |
ARVBNet17 | 0.749 | 0.780 | 0.379 |
Ours | 0.798 | 0.935 | 0.364 |
Table 6 presents the average inference time and the number of parameters for various models using the Echo dataset. The MUIQA model demonstrates a slight advantage over other methods in parameters and inference time. This efficiency can be attributed to its design as a single-task framework, which directly predicts quality scores. Our benchmark indicates that the proposed method has a marginally lower inference time and fewer parameters compared to two multi-task models, MF R-CNN and ARVBNet. Overall, despite the complexity level of the proposed model being comparable to several models, it surpasses them in quality assessment performance.
Methods | Backbone | Time (s) | Params (M) |
---|---|---|---|
MF R-CNN | ResNet101 | 0.647 | 54.09 |
MUIQA | ResNet18 | 0.451 | 30.85 |
ARVBNet | VGG | 0.632 | 43.87 |
Ours | ResNet50 | 0.547 | 35.58 |
4.5. Generalization performance
To further investigate the scalability of our adaptive ranking learning in other ultrasound IQA tasks, we conducted further experiments on the HFUS dataset3 and the UltraSound dataset.45 It is important to note that the HFUS and UltraSound datasets do not contain information on image structure. The parameter γ in Eq. (2) is set to 1.
HFUS dataset: The HFUS dataset contains 17,425 high-frequency ultrasound images of facial skin taken from 44 patients. Three physicians reviewed each image and judged its quality based on the presence of artifacts, noise, or whether it was taken while the ultrasound probe was not in contact with the patient’s skin. If all three physicians agreed that an image was of high quality, it was labeled as such; likewise, if all three agreed that an image was of low quality, it was labeled accordingly. Images without unanimous agreement were labeled as “blurred”. To ensure a fair comparison, images marked as “high quality” were assigned a score of 3, “blurred” images a score of 2, and “low quality” images a score of 1. Following Ref. 3, we compare against the three methods that achieve the best results on the HFUS dataset.
The numerical outcomes presented in Table 7 demonstrate that our approach achieves leading performance compared with prior studies; specifically, the F1 score improves by 8.6% over the HFUS method. These results indicate that our adaptive ranking learning method is highly scalable and can be applied to other ultrasound quality assessment datasets. It is worth noting that the Recall of the HFUS method is slightly higher than that of our method. This is because the HFUS dataset contains significantly more images labeled as “blurred” than the other two categories, leading to some class imbalance. Our method achieves a higher F1 score by slightly sacrificing Recall.
Methods | Accuracy↑ | Precision↑ | Recall↑ | F1↑ |
---|---|---|---|---|
DenseNet46 | 0.796 | 0.733 | 0.909 | 0.811 |
VGG1647 | 0.804 | 0.738 | 0.919 | 0.818 |
HFUS3 | 0.828 | 0.762 | 0.936 | 0.840 |
Ours | 0.906 | 0.921 | 0.921 | 0.926 |
UltraSound dataset: The UltraSound dataset is an ultrasound quality assessment dataset in which images are classified into four categories according to their quality (normal, noisy, blurry, and distorted). Each category contains 650 ultrasound images, for a total of 2600 images. For a fair comparison, we map the normal, noisy, blurry, and distorted categories to quality scores of 3, 2, 1, and 0, respectively. We compare the quality assessment performance with two state-of-the-art methods on the UltraSound dataset: the Quantitative Feature Extraction Machine (QFEM) and VGG-19.
The quantitative results in Table 8 show a clear improvement of the proposed method over the other methods on the UltraSound dataset, further demonstrating the scalability of the adaptive ranking learning method to other ultrasound quality assessment data.
Methods | Accuracy↑ | Precision↑ | Recall↑ | F1↑ |
---|---|---|---|---|
QFEM32 | 0.816 | 0.819 | 0.816 | 0.817 |
VGG-1948 | 0.962 | 0.963 | 0.962 | 0.964 |
Ours | 0.997 | 0.996 | 0.995 | 0.995 |
5. An Intelligent Ultrasonic Image Quality Evaluation System
This paper presents an intelligent cardiac ultrasound IQA system based on the proposed method to assist sonographers in real-time echocardiogram acquisition. The system’s overall architecture is shown in Fig. 5. The system utilizes a deep neural network quality assessment model to guide sonographers in obtaining views quickly for diagnosis. The system comprises a server and a client, with the server handling data management and model calculations. The client interface is illustrated in Fig. 6. Upon system activation, the sonographer can examine the patient using the ultrasound machine as usual without any additional steps. The system automatically captures and uploads the ultrasound image data to the server. Subsequently, the server employs the trained deep neural network model to assess image quality and returns the assessment results to the client. The client promptly displays the evaluation results through a quality meter, enabling the doctor to locate the standard view and save the highest quality image as necessary. The system comprises the following specific modules.

Fig. 5. Functional architecture of intelligent ultrasonic image quality evaluation system.

Fig. 6. System main interface.
Quality meter: The quality meter is the core function. Sonographers can make decisions or optimize the quality based on the predicted quality calculated and fed back by the model in real time.
Auto-capture: The auto-capture feature triggers an auto-capture clip when the image quality is predicted to be diagnostic. It simulates how a sonographer knows when an image is good enough for a diagnosis and documents it.
Save best clip: This feature retrospectively records the highest quality clips obtained to date, allowing the sonographer to select among them when preparing a diagnostic report.
System management: The system management module handles the system interface and log management.
6. Conclusion
This paper proposes a novel neural network model for automated quality assessment of the A4C view. Our approach employs a multi-task learning framework to comprehensively evaluate the quality of cardiac ultrasound A4C images from two aspects: pixel distortion and semantic distortion measurement. To evaluate pixel distortion, we use an adaptive ranking mechanism that learns to assign quality scores to similar images based on contrast sensitivity weighting factors. To assess semantic distortion, we use a structure-aware learning model incorporating a GNN to capture the relationship between the heart structures and image quality. Extensive experiments on real-world cardiac ultrasound data and benchmark datasets demonstrate the efficacy of the proposed approach, achieving performance competitive with, and in several cases exceeding, the state-of-the-art. Moreover, we have designed and implemented an intelligent automatic quality assessment system for cardiac ultrasound based on the proposed method, which is expected to be applied clinically in the future. Moving forward, we plan to extend the method to other tasks in 3D engineering applications49,50 and to apply our model to IQA experiments in other domains, such as the LIVE IQA database.
ORCID
Gadeng Luosang https://orcid.org/0009-0009-1873-3812
Zhihua Wang https://orcid.org/0000-0003-0355-903X
Jian Liu https://orcid.org/0000-0001-5148-5069
Fanxin Zeng https://orcid.org/0000-0002-7337-4463
Zhang Yi https://orcid.org/0000-0002-5867-9322
Jianyong Wang https://orcid.org/0000-0003-1689-2384