
Label-free in-vivo classification and tracking of red blood cells and platelets using Dynamic-YOLOv4 network

    https://doi.org/10.1142/S1793545824500093

    Abstract

    In-vivo flow cytometry is a noninvasive real-time diagnostic technique that facilitates continuous monitoring of cells without perturbing their natural biological environment, which renders it a valuable tool for both scientific research and clinical applications. However, the conventional approach for improving classification accuracy often involves labeling cells with fluorescence, which can lead to potential phototoxicity. This study proposes a label-free in-vivo flow cytometry technique, called dynamic YOLOv4 (D-YOLOv4), which improves classification accuracy by integrating absorption intensity fluctuation modulation (AIFM) into YOLOv4 to demodulate the temporal features of moving red blood cells (RBCs) and platelets. Using zebrafish as an experimental model, the D-YOLOv4 method achieved average precisions (APs) of 0.90 for RBCs and 0.64 for thrombocytes (similar to platelets in mammals), resulting in an overall AP of 0.77. These scores notably surpass those attained by alternative network models, thereby demonstrating that the combination of physical models with neural networks provides an innovative approach toward developing label-free in-vivo flow cytometry, which holds promise for diverse in-vivo cell classification applications.

    1. Introduction

    Blood cell counting and classification play integral roles in biomedical practice and are essential for diagnosing and treating various conditions, including anemia,1 viral diseases,1,2 bleeding risk assessment,3 inflammation,4,5 and thrombotic disease.6,7 Conventional methods involve clinical pathologists manually classifying and counting different blood cell types on peripheral blood smear slides. However, this manual approach is time-consuming, exhibits poor repeatability, and its accuracy heavily relies on the subjective judgment and experience of analysts. Thus, an automated method has the potential to significantly enhance the efficiency and accuracy of blood cell classification and counting.

    The primary technologies for automatic blood cell classification and counting in blood smear images are image processing8 and deep learning (DL)-based methods.9 Previously, image processing techniques, such as the watershed10 and K-medoids11 algorithms, were utilized for identifying and counting blood cells from blood smear images.12 DL represents a pioneering approach renowned for its exceptional performance in feature extraction and blood cell classification. Various DL approaches, including convolutional neural networks,13,14,15 artificial neural networks,16 unsupervised domain adaptation,17 regional convolutional neural networks (R-CNNs),18,19 and You Only Look Once (YOLO),18,20,21 have been employed for blood cell classification and counting. Alam et al. proposed a blood cell classification method that automatically identifies and counts white blood cells (WBCs), red blood cells (RBCs), and platelets from blood smear images using YOLO as the classifier.21 Jiang et al. introduced Attention-YOLO, demonstrating higher performance than the standard YOLO approach in identifying and counting WBCs, RBCs, and platelets. Based on these methods, several automated hematology analyzers were developed and clinically implemented, such as the LUNA-II™ Automated Cell Counter and TC20™ Automated Cell Counter.22,23 While these methods exhibit excellent blood cell classification and counting capabilities, they currently require invasive blood collection, pose infection risks,24 and lack the ability for continuous monitoring of blood circulation within the native biological environment.25

    The advancement of in-vivo blood cell detection26,27,28 and counting technology has opened avenues for blood cell classification and continuous circulation monitoring.29,30,31,32,33,34 Unlike in-vitro blood cell classification methods, in-vivo detection circumvents processing steps such as blood collection and centrifugation. Nevertheless, the low contrast between blood cells and the background remains a significant challenge. Fluorescent labeling is often used to enhance this contrast, improving detection sensitivity and specificity and thereby increasing classification accuracy. In 2021, Thapa et al. employed the YOLO algorithm to automatically identify and count fluorescently labeled neutrophils in zebrafish.35

    Currently, label-free in-vivo blood cell classification and counting remain challenging and attract considerable attention. Various in-vivo flow cytometry methods, combining techniques such as Raman spectroscopy, photoacoustic-photothermal, and phasic methods, have been developed and implemented to monitor and identify individual cells in the bloodstream or lymphatic circulation.25,36 These methods identify cells using intrinsic signals. In this study, we propose D-YOLOv4, a novel model for classifying, counting, and tracking RBCs and platelets based on a simple bright-field imaging system. In the proposed approach, the AIFM module extracts temporal features of dynamic blood cells, whereas YOLOv4 captures spatial features. Compared with traditional 3D neural network models, D-YOLOv4 can analyze features from 3D spatial-temporal data while notably reducing computational effort. Experimental results validate the effectiveness of D-YOLOv4, showcasing its superiority over other advanced models, such as the standard YOLO and Faster R-CNN, for in-vivo blood cell detection. Furthermore, accurate blood cell classification and motion tracking enable counting and velocity monitoring of blood cells. These findings indicate that the proposed method is well suited for long-term (several hours) in-vivo blood cell classification and functional parameter monitoring tasks.

    2. Materials and Methods

    2.1. Materials

    The materials used in the phantom experiment were 10μm polystyrene microspheres (PS10, Baseline ChromTech Research Center, Tianjin, China) and 4μm green fluorescent polystyrene microspheres (GF-PS4, with λ_ex = 488nm and λ_em = 518nm, Tianjin Big Goose Scientific Co. Ltd., Tianjin, China). The microspheres were injected into a complex solution by an injection pump (SP-2000, Ningbo Annol Medical Device Technology Co., Ltd., Ningbo, China) to simulate blood. The biological experiment was conducted using embryos of Tg(−6.0itga2b:EGFP) zebrafish (obtained from the China Zebrafish Resource Center), in which thrombocytes are labeled with EGFP. The experiment strictly adhered to the laboratory animal protocol sanctioned by the Institutional Animal Care and Use Committee of Foshan University.

    2.2. Experimental setup

    The experimental setup includes a bright-field micro-imaging system and a fluorescence imaging system, as shown in Fig. 1(a). Figures 1(b) and 1(c) present the microscopic and fluorescent images of zebrafish with fluorescently labeled thrombocytes, respectively. To capture bright-field images, the halogen lamp (HL; voltage: 12V; power: 100W) emitted illumination that passed through the built-in Kohler illuminator and a band-pass filter (F1, FBH560-10, 560nm ± 5nm, Thorlabs Inc., USA). This produced uniform, low-coherence illumination that was transmitted onto the sample on the stage (S). The forward-scattered light from the sample was collected by a CMOS camera (CMOS2, DP74, Olympus Inc., Japan) after passing through a 10:90 beam splitter (BS; Olympus Inc., Japan) and another band-pass filter (F4, FBH560-10, 560nm ± 5nm, Thorlabs Inc., USA). For fluorescence imaging, the excitation light from the mercury lamp (ML, power: 100W) passed through an excitation bandpass filter (F2, BP460-480, Olympus Inc., Japan) and a dichroic mirror (DM, DM490GFP, Olympus Inc., Japan), and then converged through the objective to illuminate the sample. The emitted fluorescence was collected through the objective, passed through an emission filter (F3, BA495-540, Olympus Inc., Japan), and was captured by the sCMOS camera (CMOS1, Neo 5.5 sCMOS, Andor Inc., UK). The acquisition start times and acquisition cycles of both cameras were synchronized at a sampling rate of 130fps. The fluorescence image serves as the ground truth for label creation and accuracy assessment in blood cell classification. The spatial resolution and temporal resolution of our method are 1.38μm and 8ms, respectively.

    Fig. 1.

    Fig. 1. (a) Schematic of the experimental setup. (b) Bright-field and (c) fluorescence images of the sample. HL: halogen lamp, F: filter, S: stage, O: objective, DM: dichroic mirror, BS: beam splitter, ML: mercury lamp, and CMOS: CMOS camera.

    2.3. Blood cell classification based on D-YOLOv4

    The proposed network model, D-YOLOv4, presents enhancements built upon the YOLOv437 framework. Figure 2 illustrates the structure of the D-YOLOv4 network, comprising the input, backbone, neck, and prediction components. In particular, the backbone encompasses an AIFM module designed to extract temporal features from dynamic cells.

    Fig. 2.

    Fig. 2. D-YOLOv4 network structure.

    The raw images captured by the CMOS2 camera were cropped to 416×416 pixels and served as the input data. Within the backbone, AIFM highlights the temporal features of dynamic cells, whereas SPDarknet53 extracts spatial features based on Darknet-53, augmented by a cross-stage feature fusion strategy; this modification reduces the computational load while maintaining accuracy. Under low-coherence light with a center wavelength of 560nm, the absorption coefficient of RBCs and platelets exceeds that of the background tissue.38,39,40,41 Because the cells move, this disparity in absorption generates dynamic modulation of the absorbed light intensity. Within the AIFM module, frequency analysis is applied to demodulate the blood cell signal from the raw absorption signal. High- and low-pass filters in the frequency domain separate the dynamic blood flow signal (I_AC) and the stationary background signal (I_DC) from the raw temporal signal, as expressed by the following formulas:

    I_AC(x, y, t) = HPF[I(x, y, f)],  (1)
    I_DC(x, y, t) = LPF[I(x, y, f)],  (2)

    where HPF[·] and LPF[·] denote the high- and low-pass filters, respectively. To emphasize the blood cell motion, the imaging parameter used is the instantaneous modulation depth (IMD), defined as the ratio of the instantaneous signal intensity of the blood flow to that of the background:

    IMD(x, y, t) = I_AC(x, y, t) / I_DC(x, y, t).  (3)
    Physically, the IMD indicates the instantaneous blood cell density in the raw images, offering insight into blood cell motion and facilitating the extraction of dynamic information about blood microcirculation. The network proceeds to extract spatial features based on IMD images. In practical scenarios, background tissues or deposited pigments can affect cell classification by the YOLO network. To overcome this limitation, the conventional YOLO model is enhanced by incorporating the optical absorption modulation effect, accurately representing the dynamic signal demodulation process in cells. The primary advantage of the resulting optical-enhanced YOLO network lies in its capability to demodulate the cell’s dynamic signal through IMD calculation, thus capturing the temporal features. This process effectively suppresses background tissue signals, thereby enhancing the signal-to-noise ratio of the cell signal.
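
    For readers who wish to prototype the AIFM demodulation in Eqs. (1)–(3), the following is a minimal sketch of the per-pixel frequency-domain separation, assuming an image stack of shape (T, H, W); the cutoff frequency and the small regularization constant are illustrative choices, not values from the paper.

```python
import numpy as np

def compute_imd(stack, fps=130.0, cutoff_hz=2.0, eps=1e-6):
    """Compute instantaneous modulation depth (IMD) from a frame stack.

    stack: (T, H, W) array of bright-field frames over time.
    cutoff_hz: frequency separating the stationary background (low-pass)
               from the moving-cell signal (high-pass); illustrative value.
    """
    spectrum = np.fft.fft(stack, axis=0)                 # temporal FFT per pixel
    freqs = np.fft.fftfreq(stack.shape[0], d=1.0 / fps)  # frequency axis in Hz
    high = np.abs(freqs) >= cutoff_hz                    # HPF mask (dynamic flow)
    low = ~high                                          # LPF mask (static background)
    i_ac = np.fft.ifft(spectrum * high[:, None, None], axis=0).real  # Eq. (1)
    i_dc = np.fft.ifft(spectrum * low[:, None, None], axis=0).real   # Eq. (2)
    return i_ac / (i_dc + eps)                           # Eq. (3): IMD
```

    Applied to the 130fps bright-field sequences described in Sec. 2.2, the resulting IMD frames would serve as the input on which the network extracts spatial features.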

    The neck section, comprising spatial pyramid pooling and a path aggregation network, integrates the extracted feature maps at different scales. Ultimately, feature maps at three scales are output to predict bounding boxes, categories, and confidence levels. Only prediction boxes that satisfy the confidence threshold are retained, as sketched below.
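
    As a concrete illustration of this filtering step, the sketch below shows standard YOLO-style post-processing (score filtering followed by greedy non-maximum suppression); the threshold values and the [x1, y1, x2, y2] box format are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def postprocess(boxes, scores, conf_thresh=0.5, iou_thresh=0.5):
    """Confidence filtering followed by greedy non-maximum suppression."""
    keep = []
    idx = [i for i in np.argsort(scores)[::-1] if scores[i] >= conf_thresh]
    while idx:
        best = idx.pop(0)                 # highest remaining confidence
        keep.append(best)
        idx = [i for i in idx if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep                           # indices of retained boxes
```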

    The performance of the D-YOLOv4 algorithm was improved by training on a large dataset. To increase the variability of the network inputs and avoid overfitting during training, we implemented techniques such as 90° rotations, average blurring, and the addition of Gaussian noise. After data augmentation, which quadrupled the available training images, the images were labeled by multiple experts using LabelImg (https://github.com/HumanSignal/labelImg), according to the fluorescence images and characteristics of the target. The resulting annotated files formed the basis for subsequent training and prediction.
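
    A minimal sketch of the three augmentations named above, using OpenCV and NumPy; the blur kernel size and noise standard deviation are illustrative, as the paper does not specify them.

```python
import cv2
import numpy as np

def augment(image):
    """Produce the three augmented variants described in the text,
    quadrupling each raw image (original + 3 augmentations)."""
    rotated = cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)   # 90° rotation
    blurred = cv2.blur(image, (3, 3))                      # average (box) blur
    noisy = np.clip(image.astype(np.float64)
                    + np.random.normal(0, 5, image.shape),
                    0, 255).astype(np.uint8)               # additive Gaussian noise
    return [image, rotated, blurred, noisy]
```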

    The input image size was set to 416×416 pixels. The experimental environment comprised a Windows 10 system, Intel Core i7-8700 (3.20GHz) CPU, NVIDIA GeForce GTX 1660 Ti (6GB) GPU, PyTorch version 1.6, and CUDA version 10.1. The general training settings included a batch size of 6, 100 training epochs, a learning rate of 1×10⁻³ for the first 50 epochs and 1×10⁻⁴ for the subsequent 50 epochs, and an intersection over union threshold of 0.5.
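
    The stated schedule maps naturally onto a step decay. The sketch below reproduces it in PyTorch; the optimizer choice (Adam) and the stand-in module are assumptions, since the paper specifies only the batch size, epoch count, learning rates, and IoU threshold.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # stand-in module; the actual detector is D-YOLOv4
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer is assumed
# Drop the learning rate from 1e-3 to 1e-4 after epoch 50.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50], gamma=0.1)

for epoch in range(100):
    # ... train one epoch with batch size 6 on 416x416 inputs ...
    optimizer.step()   # placeholder step so the loop runs as written
    scheduler.step()   # advances the lr schedule once per epoch
```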

    2.4. Path tracking and instantaneous velocity measurement

    The blood cell paths are tracked by minimizing a cost computed after the cells are identified and classified by the D-YOLOv4 network. Matching the positions of the same cell in two consecutive images is vital for effective path tracking. The cost is defined as follows:

    cost = α·ΔL,  (4)

    where α is a directional parameter used to exclude impractical cell tracking assignments and ΔL is the Euclidean distance between detections in the preceding and current frames. If infeasible motion is detected between detections, the distance cost is replaced with an infinite cost, preventing assignment between those detections. Unassigned detections within the current frame are allocated to a new trace, while traces unassigned in a previous frame are flagged as penalized. Once a track is penalized beyond a predetermined threshold, the cells within that track are deemed to have exited the field of view. Minimizing the total cost enables tracking of the motion path of detected objects in each image frame.
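
    A minimal sketch of the frame-to-frame assignment implied by Eq. (4), using the Hungarian algorithm from SciPy; here the directional gate α is approximated by a simple distance gate, and the gate value is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(prev, curr, max_dist=30.0):
    """Assign detections between consecutive frames by minimizing total cost.

    prev, curr: (N, 2) and (M, 2) arrays of cell centroids in pixels.
    max_dist: motion gate (illustrative); pairs beyond it are treated as
              infeasible, mirroring the paper's 'infinite cost'.
    """
    cost = np.linalg.norm(prev[:, None, :] - curr[None, :, :], axis=2)  # ΔL matrix
    cost[cost > max_dist] = 1e9                 # effectively infinite cost
    rows, cols = linear_sum_assignment(cost)    # minimize the total cost
    # Keep only feasible matches; unmatched detections start new traces.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 1e9]
```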

    The instantaneous velocity of an individual blood cell is determined by computing the ratio of the motion distance to the corresponding transition time between successive images. The instantaneous velocity of blood cell C_i at time t can be expressed as follows:

    v_t^i = L_t^i / τ_t^i,  (5)

    where L_t^i is the motion distance and τ_t^i is the transition time between successive frames.
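
    Since both cameras run at 130fps, the transition time between successive frames is τ = 1/130s ≈ 8ms, matching the stated temporal resolution. The helper below sketches Eq. (5); treating the 1.38μm spatial resolution as the pixel scale is an assumption for illustration.

```python
import numpy as np

FPS = 130.0  # synchronized camera frame rate (Sec. 2.2)

def instantaneous_velocity(p_prev, p_curr, um_per_px=1.38):
    """Eq. (5): v = L / τ, with τ = 1 / FPS ≈ 8 ms.

    p_prev, p_curr: (x, y) centroids of the same cell in consecutive frames.
    um_per_px: pixel scale; using the stated 1.38 μm resolution here is an
               illustrative assumption.
    """
    dist_um = np.linalg.norm(np.subtract(p_curr, p_prev)) * um_per_px
    return dist_um * FPS  # μm/s
```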

    2.5. Evaluation indices

    To assess the performance of blood cell classification, the precision and recall were obtained as follows:

    Precision = TP / (TP + FP),  (6)
    Recall = TP / (TP + FN),  (7)

    where the true positives (TP) are the correctly detected blood cells, the false positives (FP) are the incorrectly detected blood cells, and the false negatives (FN) are the missed blood cells. The average precision (AP) value of each class corresponds to the area under the precision–recall curve, and the mean average precision (mAP) represents the mean of the APs for all N categories; they are expressed as follows:

    AP = ∫₀¹ Precision(Recall) d(Recall),  (8)
    mAP = (1/N) Σ_{i=1}^{N} AP_i.  (9)

    The average F1 score is defined as follows:

    F̄ = (1/n) Σ_{i=1}^{n} F1(i),  (10)

    where n is the number of blood cell classes, F1(i) is the score of the i-th class, and F1 = 2 · Precision · Recall / (Precision + Recall) is the harmonic mean of precision and recall, ranging from 0 to 1.
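
    A compact sketch of Eqs. (6)–(10); the trapezoidal integration of the precision–recall curve is one common implementation choice for Eq. (8), not necessarily the authors'.

```python
import numpy as np

def average_precision(precision, recall):
    """Eq. (8): area under the precision-recall curve, integrated
    numerically over recall (trapezoidal rule as an implementation choice)."""
    order = np.argsort(recall)
    return float(np.trapz(np.asarray(precision)[order],
                          np.asarray(recall)[order]))

def mean_average_precision(aps):
    """Eq. (9): mean of the per-class APs."""
    return float(np.mean(aps))

def f1_score(tp, fp, fn):
    """Eqs. (6)-(7) combined into the harmonic mean F1."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```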

    3. Results

    3.1. Physical model-based effective validation of simple 3D networks

    To ascertain the viability of D-YOLOv4 for classification and functional parameter monitoring, a phantom experiment was conducted using polystyrene microspheres of varying sizes. Within a refractive index-matching pool containing 0.9% saline, a 0.3 mm-inner-diameter capillary tube was securely fixed. Subsequently, 10μm polystyrene microspheres (PS10) and 4μm green fluorescent polystyrene microspheres (GF-PS4) were introduced into the capillary tube using a syringe pump at a flow rate of 0.1mL/h. PS10 and GF-PS4 were used to simulate RBCs and platelets, respectively. A composite solution comprising 250μL of PS10 (25mg/mL), 25μL of GF-PS4 (10mg/mL), and 25mL of distilled water was used to simulate blood, and 0.9% saline was used to simulate the background tissue. Concurrently, static sediments were intentionally included within the field of view to serve as a form of background noise.

    The images were resized to 416×416 pixels, and augmentation increased the initial count of 720 images to 2880. After screening out particle-free images, 2149 images were selected for training. Additionally, 8727 raw images outside the training set were employed for testing; these were split into two groups of 4727 and 4000 images, used for model performance comparison and functional parameter monitoring, respectively. Identifying microspheres was challenging due to static sediments and the low contrast of the raw images (Figs. 3(a) and 3(e)). Consequently, validation relied on the fluorescence images displayed in Figs. 3(b), 3(f), and Video S1. D-YOLOv4 successfully classified PS10 and GF-PS4 while excluding static sediments, as depicted in Figs. 3(c), 3(g), and Video S2. In contrast, the YOLOv4 model missed GF-PS4 (red arrow) and erroneously identified static sediment (blue arrow) (Figs. 3(d), 3(h), and Video S3). Figure 4 presents the classification outcomes of PS10 and GF-PS4 obtained using D-YOLOv4 and YOLOv4. The results indicate that D-YOLOv4 surpasses the standard YOLOv4 method in recall, AP, and mAP for both PS10 and GF-PS4 detection.

    Fig. 3.

    Fig. 3. Performance comparison between YOLOv4 and D-YOLOv4: (a) and (e) are the raw images; (b) and (f) are the corresponding fluorescence images (see Visualization 1 [Video S1]); (c) and (g) are the corresponding D-YOLOv4 detection results (see Visualization 2 [Video S2]); and (d) and (h) are the corresponding YOLOv4 detection results (see Visualization 3 [Video S3]).

    Fig. 4.

    Fig. 4. (a) Precision–recall curve of PS10, (b) precision–recall curve of GF-PS4, and (c) performance comparison between the YOLOv4 and D-YOLOv4 algorithms.

    To delve deeper into the rationale behind the enhanced performance, the temporal and spatial signal curves of YOLOv4 and D-YOLOv4 were analyzed. As shown in Fig. 5(b), the raw spatial signal at point A in Fig. 3(e) exhibits a high-absorption characteristic akin to the particle signal, attributable to static sediments; this similarity is the primary cause of false detections. However, the temporal IMD signal at point A remains at the background level, because the AIFM module extracts dynamic temporal signals that emphasize particle movement relative to the background. By integrating the AIFM module, the D-YOLOv4 model extends the network from 2D to 3D, enabling the extraction of dynamic particle features by combining spatial and temporal information. This effectively resolves the previously intractable challenge of identifying targets that have few distinctive spatial features and resemble background noise.

    Fig. 5.

    Fig. 5. (a) Raw temporal signals at the points marked as A and B in Fig. 3(e) and (b) spatial signal at the position indicated by the dotted line in Fig. 3(e).

    After categorization by the D-YOLOv4 model, motion tracking and functional parameter monitoring become feasible. Video S4 shows the particle-tracking process, and Figs. 6(a)–6(c) show the tracking results at 2.00, 32.00, and 65.00s, respectively. These illustrations demonstrate the simultaneous acquisition of motion paths, counts, and instantaneous velocities of PS10 and PS4. The counting curves for PS4 and PS10 are depicted in Figs. 6(d) and 6(e), respectively. Over a span of 66.67s, a total of 23 PS10 and 28 PS4 particles traversed the detection area. The mean squared errors (MSEs)42 for the PS10 and PS4 counting curves were 0.009 and 0.016, respectively. As anticipated, the smaller particles were more difficult to detect and track, leading to a 0.007 increase in the MSE; nonetheless, the outcomes showcased high counting accuracy. Notably, different particles injected at an identical flow velocity and passing through the same position exhibit variable instantaneous velocities. As illustrated in Fig. 6(f), the velocities computed using the proposed method demonstrated a strong correlation (r² = 0.986, n = 1869) with manually analyzed data, underscoring the precision of the velocity measurements. Compared with average velocity measurements,43 instantaneous velocity measurements offer a more precise statistical depiction of individual particle kinetics, thereby aiding the analysis of drug delivery effects or cellular motility.

    Fig. 6.

    Fig. 6. Particle-tracking and parameter-monitoring results: (a)–(c) Particle-tracking results at 2.00, 32.00, and 65.00s, respectively (see Visualization 4 [Video S4]), (d) counting curve of PS10, (e) counting curve of PS4, and (f) linear relationship between real and measured velocities ranging between 0 and 600μm/s.

    3.2. RBC and platelet classification and tracking

    Due to the intricate nature of the in-vivo environment, cell recognition and classification encounter considerable challenges. To further demonstrate the efficacy of this approach, a biological experiment was conducted using zebrafish embryos. Although platelets and thrombocytes come from different species, they are functionally similar, and both are significantly smaller than RBCs. In practice, zebrafish models are commonly used to study platelet bleeding disorders, thrombocytopenia, and von Willebrand disease; the thrombocytes in zebrafish are equivalent to platelets in mammals. Approximately 24h post-fertilization (hpf), 0.003% phenylthiourea (PTU) was introduced to impede pigmentation. The biological samples consisted of zebrafish embryos approximately 72h old. Prior to the in-vivo blood cell classification experiments, the zebrafish embryos were anesthetized using 0.64μg/mL tricaine to minimize movement. Subsequently, each zebrafish was placed in a water droplet on a glass slide to prevent mortality during the experiment. Importantly, the zebrafish remained alive following the completion of the experiment.

    Considering the limited available data, we employed image augmentation to expand the training sample size, implementing 90° rotations, average blurring, and the addition of Gaussian noise. The final dataset comprised 1956 images, with 1760 in the training set and 196 in the validation set. Additionally, 1000 images without augmentation constituted the test set. Notably, the training and test sets were obtained from distinct zebrafish embryos, ensuring that no test image appeared in the training dataset. To mitigate bias, individual zebrafish blood cells were labeled by multiple independent researchers using labeling software, relying on the raw and fluorescence images for labeling RBCs and thrombocytes, respectively. In Figs. 7(b)–7(d), the performance of the proposed D-YOLOv4 is compared with that of advanced object detection models. Detailed prediction results of D-YOLOv4, Faster R-CNN,17 and YOLOv4 can be viewed at 5.27s in Videos S6, S7, and S8, respectively. Figure 7(b) depicts an image processed by the AIFM module, revealing that the temporal feature extraction effectively highlights the red blood cell signal compared with the raw image (Figs. 7(a), 7(c), or 7(d)). D-YOLOv4 exhibits superior classification results for blood cells, showing minimal false cases (green arrows) and missed cases (red arrows); its predictions closely align with the ground truth. Our research primarily focuses on single capillaries because the earliest changes in microvascular perfusion are anticipated to occur at this level, and studies of capillary function may be beneficial for understanding early disease information.44

    Fig. 7.

    Fig. 7. Performance comparison between various methods: (a) Ground truth (see Visualization 5 [Video S5]), (b) detection results using D-YOLOv4 (see Visualization 6 [Video S6]), (c) detection results using Faster R-CNN (see Visualization 7 [Video S7]), and (d) detection results using YOLOv4 (see Visualization 8 [Video S8]).

    The detection performance is quantitatively assessed using the AP, mAP, and average F1 score values, as defined in Eqs. (8)–(10). Table 1 demonstrates that D-YOLOv4 exhibits superior detection performance compared to the standard YOLOv4 and Faster R-CNN methods in terms of AP, mAP, and average F1 score, for both RBC and thrombocyte detection. For RBC detection, YOLOv4 and Faster R-CNN yield APs of 0.46 and 0.25, respectively, while D-YOLOv4 achieves a notable improvement with an AP of 0.90. For thrombocyte detection, D-YOLOv4 achieves an AP of 0.64, whereas YOLOv4 and Faster R-CNN exhibit much lower APs of 0.06 and 0.02, respectively, with instances of missed detections. The quantitative mAP results support these findings, increasing from 0.14 (Faster R-CNN) and 0.26 (YOLOv4) to 0.77 (D-YOLOv4). Moreover, the average F1 score increases from 0.18 (Faster R-CNN) and 0.35 (YOLOv4) to 0.77 (D-YOLOv4), signifying the superior detection performance of D-YOLOv4. It is worth noting that the accuracy of thrombocyte identification is lower than that of RBC identification. Identification errors fall into two categories: thrombocytes incorrectly identified as RBCs and thrombocytes missed by the proposed method. Among these, incorrect identification as RBCs prevails, followed by thrombocyte omissions to a lesser degree. The reduced accuracy can be attributed to the scarcity of thrombocytes in blood and their relatively weak signal.

    Table 1. Performance comparison between various methods.

    Method          AP (RBCs)    AP (thrombocytes)    mAP     Average F1 (F̄)
    D-YOLOv4        0.90         0.64                 0.77    0.77
    YOLOv4          0.46         0.06                 0.26    0.35
    Faster R-CNN    0.25         0.02                 0.14    0.18

    To explore hemodynamic information regarding blood cells and enhance prediction accuracy, the custom multi-object tracking algorithm described in Sec. 2.4 was implemented. This algorithm acquires instantaneous velocities and blood cell counts while addressing occasional prediction errors through multi-frame data association, thereby improving accuracy. Figure 8 displays the results of blood cell identification and tracking at different time points, with each identified blood cell's motion path represented by a line of distinct color. More detailed cell tracking information is available in Video S9. A substantial variation in instantaneous velocity was observed, which is a notable finding for future investigations of the varied responses of different blood cells to identical stimuli. Table 2 summarizes the counting accuracy of our approach. The experimental results indicate that the method effectively identifies RBCs and thrombocytes, in line with normal blood cell ratios. When D-YOLOv4 was employed for RBC counting, the larger number of cells led to a counting error of approximately 6%. In contrast, the error rate for thrombocyte counts was 0%, owing to the ability of multi-frame data association to mitigate missed and false detections. Given the relatively low percentage of thrombocytes in whole blood, multi-frame data association significantly improves the accuracy of their trajectory tracking; nevertheless, a marginal error may persist within an acceptable range as the count increases. Accurate cell counting holds considerable importance in medical image analysis: in clinical settings, manual counting of various cell types entails a substantial workload, so precise automated cell counting methods harbor significant potential across diverse applications.

    Fig. 8.

    Fig. 8. In-vivo blood cell tracking using microscopic images, including cell types, positions, and trajectories at (a) 0.70, (b) 0.80, (c) 7.50, and (d) 7.60 s (see Visualization 9 [Video S9]).

    Table 2. Counting accuracy of the proposed method.

    Cell type       Ground truth number    Counting number    Error rate
    RBCs            278                    295                6.1%
    Thrombocytes    11                     11                 0%

    4. Discussion

    This study demonstrates that D-YOLOv4 exhibits higher accuracy in in-vivo blood cell classification and detection compared to YOLOv4. In the realm of in-vivo detection, factors like pigment deposition and uneven tissue thickness can impede blood cell feature extraction. D-YOLOv4 is a low-computation three-dimensional feature extraction network comprising a temporal feature extraction module, AIFM, and a spatial feature extraction module, YOLOv4. Leveraging the physical phenomenon where moving blood cells induce absorption intensity fluctuations, the AIFM module captures the temporal characteristics of blood cells. Notably, the data computation involved in AIFM is considerably smaller than in traditional one-dimensional methods.

    To understand and analyze the features learned by the D-YOLOv4 and YOLOv4 models, a Grad-CAM analysis was conducted on the last channel using the test set images. This visualization technique helps explore the feature maps of the network models, enhancing our understanding of the DL model. Figure 9 displays the Grad-CAM outcomes obtained from zebrafish test set images. The activation map, ranging from blue (low) to red (high), signifies the model's level of interest. A discernible distinction in focus is apparent between the two models. As depicted in Figs. 9(b) and 9(e), for RBC identification, the regions of interest identified by the D-YOLOv4 model align closely with the actual RBC regions, whereas the YOLOv4 model appears less sensitive to RBC areas and is influenced by the surrounding background. Furthermore, compared with the YOLOv4 outputs, the D-YOLOv4 model accurately pinpoints thrombocyte regions (Figs. 9(c) and 9(f)). The Grad-CAM findings indicate that the D-YOLOv4 model excels in in-vivo blood cell classification because it extracts dynamic cell features through the AIFM module, placing greater emphasis on temporal features while minimizing the influence of static backgrounds on cell classification. This is significant for living cell classification, particularly because the accuracy of 3D spatial networks can be susceptible to nonhomogeneous in-vivo samples, including pigmentation and plaques.
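
    For reference, a minimal Grad-CAM sketch in PyTorch is shown below; the hook-based implementation, the choice of target layer, and the use of the maximum output activation as the backpropagated score are illustrative assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    """Compute a Grad-CAM activation map for one input image."""
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(image).max()     # top activation as a stand-in for a class score
    model.zero_grad()
    score.backward()               # gradients of the score w.r.t. the target layer
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # channel-wise gradient weights
    cam = F.relu((weights * feats[0]).sum(dim=1))      # weighted feature combination
    return cam / (cam.max() + 1e-8)                    # normalized activation map
```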

    Fig. 9.

    Fig. 9. Grad-CAM visualizations: (a) Raw image, Grad-CAM visualization of (b) RBCs and (c) thrombocytes using the YOLOv4 model, (d) dynamic cell characterization output by the AIFM module, and Grad-CAM visualizations of (e) RBCs and (f) thrombocytes using the D-YOLOv4 model.

    Moreover, feature map visualization was utilized to directly observe the feature maps of different network layers, shedding light on the feature extraction process. As illustrated in Fig. 10(a), the YOLOv4 model predominantly focuses on background areas or pigment depositions, allocating less attention to blood vessel and cell classification features. Conversely, as depicted in Fig. 10(b), after processing by the AIFM module, the D-YOLOv4 model places a heightened emphasis on blood vessels and cells, enhancing the network's dynamic feature extraction capacity and enabling more precise classification. Consequently, the D-YOLOv4 network demonstrates proficiency in learning a wide array of blood cell spatial features, supporting the premise that dynamic features and structural information are advantageous for blood cell classification.

    Fig. 10.

    Fig. 10. Features extracted by (a) YOLOv4 and (b) D-YOLOv4.

    The proposed method demonstrates the viability of in-vivo blood cell classification, tracking, and velocity measurement, significantly aiding the early diagnosis of certain blood diseases and contributing to biomedical research on blood microcirculation. While the proposed method elevates blood cell classification accuracy and measures various functional parameters, it is essential to acknowledge its limitations. The method primarily achieves blood cell classification by leveraging morphological differences among the blood cells; consequently, individual differences among blood cells might introduce errors. Small numbers of other blood cells that are morphologically similar to red blood cells or platelets can affect the classification accuracy. The method is limited to the classification and measurement of individual blood cells, presenting challenges in its application to classifying blood cell clusters within larger vessels. Furthermore, owing to the limited light penetration depth, this method is applicable only to thin or nearly transparent samples such as chicken embryos or zebrafish.45,46

    Subsequent research will focus on augmenting available blood cell information by integrating imaging techniques like polarization, hyperspectral, and phase imaging. This integration aims to enhance cell classification accuracy and expand the discernible range of cell types. Additionally, future studies are anticipated to simultaneously classify subclasses of white blood cells, integrating them into the realm of blood disease research, including leukemia, sickle cell anemia, and malaria studies.47,48,49 Moreover, the integration of a physical model into a neural network introduces innovative perspectives for in-vivo flow cytometry.

    5. Conclusion

    This study proposes a label-free in-vivo flow cytometry technique, called D-YOLOv4, for the classification, tracking, instantaneous velocity measurement, and counting of RBCs and platelets. The approach combines optical absorption modulation effects with DL networks: the AIFM module enhances the detection of the temporal features of moving blood cells, while the YOLOv4 network classifies the morphological differences between RBCs and platelets. In future research, integrating multiple physical measurement parameters into DL networks may address the difficulty that low-complexity networks have in efficiently extracting the complex features of cells in living organisms, thus potentially enhancing the applicability of label-free in-vivo flow cytometry.

    Acknowledgments

    This work was supported by the National Natural Science Foundation of China (62075042 and 62205060), the Research Fund of the Guangdong-Hong Kong-Macao Joint Laboratory for Intelligent Micro-Nano Optoelectronic Technology (2020B1212030010), the Special Fund for Research on National Major Research Instruments of China (Grant No. 62027824), and the Special Fund for Science and Technology Innovation Cultivation of Guangdong University Students (No. pdjh2022b0543). Caizhong Guan and Bin He contributed equally to this work.

    Conflict of Interest

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

    ORCID

    Caizhong Guan  https://orcid.org/0000-0003-1190-0576

    Bin He  https://orcid.org/0009-0006-4997-8319

    Hongting Zhang  https://orcid.org/0000-0002-5423-9055

    Shangpan Yang  https://orcid.org/0009-0003-5288-2236

    Yang Xu  https://orcid.org/0009-0006-6165-5985

    Honglian Xiong  https://orcid.org/0000-0001-8543-9781

    Yaguang Zeng  https://orcid.org/0000-0002-4815-1890

    Mingyi Wang  https://orcid.org/0000-0002-0784-296X

    Xunbin Wei  https://orcid.org/0000-0002-4269-9943