World Scientific

Enhancing the data processing speed of a deep-learning-based three-dimensional single molecule localization algorithm (FD-DeepLoc) with a combination of feature compression and pipeline programming

    https://doi.org/10.1142/S1793545824500251

    Abstract

    Three-dimensional (3D) single molecule localization microscopy (SMLM) plays an important role in biomedical applications, but its data processing is very complicated. Deep learning is a potential tool to solve this problem. FD-DeepLoc, a recently reported state-of-the-art deep-learning-based 3D super-resolution localization algorithm, has greatly improved data processing throughput but still falls short of the goal of online image processing. In this paper, a new algorithm, Lite-FD-DeepLoc, is developed on the basis of FD-DeepLoc to meet the online image processing requirements of 3D SMLM. The new algorithm uses feature compression to reduce the number of model parameters and combines it with pipeline programming to accelerate inference of the deep learning model. Results on simulated data show that Lite-FD-DeepLoc processes images about twice as fast as FD-DeepLoc with a slight decrease in localization accuracy, which enables real-time processing of 256×256 pixel images. Results on biological experimental data imply that Lite-FD-DeepLoc can successfully analyze data based on astigmatism and saddle point PSF engineering, and that the global resolution of the reconstructed image is equivalent to or even better than that of FD-DeepLoc.

    1. Introduction

    With its ultra-high resolution, single molecule localization microscopy (SMLM) has been widely used in biology and other research fields1,2 providing technical means for research such as living cell imaging,3,4 organelle interaction5,6 and disease pathogenesis.7 In view of the principle that SMLM improves the resolution by multiple sampling and localization of single molecule fluorescence signals, it can fundamentally avoid the axial overlap of fluorescence signals and realize three-dimensional (3D) imaging.8 On the other hand, all biological samples have 3D structures.9 Therefore, three-dimensional super-resolution localization microscopy (3D-SMLM) has been applied to neuroscience, cytology and other fields by researchers, and is constantly providing assistance for related research.10

    The core technology of SMLM is to accurately estimate the spatial position of single molecule from the fluorescence image, and then reconstruct the super-resolution image (SRI) according to the spatial position coordinates of single molecule.11 Therefore, the reliability of SRI is closely related to the performance of localization algorithm.12 3D-SMLM firstly encodes the 3D spatial information of the single molecule by the point spread function (PSF) engineering, and then decodes the information according to the image to obtain the accurate 3D position information of the single molecule.13,14 At present, there are two kinds of decoding algorithms for 3D information of single molecule fluorescence signal: traditional algorithm based on fitting and the decoding algorithm based on deep learning.15 Among them, the traditional algorithm models the relationship between a single molecule’s PSF and its axial position, and then iteratively calculates the position and intensity information of other molecules according to image.16,17 However, this decoding process needs to optimize the parameters related to the sample and the system, which would be extra time-consuming.18 In recent years, with the rapid development of artificial intelligence technology, researchers have begun to apply the deep learning method to the 3D-SMLM, and successively developed 3D super-resolution localization algorithms such as DECODE19 and FD-DeepLoc,20 etc.21 Among them, DECODE shows excellent Jaccard index and localization error in the open competition.19 The FD-DeepLoc can accurately decode the PSF that varies with space, and avoid the restriction of uneven aberration distribution on the field of view of 3D SMLM, which significantly improves the imaging throughput.20

    In short, deep-learning-based 3D super-resolution localization has gradually become a research hotspot in the field of 3D-SMLM and has broad application prospects. However, limited by the speed of data processing, even the most advanced FD-DeepLoc algorithm cannot fully meet the needs of online image processing.20 This requirement matters in scenarios where the sample state must be observed promptly, such as single particle tracking.22 Moreover, because the data cannot be processed in real time, the experimenter needs to collect redundant data to ensure its validity, which produces a large accumulation of data and greatly reduces research efficiency. Therefore, improving the processing speed of localization algorithms is both a difficult problem and an active topic in 3D-SMLM.

    To address the demand for online data processing, this paper first analyzes the most advanced algorithm, FD-DeepLoc, in detail and finds two likely bottlenecks in its processing speed: the huge number of model parameters caused by the complex network architecture, and a large number of repeated operations. For the first problem, some researchers have improved processing speed by feature compression. For example, DeepCAD-RT improves processing speed by pruning redundant features in the network structure of DeepCAD.23 For the second problem, pipeline programming is a good way to speed up data processing when many steps are repeated. For example, Gui used pipeline programming on an FPGA-GPU heterogeneous computing platform to accelerate data processing, which improved the running speed of a single molecule localization algorithm.24

    Based on the above background, this paper develops a faster localization algorithm, Lite-FD-DeepLoc, on the basis of FD-DeepLoc to meet the needs of online localization. First, feature compression is used to reduce the network parameters of FD-DeepLoc, and hyperparameter tuning is then used to recover model performance, so that the computational complexity of the model is reduced while its performance is maintained. Second, pipeline programming is used in the prediction process to further speed up the model. Results on simulated data show that Lite-FD-DeepLoc takes 10.4 ms to process one frame of 256×256 pixels, which is about twice as fast as FD-DeepLoc and could meet the requirements of real-time data processing. Results on biological experimental data imply that Lite-FD-DeepLoc can successfully analyze data based on astigmatism and saddle point PSF engineering, and that the global resolution of reconstructed images can reach 33 nm (FRC resolution over the entire aberrated field of view).

    2. Materials and Methods

    2.1. Generation of simulated data and experimental data

    The simulated data in this paper are generated by the PSF simulator in FD-DeepLoc. In detail, the positions of fluorescent molecules and their photon counts were randomly distributed, and their aberration coefficients were adjusted according to their lateral positions. For the aberration level, we follow the FD-DeepLoc paper and also add a normally distributed value with a mean of zero and a standard deviation of λ/100 to each Zernike coefficient. In addition to the aberration of the imaging system, the generation model uses a noise model to add camera noise to the simulated images. Except for the tests with different signal-to-noise ratios (SNRs) and different molecular densities, the SNR of the simulated images is set to the medium level and the molecular density is 1.5 emitters/μm².

    The experimental data in this paper come from the open experimental dataset of FD-DeepLoc. Two kinds of 3D-SMLM data were used, based on astigmatism PSF engineering and saddle point PSF engineering. The imaged biological structure is Nup96-SNAP, whose size has been reported and can therefore serve as a reference for quantification in this paper.

    2.2. Feature compression

    An analysis of the FD-DeepLoc architecture shows that the network consists of three main parts: a frame analysis module, a temporal context module and an output module. The frame analysis module and the temporal context module are composed of four U-net networks. Apart from up-sampling and CoordConv operations, all operations in the network are convolutions. FD-DeepLoc uses 48 convolution-layer channels, which gives 2,564,218 model parameters and slows down the model. Therefore, this paper repeatedly halves the number of convolution-layer channels to reduce the model parameters. Because reducing the parameters also reduces inference accuracy, hyperparameter optimization is applied after feature compression to limit the loss of accuracy.
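
    To see why halving the channel count shrinks the model so quickly, note that the weight count of a convolution layer scales with the product of its input and output channels. The sketch below is illustrative only: it assumes 3×3 kernels with biases and a toy stack of equal-width layers (the function names and the stack depth are hypothetical), so it does not reproduce the 2,564,218 parameters reported for FD-DeepLoc.

```python
def conv2d_params(c_in: int, c_out: int, kernel: int = 3, bias: bool = True) -> int:
    """Number of trainable parameters in one 2D convolution layer."""
    weights = kernel * kernel * c_in * c_out
    return weights + (c_out if bias else 0)


def toy_stack_params(channels: int, n_layers: int = 10) -> int:
    """Parameters of a hypothetical stack of equal-width 3x3 convolution layers."""
    return n_layers * conv2d_params(channels, channels)


if __name__ == "__main__":
    base = toy_stack_params(48)
    for channels in (48, 24, 12, 6):  # repeated halving, as described in the text
        n = toy_stack_params(channels)
        print(f"{channels:>2} channels: {n:>7} parameters "
              f"({n / base:.3f} of the 48-channel stack)")
```

    In this toy stack each halving cuts the parameter count to roughly a quarter, which is why a few halvings are enough to change the forward-pass cost substantially.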

    2.3. Pipeline programming

    Pipeline programming is a method to disassemble the repeated sequential process into several subprocesses to improve the efficiency of the system. This technology enables each independent subprocess to be effectively parallel to other subprocesses in its special functional segment. By analyzing the data stream in FD-DeepLoc, it is found that its data processing can be divided into two stages: network forward propagation and feature map conversion. Among them, the network forward propagation process has been accelerated by feature compression, as shown in Fig. 1(b), and the operation in the feature map conversion stage is consistent, as shown in Fig. 1(c). Therefore, for the repetitive operation at this stage, pipeline programming can be used to process the repeated operations in parallel to improve the running speed of the algorithm.

    Fig. 1.

    Fig. 1. Schematic diagram of feature compression and pipeline programming. (a), (b) The schema of network connection before and after feature compression; (c), (d) schematic diagram of inference process without and with pipeline programming.

    In practice, we use multi-threading to implement the pipeline. We take one of the repeated computing tasks as the main thread and open sub-threads for the other tasks, so that data processing in different threads runs in parallel. Specifically, we first preprocess the data and then start the localization calculation of the first thread. While that thread carries out its data post-processing, we open another thread to start its own localization calculation and data post-processing, as shown in Fig. 1(d). This process is repeated until the maximum number N of threads allowed by the computer hardware has been started.
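
    The overlap of localization calculation and post-processing described above can be sketched with Python's standard threading tools. This is a minimal illustration under stated assumptions, not the authors' implementation: `forward` and `postprocess` are hypothetical stand-ins for network inference and feature-map conversion, and the two-stage queue layout is assumed.

```python
import queue
import threading
from typing import Callable, Iterable, List

_STOP = object()  # sentinel marking the end of the stream


def run_pipeline(batches: Iterable, forward: Callable, postprocess: Callable,
                 n_post_threads: int = 2) -> List:
    """Overlap stage 1 (forward) with stage 2 (postprocess) using worker threads."""
    batches = list(batches)
    results = [None] * len(batches)
    handoff: queue.Queue = queue.Queue(maxsize=2 * n_post_threads)

    def post_worker() -> None:
        while True:
            item = handoff.get()
            if item is _STOP:
                break
            index, feature_map = item
            # Each result slot is written by exactly one thread.
            results[index] = postprocess(feature_map)

    workers = [threading.Thread(target=post_worker) for _ in range(n_post_threads)]
    for w in workers:
        w.start()

    # Main thread: run stage 1 on batch i while workers post-process earlier batches.
    for index, batch in enumerate(batches):
        handoff.put((index, forward(batch)))

    for _ in workers:
        handoff.put(_STOP)  # one sentinel per worker
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    out = run_pipeline(range(5), forward=lambda b: b * 10, postprocess=lambda f: f + 1)
    print(out)  # [1, 11, 21, 31, 41]
```

    In CPython, threads only yield a real speedup when the stages release the global interpreter lock, as GPU inference and most array libraries typically do; pure-Python stages would run one at a time.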

    2.4. Model training and its operating environment

    The model is trained on simulated data labeled with 3D coordinates. Every 500 training iterations, the model is evaluated on a pre-generated test dataset to check training progress. To improve the generalization ability of the compressed model, it was trained for 30,000 iterations before testing on experimental data and for 60,000 iterations before testing on simulated data. Each iteration used a different training dataset randomly generated by the imaging model. Training was carried out on a desktop computer with an Intel Core i7-11700K CPU and an NVIDIA GeForce RTX 3060 GPU (12 GB).

    During training, we tested the generalization performance of the model every 500 iterations to determine whether it met the requirements, and continued training until the loss function was close to 0. The performance test used 30 images of 2048×2048 pixels that were not used during training. By varying the hyperparameters over many runs, we obtained the best performance the model could achieve, and the configuration that produced it was taken as the final setting. The hyperparameters of FD-DeepLoc mainly include the learning rate, learning rate decay, gradient norm clipping, weight decay, number of filters and batch size. We finally determined that the model gives the best predictions with a learning rate of 0.0009 and a batch size of 12.
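
    The search over settings described above amounts to training once per candidate configuration and keeping the one with the best test score. A minimal sketch follows; `train_and_score` is a hypothetical callable standing in for a full training run followed by evaluation on the held-out test images, and the candidate values in the usage example are arbitrary toy numbers, not the grid explored in the paper.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def grid_search(train_and_score: Callable[[float, int], float],
                learning_rates: Sequence[float],
                batch_sizes: Sequence[int]) -> Tuple[Dict[str, float], float]:
    """Return the (learning rate, batch size) pair with the lowest score, e.g. test RMSE."""
    best_cfg, best_score = None, float("inf")
    for lr, bs in product(learning_rates, batch_sizes):
        score = train_and_score(lr, bs)
        if score < best_score:
            best_cfg, best_score = {"learning_rate": lr, "batch_size": bs}, score
    return best_cfg, best_score


if __name__ == "__main__":
    # Toy stand-in for a training run: a made-up score, NOT measured data.
    def toy_score(lr: float, bs: int) -> float:
        return (lr - 0.2) ** 2 + (bs - 2) ** 2

    print(grid_search(toy_score, [0.1, 0.2, 0.3], [1, 2, 3]))
```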

    2.5. Evaluation index of algorithm performance

    The performance of a single molecule localization algorithm is mainly evaluated with two indexes: the Jaccard index and the localization error. The Jaccard index (JI) measures how reliably an algorithm detects the emitters that are actually present and is calculated by Eq. (1):

    $$\mathrm{JI} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}. \tag{1}$$
    In the above, TP is the number of emitters detected by the algorithm that match a ground-truth emitter, FP is the number of emitters detected by the algorithm that do not match any ground-truth emitter, and FN is the number of ground-truth emitters that the algorithm fails to detect. The matching threshold is 250 nm for the lateral position and 500 nm for the axial position.

    The localization error is quantified by the root mean square error (RMSE) between the molecule positions detected by the algorithm and the ground-truth coordinates, calculated by Eq. (2):

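    $$\mathrm{RMSE} = \sqrt{\frac{1}{\mathrm{TP}} \sum_{i=1}^{\mathrm{TP}} \left[ (x_i - x_i^{\mathrm{GT}})^2 + (y_i - y_i^{\mathrm{GT}})^2 + (z_i - z_i^{\mathrm{GT}})^2 \right]}. \tag{2}$$
    In the above, $x_i$, $y_i$ and $z_i$ represent the coordinates of the $i$-th molecule detected by the algorithm, and $x_i^{\mathrm{GT}}$, $y_i^{\mathrm{GT}}$ and $z_i^{\mathrm{GT}}$ represent the matched ground-truth coordinates.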
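
    Eqs. (1) and (2) can be evaluated together once detections have been matched to ground-truth emitters. The sketch below is a hedged illustration: the text does not specify the matching procedure, so a simple greedy nearest-neighbor matching is assumed, the lateral distance is taken as the Euclidean distance in the xy plane, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in nm


def match_emitters(pred: Sequence[Point], truth: Sequence[Point],
                   lateral_tol: float = 250.0, axial_tol: float = 500.0
                   ) -> List[Tuple[Point, Point]]:
    """Greedy one-to-one matching of detections to ground truth within the tolerances."""
    unmatched = list(truth)
    pairs = []
    for p in pred:
        best, best_d = None, float("inf")
        for t in unmatched:
            lateral = math.hypot(p[0] - t[0], p[1] - t[1])
            axial = abs(p[2] - t[2])
            if lateral <= lateral_tol and axial <= axial_tol:
                d = math.dist(p, t)
                if d < best_d:
                    best, best_d = t, d
        if best is not None:
            unmatched.remove(best)  # each ground-truth emitter is matched at most once
            pairs.append((p, best))
    return pairs


def jaccard_and_rmse(pred: Sequence[Point], truth: Sequence[Point]) -> Tuple[float, float]:
    """Return (JI, RMSE) following Eqs. (1) and (2)."""
    pairs = match_emitters(pred, truth)
    tp = len(pairs)
    fp = len(pred) - tp   # detections without a ground-truth partner
    fn = len(truth) - tp  # ground-truth emitters that were missed
    total = tp + fp + fn
    ji = tp / total if total else 0.0
    rmse = math.sqrt(sum(math.dist(p, t) ** 2 for p, t in pairs) / tp) if tp else float("nan")
    return ji, rmse


if __name__ == "__main__":
    truth = [(0.0, 0.0, 0.0), (1000.0, 1000.0, 0.0), (3000.0, 0.0, 0.0)]
    pred = [(30.0, 40.0, 0.0), (1000.0, 1000.0, 120.0), (5000.0, 5000.0, 0.0)]
    print(jaccard_and_rmse(pred, truth))  # TP=2, FP=1, FN=1, so JI = 0.5
```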

    3. Results

    3.1. Time-consuming analysis and optimization of FD-DeepLoc

    FD-DeepLoc mainly consists of two parts: the neural network, which outputs feature maps, and the processing that converts the output feature maps into localization data. Feature compression and pipeline programming were used to accelerate these two processes, respectively, and the performance of the accelerated algorithm was then tested. First, the runtime of the different parts of FD-DeepLoc was measured on simulated data under different filter configurations; the results are shown in Fig. 2(a). As the number of filter channels decreases, the time spent in the neural network stage drops markedly, probably because fewer filters shorten the forward pass. The number of localizations output by the algorithm differs little across configurations, which gives preliminary evidence that reducing the network parameters does not impair the detection of fluorescent molecule signals.

    Fig. 2.

    Fig. 2. Runtime test results (a) and localization error test results (b) of FD-DeepLoc under different filter channel number configurations.

    Furthermore, 30 frames of simulated images of 2048×2048 pixels were used to quantify the performance of the model with reduced filter channels; the result is shown in Fig. 2(b). In this set of simulated images, the average photon number per emitter is 5000 and the average background per pixel is 50 photons, corresponding to a medium SNR. The emitter density is 1.5 emitters/μm². The results show that the runtime of the model falls sharply as the number of channels decreases. The localization error indicates that 6 filter channels give the largest runtime reduction without a significant increase in the RMSE. To preserve performance after feature compression, the model with 6 filter channels was further tuned by hyperparameter optimization. After 12 rounds of hyperparameter optimization training, the best localization performance was obtained with a learning rate of 0.0009. In this paper, the algorithm under this setting is called Lite-FD-DeepLoc.

    3.2. The performance of Lite-FD-DeepLoc

    Cramér–Rao lower bound (CRLB) is a key theoretical index to evaluate the localization accuracy of algorithm, which provides a theoretical limit of localization accuracy. In order to explore the adaptability of Lite-FD-DeepLoc to different degrees of aberration, the localization accuracy analysis was carried out in two regions of the 2048×2048 pixel image, namely, the central region with weak aberration and the edge region with strong aberration. The simulated parameters used to generate images for CRLB analysis in this section are consistent with those in Sec. 3.1.
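
    For reference, the bound takes the following standard form for an unbiased estimator. The notation here is not taken from the paper: $\theta$ collects the emitter parameters (for example x, y, z, photon count and background), $\mu_k(\theta)$ is the expected photon count in pixel $k$ under the PSF model, and Poisson-distributed pixel noise is assumed.

```latex
\mathrm{var}(\hat{\theta}_i) \;\ge\; \left[ I(\theta)^{-1} \right]_{ii},
\qquad
I_{ij}(\theta) = \sum_{k} \frac{1}{\mu_k(\theta)}
\frac{\partial \mu_k(\theta)}{\partial \theta_i}
\frac{\partial \mu_k(\theta)}{\partial \theta_j}.
```

    Under these assumptions, the best achievable localization precision along each axis is the square root of the corresponding diagonal element of $I(\theta)^{-1}$, which is the quantity an algorithm's measured precision is compared against.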

    The CRLB results are shown by the solid line in Figs. 3(a) and 3(b). In the simulated image data, the localization accuracy of the three methods varies with the depth of Z-axis, as shown in Figs. 3(a) and 3(b). Specifically, stars, triangles and circles, respectively, represent the calculated values of different algorithms, and different colors represent the calculated values in different dimensions of x, y, z. It can be seen from the figure that the performance of Lite-FD-DeepLoc in x and y directions is quite close to that of FD-DeepLoc in the central region of the FOVs, and all of them are close to the CRLB, that is, close to the theoretical limit of localization accuracy. At the edge of the FOVs, the axial localization accuracy of Lite-FD-DeepLoc is slightly worse than that of FD-DeepLoc, but better than that of DECODE. This test result verifies the effectiveness of Lite-FD-DeepLoc and shows the feasibility of feature compression.

    Fig. 3.

    Fig. 3. Performance of the algorithm for locating simulated data. (a), (b) Calculation results of localization accuracy of edge region and central region of FOV; (c)–(e) 3D localization error test of the algorithm under different densities and SNRs; (f) Jaccard index corresponding to three algorithms under different densities and SNRs. Here, low, medium and high SNRs correspond to an average of 1000, 5000 and 10,000 photons per emitter and 10, 50 and 100 background photons per pixel, respectively.

    Furthermore, to test the accuracy of Lite-FD-DeepLoc, the RMSE and Jaccard index were calculated on simulated images with different SNRs and molecular densities. The Jaccard index and localization error of Lite-FD-DeepLoc, FD-DeepLoc and DECODE are shown in Figs. 3(c)–3(f). According to Fig. 3(e), the volume RMSE of Lite-FD-DeepLoc is better than that of DECODE and slightly worse than that of FD-DeepLoc. For example, for simulated images with a medium SNR and a density of 1.5 emitters/μm², the volume RMSE of Lite-FD-DeepLoc, DECODE and FD-DeepLoc was 41 nm, 60 nm and 35 nm, respectively. In addition, under different SNRs and molecular densities, the Jaccard index of Lite-FD-DeepLoc is close to that of the other two algorithms, as shown in Fig. 3(f), which indicates that the Jaccard index of Lite-FD-DeepLoc remains good.

    After confirming that the performance of Lite-FD-DeepLoc had not decreased significantly, we tested its running speed with pipeline programming. First, the running speed was tested using 500 images of 128×128 pixels. The total running time was less than 5 s, that is, the average time to process a single frame was less than 10 ms, which is shorter than the camera exposure time (conventionally 10–30 ms in SMLM). Therefore, the test image size was increased to compare the running speed of the three algorithms. In this test, each algorithm processed datasets composed of multiple frames of 256×256 pixels. Each group of simulated tests was repeated five times, and the average was taken as the runtime of the algorithm.

    Before comparing the running speeds of the three algorithms, this paper first tests the reliability of pipeline programming. Because starting parallel threads takes considerable time, the number of pipeline threads in this model is changed according to the amount of data. In our experience, it is better to use two threads when the dataset contains no more than 500 frames. For larger datasets, the number of threads increases in proportion to the number of frames: for 1000, 1500 and 2000 frames, 4, 6 and 8 threads are used, respectively. Figure 4(a) shows the speedup ratio of Lite-FD-DeepLoc with and without pipeline programming compared with FD-DeepLoc. The test results show that Lite-FD-DeepLoc is further accelerated relative to FD-DeepLoc, with the speedup ratio increasing from 1 to 1.25 times. Figure 4(b) shows the runtime of the three algorithms when processing different numbers of frames. Compared with the other two methods, Lite-FD-DeepLoc is clearly faster, taking only 535.76 s to process 50,000 frames. This corresponds to 10.7 ms per image, which almost meets the needs of real-time image processing.
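
    The thread-count rule and the per-frame timing quoted above can be checked with a few lines of arithmetic. In this sketch the function names are hypothetical, and rounding up for frame counts between the listed examples is an assumption; the text only gives the values for 500, 1000, 1500 and 2000 frames, and a real implementation would also cap the result at the maximum number of threads the hardware allows.

```python
import math


def pipeline_threads(n_frames: int, frames_per_step: int = 500,
                     threads_per_step: int = 2) -> int:
    """Thread count matching the examples in the text (500->2, 1000->4, 1500->6, 2000->8)."""
    steps = max(1, math.ceil(n_frames / frames_per_step))
    return threads_per_step * steps


def per_frame_ms(total_seconds: float, n_frames: int) -> float:
    """Average processing time per frame in milliseconds."""
    return 1000.0 * total_seconds / n_frames


if __name__ == "__main__":
    for n in (500, 1000, 1500, 2000):
        print(n, "frames ->", pipeline_threads(n), "threads")
    t = per_frame_ms(535.76, 50_000)  # figures reported for Lite-FD-DeepLoc
    print(f"{t:.2f} ms per frame, about {1000.0 / t:.0f} frames per second")
```

    At about 10.7 ms per frame the algorithm keeps pace with exposure times at the upper end of the conventional 10–30 ms range, while a camera running at exactly 10 ms per frame would still be slightly faster, which is consistent with the wording "almost meets".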

    Fig. 4.

    Fig. 4. (a) Speedup ratio of Lite-FD-DeepLoc with and without pipeline programming, relative to FD-DeepLoc. (b) Runtime of three algorithms when processing different numbers of frames.

    3.3. Astigmatism PSF engineering experimental data processing

    The experimental data of astigmatism PSF engineering were used to further test the localization performance of Lite-FD-DeepLoc. Specifically, the correlation information between the system PSF and the axial depth is extracted from the dataset of fluorescent beads to label different PSF signals with axial position. After that, the model was trained, and the number of training iterations was 30,000. In the model inference performance test, 100826 frames of raw image data with the size of 220×256 pixels were used, in which the Z-axis depth range of signal is ±700nm.

    The images reconstructed by Lite-FD-DeepLoc, FD-DeepLoc and DECODE are shown in Fig. 5. Their global FRC resolutions are 37.9 nm, 42.5 nm and 48.5 nm, respectively, which shows that the resolution obtained by Lite-FD-DeepLoc is no worse than that of the other two algorithms. In addition to the global resolution, the ring diameter (top view) and lateral structure size (side view) of the nuclear pore complex were measured on single nuclear pore structures cropped from the local image. The typical ring diameter and interlayer distance of the nuclear pore complex reconstructed by Lite-FD-DeepLoc are 108.8 nm and 54.0 nm, respectively, close to the corresponding values of 107.4 nm and 49.8 nm reported for Nup96 elsewhere.25 These results show that the performance of Lite-FD-DeepLoc is not inferior to FD-DeepLoc and DECODE in processing experimental data.

    Fig. 5.

    Fig. 5. Performance of Lite-FD-DeepLoc on experimental astigmatic 3D data of NPCs. (a) Overview of the 3D SRI of Nup96. b (1–3) are the zoomed images of the region indicated by the white solid box in (a) reconstructed by Lite-FD-DeepLoc, FD-DeepLoc and DECODE separately. b(1-i), b(1-ii), b(2-i), b(2-ii), b(3-i), b(3-ii) are the zoomed images of the rectangle region (i) and (ii) indicated in b(1), b(2) and b(3), respectively. c(1–3) and d (1–3) are the zoomed images of the region indicated by the white dot box in (a) reconstructed by Lite-FD-DeepLoc, FD-DeepLoc and DECODE separately. Scale bars, 5μm (a), 1μm (c1)–(c3), 1μm (d1)–(d3), 500nm (b1)–(b3) and 50nm (b(1-i), b(1-ii), b(2-i), b(2-ii), b(3-i), b(3-ii)).

    3.4. Saddle point PSF engineering experimental data processing

    This paper also uses saddle point PSF engineering experimental data to further test the localization performance of Lite-FD-DeepLoc. The training process of the model is the same as that in the data processing of the astigmatism experiment. In the model inference performance test, this paper uses 19631 frames of raw image data with a size of 300×332 pixels, in which the Z-axis depth range of signal is ±700nm.

    The images reconstructed by Lite-FD-DeepLoc, FD-DeepLoc and DECODE are shown in Fig. 6. Their global FRC resolutions are 33.0 nm, 34.2 nm and 38.7 nm, respectively. The typical ring diameter and interlayer distance of the nuclear pore complex reconstructed by Lite-FD-DeepLoc are 106.7 nm and 53.0 nm, respectively. These results again indicate that the performance of Lite-FD-DeepLoc is not inferior to FD-DeepLoc and DECODE in processing saddle point PSF engineering data.

    Fig. 6.

    Fig. 6. Performance of Lite-FD-DeepLoc on experimental Saddle point 3D data of NPCs. (a) 3D SRI of Nup96. b (1–3) are the zoomed images of the region indicated by the white solid box in (a) reconstructed by Lite-FD-DeepLoc, FD-DeepLoc and DECODE separately. b(1-i), b(1-ii), b(2-i), b(2-ii), b(3-i), b(3-ii) are the zoomed images of the rectangle region (i) and (ii) indicated in b(1), b(2) and b(3), respectively. c(1–3) and d (1–3) are the zoomed images of the region indicated by the white dot box in (a) reconstructed by Lite-FD-DeepLoc, FD-DeepLoc and DECODE separately. Scale bars, 5μm (a), 1μm (c1)–(c3), 1μm (d1)–(d3), 500nm (b1)–(b3) and 50nm (b(1-i), b(1-ii), b(2-i), b(2-ii), b(3-i), b(3-ii)).

    4. Discussion

    In this paper, a 3D single molecule localization algorithm named Lite-FD-DeepLoc was developed. Building on FD-DeepLoc, Lite-FD-DeepLoc reduces the runtime by feature compression and pipeline programming. In the simulation experiments, the running speed of Lite-FD-DeepLoc is more than double that of the original algorithm, as shown in Fig. 4. Lite-FD-DeepLoc takes only 10.7 ms to process a single frame of 256×256 pixels, which meets the requirements of real-time data processing. It is worth noting that the computer hardware used in this paper is slightly less powerful than that used in the FD-DeepLoc paper, so data processing could be sped up further by upgrading the hardware. In addition, the runtime was measured on test images with a medium SNR and a signal density of 1.5 emitters/μm², so the running speed of Lite-FD-DeepLoc could be further secured by reducing the signal density and improving the SNR.

    On the other hand, for biological experimental data, the resolution of the images reconstructed by Lite-FD-DeepLoc is not inferior to that of the original algorithm. As shown in Figs. 5 and 6, both the global (FRC) resolution and the resolution indicated by the resolvable structures in the images reconstructed by Lite-FD-DeepLoc are no worse than those of the other two algorithms. Moreover, Lite-FD-DeepLoc can process experimental data from different PSF engineering schemes, such as astigmatism and saddle point. All these experimental results show that Lite-FD-DeepLoc has good generalization performance and application potential.

    5. Conclusion

    To sum up, this paper develops a 3D single molecule localization algorithm based on deep learning: Lite-FD-DeepLoc, which solves the problem of slow running speed in 3D-SMLM and can meet the needs of real-time data processing. The test results of simulated data show that Lite-FD-DeepLoc only takes 10.7ms to process a single frame image of 256×256 pixels, which meets the requirements of real-time data processing. Moreover, in terms of astigmatism and saddle point PSF engineering experimental data processing, the image quality of SRI obtained by Lite-FD-DeepLoc is not inferior to other algorithms. Through many tests of processing simulation data and experimental data, we believe that Lite-FD-DeepLoc not only meets the demand of real-time data processing, but also opens a new direction for optimizing 3D single molecule localization algorithm based on deep learning, which is of great significance to the field of 3D SMLM.

    Acknowledgments

    This work was supported by the Start-up Fund from Hainan University (No. KYQD(ZR)-20077). We thank all the other members of the Digital Theranostics and Optical Microscopy (DigiTOM) group for their technical support.

    Conflicts of Interest

    The authors declare that there are no conflicts of interest relevant to this paper.

    ORCID

    Shuhao Guo  https://orcid.org/0009-0007-5015-8198

    Jiaxun Lin  https://orcid.org/0009-0002-1111-4145

    Yingjun Zhang  https://orcid.org/0000-0001-5703-3610

    Zhen-Li Huang  https://orcid.org/0000-0003-2400-966X