GPU-accelerated OCT imaging: Real-time data processing and artifact suppression for enhanced monitoring of 3D bioprinted tissues and vascular-like networks
Abstract
Optical coherence tomography (OCT) has significant advantages for in situ, noninvasive monitoring of biological tissues. However, it still faces challenges in data processing speed, image quality, and three-dimensional (3D) visualization. OCT, especially functional imaging techniques such as optical coherence tomography angiography (OCTA), requires a long acquisition time and produces large datasets. Although the acquisition speed of swept-source optical coherence tomography (SS-OCT) has increased substantially, data processing remains a significant challenge. Additionally, during in situ acquisition, image artifacts resulting from interface reflections or strong reflections from biological tissues and culture containers hinder data visualization and further analysis. In this study, a customized frequency-domain filter with an anti-stripe suppression parameter was first designed to suppress artifact noise. Then, a graphics processing unit (GPU)-based real-time data processing pipeline for SS-OCT was proposed, achieving a measured line-processing rate of 800kHz for fast, high-quality 3D data visualization. Furthermore, GPU-based real-time data processing for complex cross-correlation OCTA (CC-OCTA) was integrated to acquire dynamic information. Moreover, a vascular-like network chip was prepared using extrusion-based 3D printing and sacrificial materials: the sacrificial material was printed at the desired vascular network locations and then removed to form the vascular-like network. OCTA imaging was used to monitor the progression of sacrificial material removal and vascular-like network formation. In summary, GPU-based OCT enables real-time processing and visualization with artifact suppression, making it particularly suitable for in situ, noninvasive, longitudinal monitoring of 3D bioprinted tissues and vascular-like networks in microfluidic chips.
1. Introduction
Optical coherence tomography (OCT) is a noninvasive, high-resolution, high-sensitivity and three-dimensional (3D) imaging modality, which is widely used for clinical diagnosis and biomedical research,1,2 such as ophthalmic intraoperative navigation,3 cardiovascular imaging,4 biological tissue monitoring,5 and 3D bioprinting.6 Optical coherence tomography angiography (OCTA), as a functional extension of OCT, enables label-free 3D visualization of blood perfusion within the retina and cortex, enhancing our understanding of the pathophysiology and diagnosis of diseases.7,8,9,10 OCT, particularly OCTA, requires a long acquisition time and a large data size. With the development of light sources and acquisition equipment, the imaging speed of Fourier-domain optical coherence tomography (FD-OCT) has been continuously improved, making it the most widely used OCT technique at present. FD-OCT enables collecting massive signals in a short period of time, which brings new challenges to high-speed data processing and image quality improvement, especially for in situ diagnosis and monitoring.
This study focuses on the practical challenges of online monitoring and in situ high-throughput imaging, such as the online monitoring of 3D bioprinting6,11 and in situ high-throughput imaging of biological tissues, including artificial skin,12 multicellular tumor spheroids and organoids in well plates.13,14 Online monitoring of 3D printing requires timely acquisition of 3D data of the printed structure during the printing process. Feedback such as defect detection or printing parameter adjustment is then derived from qualitative or quantitative evaluation of the 3D data. Therefore, high-speed imaging is a prerequisite for online monitoring and timely feedback. In the in situ high-throughput imaging of biological tissues, the total imaging time must be controlled so that the samples do not remain outside suitable culture conditions for too long, which would affect their growth status and activity and prevent long-term culture and observation. In such in situ imaging, image artifact noise is difficult to remove and affects both visualization quality and the accuracy of quantitative analysis. Therefore, high imaging speed and high image quality are essential requirements for using OCT and OCTA in online monitoring of 3D bioprinting and in situ high-throughput imaging of biological tissues, and they also serve as the foundation for real-time feedback and precise quantitative analysis.
For online monitoring and in situ high-throughput imaging, reflections from the interfaces of biological samples or culture plates may produce numerous image artifacts, manifesting as nonperiodic vertical stripes. These artifacts degrade the reliability of the internal information of biological tissues. Traditionally, the artifacts can be mitigated by reducing the optical power in the sample or reference arm. However, this approach is not suitable for weakly scattering biological samples, such as 3D-printed cell-laden hydrogel scaffolds.15 In addition, adjusting the incident light angle or the depth position is not suitable for high-throughput imaging of samples in U-shaped well plates. Moreover, even small angular tilts can result in the loss of optimal imaging regions for the target sample, making it challenging to obtain large-field, high-quality 3D images.16,17 Lastly, methods that change the culture environment or apply pressure with glass slides are undesirable for online monitoring and in situ high-throughput imaging. Given these constraints, this study aims to reduce the impact of artifact noise on the accuracy of visualization and subsequent image analysis when the artifact noise cannot be fully removed.
There are two primary research approaches to suppressing stripe artifacts: traditional mathematical methods and deep learning. Among the traditional mathematical methods, frequency-domain bandpass filters18,19 attenuate the frequencies perpendicular to the stripes while preserving coefficients near the origin. However, one limitation of these approaches is that image details are prone to being removed alongside the stripes, causing a loss of underlying image information. Low-rank constraint models20,21 exploit the low rank of stripe matrices to remove artifacts by constraining the rank of the image matrix. Nevertheless, the artifacts in OCT images may also contain high-rank components. Among the deep learning approaches, Guan et al.22 used wavelet neural networks to suppress stripe noise in infrared imaging systems, assuming the availability of corresponding noise-free image datasets. However, manually added stripe noise may differ from real OCT artifact noise. Gao et al.23 introduced a multiple-output layer in a neural network to simultaneously estimate clean and noisy images, eliminating the need for noise-free ground truth images. Hu et al.24 built experimental and simulated datasets based on the generation mechanism of saturation artifacts, and trained enhanced super-resolution generative adversarial networks on clear-saturated phantom image pairs. Ji et al.25 proposed an image inpainting model based on a generative model to recover degraded regions in OCT images, especially wide saturation artifacts. Li et al.26 proposed an inpainting method that introduces structural similarity and frequency losses to remove stripe-shaped saturation artifacts for cornea segmentation and subsequent clinical analysis.
Recently, deep learning-based methods have achieved better artifact elimination than simple Gaussian bandpass filtering, especially when corresponding datasets are available. However, these methods suffer from low efficiency and high graphics processing unit (GPU) resource utilization, making it challenging to achieve rapid denoising and visualization for online monitoring and in situ high-throughput imaging.27 Considering the differences in artifact noise between background and signal regions, this paper proposes a customized frequency-domain filter to attenuate nonstandard stripe artifacts, ensuring rapid and effective removal of artifact noise.
In the past two decades, there has been remarkable progress in the data acquisition speed of swept-source optical coherence tomography (SS-OCT), from the initial 20–50kHz A-scan rates to the current MHz level,28,29,30,31 bringing new challenges for high-speed data processing and visualization. Liu et al.32 achieved a real-time processing speed of 20k A-line/s using multi-core central processing units (CPUs) for raw spectral data, but it was only applicable to small OCT datasets with a frame size of 512×400. Ustun et al.33 implemented a real-time B-frame display rate of 27 frames per second (fps) with a frame size of 1024×512 using a field-programmable gate array (FPGA), but this approach involves high hardware costs and substantial programming complexity. With their more efficient parallel computing capabilities, GPUs have been employed to further enhance OCT data processing, transmission and visualization efficiency. Zhang et al.34 proposed a GPU-based ultra-fast linear interpolation algorithm for SD-OCT, achieving a volumetric refresh rate of 10Hz with a volume size of 512×125×100. Rasakanthan et al.35 improved data transfer efficiency using GPU page-locked memory and achieved a processing and display speed of 524k A-line/s with a line size of 1024. Lyu et al.36 reduced data transfer time with a GPU and dual CUDA streams, achieving a real-time B-frame processing rate of 200fps with a frame size of 1024×1024. Kolb et al.,37 using dual GPUs, achieved a volumetric refresh rate of approximately 20Hz with a volume size of 330×330×595. To address the requirements of denoising and high-speed data processing, we develop an integrated GPU-accelerated algorithmic workflow for 3D OCT data processing, denoising and visualization, capable of matching the speed of an 800k A-line/s OCT system. Deng et al.38 proposed a GPU-based real-time data processing pipeline with a measured line-processing rate of 133kHz for displaying OCT and OCTA cross-sections in real time.
Expanding on this foundation, we devise a GPU-accelerated OCTA algorithm to enhance the extraction speed of 3D microvascular information.
This study addresses the practical demands of online monitoring and in situ high-throughput imaging by proposing algorithms for artifact suppression and high-speed data processing. Firstly, we design an artifact filter that introduces an anti-stripe suppression factor into the frequency-domain filter to preserve valid signals below the noise level in the signal region. Secondly, we develop high-speed reconstruction algorithms for OCT and OCTA signals based on GPU-CUDA multi-threading technology. Thirdly, we accelerate volume rendering through improved ray casting using GPU-OpenGL. By combining these algorithms, we develop an integrated GPU-based algorithmic workflow that encompasses OCT and OCTA signal acquisition, processing, denoising, and visualization. Finally, we validate the effectiveness of this workflow with 3D printed biological tissues and vascular-like networks.
2. Materials and Methods
2.1. Experimental equipment
In this study, a custom-built SS-OCT system11,13 integrated with a 3D bioprinter (Regenovo Bio-Architect PX, Hangzhou Regenovo Biotechnology Co., LTD.) was used. The swept source had a central wavelength of 1310nm, a bandwidth of 105nm, and a scanning frequency of 50kHz. The power of the swept source on the sample can be adjusted from 0.5mW to 8mW. The axial resolution measured in air was 7.2μm, and the lateral resolution was 15.6μm. For a single 3D scan, the maximum scanning range is 10mm (x)×10mm (y)×6mm (z). The OCT probe was mounted on a three-axis precision displacement module, facilitating precise positioning for online monitoring or in situ high-throughput rapid imaging. For 3D OCT imaging, each A-line consists of 1280 sampling points, each B-frame comprises 1024 A-lines, and each 3D OCT volume dataset contains 1024 B-frames.
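As a rough sizing sketch for the scan geometry above, the following Python snippet computes the number of A-lines and samples in one volume and the time needed to acquire it at the stated 50kHz sweep rate, assuming one A-line per sweep. The 2-byte sample width is an assumption made only for illustration, since the digitizer bit depth is not stated here.

```python
# Volume geometry and sweep rate as stated in the text.
SAMPLES_PER_ALINE = 1280
ALINES_PER_BFRAME = 1024
BFRAMES_PER_VOLUME = 1024
SWEEP_RATE_HZ = 50_000

# Assumption for illustration only: 16-bit (2-byte) raw samples.
BYTES_PER_SAMPLE = 2


def volume_stats():
    """Return basic size and timing figures for one 3D OCT volume."""
    alines = ALINES_PER_BFRAME * BFRAMES_PER_VOLUME
    samples = alines * SAMPLES_PER_ALINE
    return {
        "alines": alines,
        "samples": samples,
        "raw_gib": samples * BYTES_PER_SAMPLE / 2**30,
        "acquisition_s": alines / SWEEP_RATE_HZ,  # one A-line per sweep
    }


stats = volume_stats()
print(stats)
```

Under these assumptions a single volume takes roughly 21s to acquire, so any processing pipeline that sustains more than 50k A-line/s keeps pace with this particular source.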
The custom-designed microfluidic chip consists of two chambers with size of 5mm×5mm×1mm. Both the top and bottom covers of the chip are made of transparent polymethyl methacrylate (PMMA), enabling in situ monitoring by OCT after sealing. The middle layer, which includes the chambers and channel layers, is made of polydimethylsiloxane (PDMS). Four screws are used for the encapsulation of the chip for reuse. The multi-material bioprinting was employed to prepare tissue containing a 3D vascular-like network using the aforementioned extrusion-based 3D printer.
Firstly, 5% gelatin was chosen as the fugitive ink to print the vascular-like network, serving as the sacrificial material to be removed later (see Fig. 1(a)). Secondly, 5% gelatin and 1% sodium alginate were selected as the tissue ink to print the tissue bed layer by layer (see Fig. 1(b)). The tissue ink was then cross-linked with 5% CaCl2 before the microfluidic chip was packaged (see Fig. 1(c)). Finally, the fugitive ink was removed from the thick tissue by heating to ≈37°C, at which it undergoes a gel-to-fluid transition. During removal (see Fig. 1(d)), real-time OCTA imaging was performed to monitor the formation of the 3D vascular-like network. For 3D OCTA imaging, each B-frame comprises 512 A-lines, and five repeated B-frames were acquired at the same location in the y-direction. Each 3D OCT volume dataset contains 5×512 B-frames.

Fig. 1. Schematic representation of the microfluidic chip and the vascular-like network. (a) 3D bioprinting of the vascular-like network using the fugitive ink; (b) 3D bioprinting of the tissue bed using tissue ink; (c) encapsulating the microfluidic chip using top and bottom covers; and (d) removal of fugitive ink.
2.2. Nonstandard stripe artifacts filter
In OCT images, speckle noise39,40 and artifact noise20,23 can degrade the image quality. Speckle noise is a standard type of Gaussian-like noise and can be reduced using methods such as Gaussian filters, mean filters, and deep learning methods.39,41,42 In contrast, artifact noise caused by strong surface reflections from the sample differs from the standard stripe noise in camera images.43 For standard stripe noise, the noise intensity remains relatively consistent along the entire length of the stripe. For OCT images, the artifact noise is inconsistent along the direction of the stripe; in particular, the background region (BGR) and the region of interest (ROI) differ markedly. In the BGR, the artifact noise resembles standard stripe noise, except that its variance is nonzero. In the ROI, useful signals from the target sample are concealed under the artifact. Because of these differences from standard stripe noise, the artifact noise is referred to in this study as nonstandard stripe noise.
Figure 2(a) illustrates the difficulty of removing nonstandard stripe artifacts from OCT images using two representative methods, one based on conventional mathematics and one on deep learning. In Fig. 2(a1), standard stripe noise is added artificially,22 and Figs. 2(a2) and 2(a3) show the results of Gaussian bandpass filtering and a neural network, respectively. Both approaches perform well on standard stripe noise. In Fig. 2(a4), a real OCT image with nonstandard stripe artifacts is shown, and the denoising results with the same methods are displayed in Figs. 2(a5) and 2(a6). These results demonstrate that the stripe artifacts are effectively suppressed only in the BGR. In the ROI, however, the bright stripes are turned into dark stripes, resulting in the loss of valuable signal information from the target sample.

Fig. 2. Nonstandard stripe artifact attenuation flow chart. (a) Problems of current denoising methods; (b) 3D diagram of the filter proposed in this paper; (c) filtering results with different values of the filtering parameter θ, then select BGR and ROI for SNS calculation; (d) flowchart for quantization of filtering parameters. SNS: Signal-to-Noise Strength; θ: stripe suppression ratio (%); (e) Filtering result corresponding to the best point.
The vertical stripes manifest as a bright horizontal stripe in the frequency domain, and conventional methods indiscriminately suppress the fundamental frequency information, leading to dark stripes in the ROI. To address this issue, a frequency-domain bandpass filter with a counter-suppression parameter was introduced; a 3D view of the filter is shown in Fig. 2(b), and the filter is defined by the following equation:
The filter optimization process is illustrated in Figs. 2(c)–2(e). In Fig. 2(c), a series of filtered images are displayed with different suppression parameters θ applied (for the corresponding filters, see Fig. S1). Regions in the blue box and the red box were selected as the BGR and ROI, respectively. The SNS values of the BGR and ROI are then computed and plotted in Fig. 2(d) as the blue curve (SNSBGR) and the red curve (SNSROI), respectively. In Fig. 2(d), the horizontal axis represents the suppression parameter θ, while the vertical axis represents the SNS values. In 3D rendering, artifacts in the background hinder visualization of the target, while artifacts in the target obscure its details. Different evaluation criteria are therefore applied to the background and the target area. First, for the background, both the signal mean and variance should be relatively small. Based on the statistics of the system noise, this paper requires the SNS in the background to be no higher than SNSBGR_ref; when the noise intensity in the BGR falls below this reference value, the filtering effect in the BGR is considered acceptable. Second, for the target, the artifact should be removed while preserving as much of the underlying target signal as possible. To avoid over-processing the artifacts into dark stripes, the SNS level before filtering is calculated (SNSROI_ref), and the acceptable range in the target is between 99%SNSROI_ref and 101%SNSROI_ref.
Overall, the stripe suppression parameter is traversed sequentially (from 0 to 100%). When the condition (SNSBGR≤SNSBGR_ref and 99%SNSROI_ref≤SNSROI≤101%SNSROI_ref) is met, the traversal stops and the corresponding filtering result is used as the output (Fig. 2(e)).
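The traversal described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work: `apply_filter` and `sns` are hypothetical callables standing in for the frequency-domain filter and the SNS metric, and the toy image, toy filter and reference value exist only to make the example run.

```python
def select_theta(image, apply_filter, sns, bgr, roi, sns_bgr_ref, tol=0.01, step=1):
    """Sweep theta over 0..100 % and return (theta, filtered) for the first
    value that satisfies both the BGR and ROI criteria, or None if none does."""
    sns_roi_ref = sns(image, roi)  # ROI level before filtering
    for theta in range(0, 101, step):
        filtered = apply_filter(image, theta)
        bgr_ok = sns(filtered, bgr) <= sns_bgr_ref
        roi_val = sns(filtered, roi)
        roi_ok = (1 - tol) * sns_roi_ref <= roi_val <= (1 + tol) * sns_roi_ref
        if bgr_ok and roi_ok:
            return theta, filtered
    return None


# Toy stand-ins (hypothetical): the "image" stores one SNS-like number per region.
def toy_sns(img, region):
    return img[region]


def toy_filter(img, theta):
    # Background drops quickly with theta, the ROI only slightly.
    return {"bgr": img["bgr"] * (1 - theta / 100),
            "roi": img["roi"] * (1 - 0.0002 * theta)}


result = select_theta({"bgr": 10.0, "roi": 50.0}, toy_filter, toy_sns,
                      bgr="bgr", roi="roi", sns_bgr_ref=8.05)
print(result)
```

Returning the first admissible θ mirrors the stop-on-first-match rule in the text; a variant that scans the whole range and picks the admissible θ with the lowest SNSBGR would be an equally simple change.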
2.3. Software frameworks
The developed GPU SS-OCT software framework is depicted in the flowchart in Fig. 3(a). Three CPU threads control the entire algorithmic flow: the data acquisition thread (Thread 1), the data processing thread (Thread 2), and the volume rendering thread (Thread 3). Thread 1 is primarily responsible for acquiring raw spectral data from the data acquisition card (ADC). The raw spectral data are loaded into the host memory in segments, each assigned a unique ID. Once Thread 2 detects that spectral data are available in the host memory, it immediately transfers them to the GPU memory for processing. After processing, the OCT images are written into pre-allocated 3D textures. Finally, Thread 3 uses OpenGL to perform 3D volume rendering of the processed data.
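The three-thread hand-off can be illustrated with a small producer-consumer sketch in Python. It only mimics the control flow: the queues stand in for the segmented host memory and the pre-allocated 3D textures, and the "processing" step is a placeholder sum rather than any GPU computation. All names are hypothetical.

```python
import queue
import threading


def run_pipeline(n_segments):
    """Run acquisition, processing and rendering as three cooperating threads."""
    raw_q = queue.Queue(maxsize=2)   # host-memory buffer between Thread 1 and Thread 2
    vol_q = queue.Queue(maxsize=2)   # stands in for the pre-allocated 3D textures
    rendered = []
    stop = object()                  # sentinel marking the end of the data

    def acquire():                   # Thread 1: data acquisition
        for seg_id in range(n_segments):
            raw_q.put((seg_id, [seg_id] * 4))  # placeholder spectral segment with its ID
        raw_q.put(stop)

    def process():                   # Thread 2: data processing
        while True:
            item = raw_q.get()
            if item is stop:
                vol_q.put(stop)
                break
            seg_id, spectrum = item
            vol_q.put((seg_id, sum(spectrum)))  # placeholder for GPU processing

    def render():                    # Thread 3: volume rendering
        while True:
            item = vol_q.get()
            if item is stop:
                break
            rendered.append(item)

    threads = [threading.Thread(target=f) for f in (acquire, process, render)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rendered


print(run_pipeline(3))
```

Bounded queues give natural back-pressure: if processing falls behind, acquisition blocks instead of filling host memory without limit.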

Fig. 3. Software framework of the SS-OCT system. (a) Overall software framework diagram; (b) GPU algorithm processing flowchart; (c) artifact denoising module and OCTA module.
Figure 3(b) provides a detailed illustration of the GPU-accelerated algorithm processing workflow. Module A uses the denoising module in Fig. 3(c), which is the CUDA implementation of the artifact noise attenuation with the customized filter proposed in Sec. 2.2. It is encapsulated as a flag and a CUDA kernel function to attenuate nonstandard stripe artifacts caused by strong reflections from samples. Module B combines the OCTA module and the denoising module in Fig. 3(c). The OCTA module is a dynamic blood flow extraction algorithm based on the complex cross-correlation (CC) algorithm proposed by Guo et al.44 It extracts dynamic regions in OCT by computing the difference between the current CC value and the CC value of the background environment. The module relies primarily on four kernel functions: CC preprocessing, convolution, CC post-processing, and threshold segmentation. Finally, it uses the connected denoising module to remove artifact noise.
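To make the idea of a complex cross-correlation concrete, the sketch below computes a normalized CC magnitude between two complex-valued windows taken from repeated B-frames and flags a window as dynamic when its CC falls below a reference value by more than a threshold. This is a generic textbook-style formulation written for illustration; it is not claimed to match the CC-OCTA algorithm of Guo et al. or the four kernels listed above, and `cc_ref` and `threshold` are hypothetical parameters.

```python
import cmath
import math


def complex_cc(a, b):
    """Magnitude of the normalized complex cross-correlation of two
    equal-length windows of complex OCT samples (1 = fully correlated)."""
    num = sum(x * y.conjugate() for x, y in zip(a, b))
    den = math.sqrt(sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b))
    return abs(num) / den if den > 0 else 0.0


def is_dynamic(a, b, cc_ref, threshold):
    """Flag the window as dynamic when its CC lies more than `threshold`
    below the reference CC (both parameters are illustrative assumptions)."""
    return (cc_ref - complex_cc(a, b)) > threshold


# Toy windows: a static region only picks up a global phase between repeats,
# while a dynamic region decorrelates sample by sample.
window = [1 + 0j, 1j, -1 + 0j, -1j]
static_repeat = [x * cmath.exp(0.3j) for x in window]
dynamic_repeat = [window[0], -window[1], window[2], -window[3]]

print(complex_cc(window, static_repeat), complex_cc(window, dynamic_repeat))
```

Taking the magnitude of the complex sum makes the measure insensitive to a bulk phase shift between repeated frames, which is why the static toy window still scores as fully correlated.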
In Thread 2, the GPU-based processing workflow is executed as two CUDA streams for the real-time reconstruction algorithm, as illustrated in Fig. 3(b). Each CUDA stream executes its kernel functions sequentially in a queue, including DC term removal, spectral shaping, phase compensation, Fourier transform, optical path compensation (selected for large-view imaging) and texture mapping. When Module A is enabled, the corresponding program block is inserted into the CUDA stream queue, and the system executes the 3D-OCT algorithm for high-speed, real-time 3D reconstruction of OCT signals. When Module B is enabled, the optimized complex cross-correlation difference OCTA (CC-OCTA) algorithm is executed for high-speed, real-time 3D reconstruction of OCTA. The final texture mapping in the CUDA streams serves as a shared resource between CUDA and OpenGL, so that CUDA processing results can be used directly by OpenGL, saving the time of texture data transfer. Dual CUDA streams enable data transfer and kernel function execution to proceed in parallel, thus reducing data transfer time. For small-volume data, the kernels of the two CUDA streams can also be executed in parallel. However, for large-volume data, the limited GPU memory can lead to an increased number of data blocks, which may slow down the overall processing speed. Therefore, only the data transfer and kernel function execution are overlapped to shorten the data transfer time. To maximize the data processing speed, the cache allocation scheme is chosen based on the available GPU memory so as to minimize the number of data blocks, improve cache reuse and GPU resource utilization, and thereby improve the final computing efficiency.
In Thread 3, the GPU-based real-time volume rendering workflow uses both vertex and fragment shaders. The vertex shader is responsible for vertex calculations and projection transformations of the model. The fragment shader calculates the color of each pixel in the model and implements the volume rendering algorithm. Ray casting is the most widely used volume rendering algorithm; it provides high image quality but involves significant computational demands.45,46,47 To improve rendering speed, two acceleration techniques, collision detection and skip sampling, are incorporated in the Rendering Pipeline in Fig. 3(b), based on the characteristics of the background area in OCT data.
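The benefit of overlapping transfers with kernel execution can be estimated with a simple two-stage pipeline model. The sketch below is a back-of-the-envelope calculation in Python, not a CUDA program; it assumes every block has the same copy time and the same kernel time, and the numbers in the example are illustrative rather than measured.

```python
def serial_time(n_blocks, t_copy, t_kernel):
    """Total time when each block is copied and then processed, one after another."""
    return n_blocks * (t_copy + t_kernel)


def overlapped_time(n_blocks, t_copy, t_kernel):
    """Total time for a two-stage pipeline in which the copy of block i+1
    runs while the kernels of block i execute."""
    if n_blocks == 0:
        return 0
    return t_copy + (n_blocks - 1) * max(t_copy, t_kernel) + t_kernel


# Illustrative numbers only (milliseconds): 8 blocks, 40 ms copy, 100 ms kernels.
print(serial_time(8, 40, 100), overlapped_time(8, 40, 100))
```

In this model only the first copy is exposed; every later copy hides behind the kernels of the previous block as long as the kernel time is the longer stage.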
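A one-ray sketch of these two accelerations is given below in Python rather than shader code. It rests on assumptions: "collision detection" is interpreted here as finding the first sample along the ray that rises above the background level, and "skip sampling" as not shading background-level samples; the threshold, the opacity mapping and the function names are all hypothetical.

```python
def first_hit(samples, background):
    """Index of the first sample above the background level, or None."""
    for i, s in enumerate(samples):
        if s > background:
            return i
    return None


def cast_ray(samples, background=0.05, opacity_scale=0.5):
    """Front-to-back compositing of scalar samples in [0, 1] along one ray.
    Returns (accumulated_intensity, number_of_samples_shaded)."""
    start = first_hit(samples, background)
    if start is None:                # the ray never meets the sample
        return 0.0, 0
    color, alpha, shaded = 0.0, 0.0, 0
    for s in samples[start:]:
        if s <= background:          # skip background samples inside the volume
            continue
        a = min(1.0, s * opacity_scale)
        color += (1.0 - alpha) * a * s
        alpha += (1.0 - alpha) * a
        shaded += 1
    return color, shaded


ray = [0.0] * 6 + [0.8, 0.6] + [0.0] * 2 + [0.4]
print(first_hit(ray, 0.05), cast_ray(ray))
```

Because OCT volumes are dominated by near-zero background, most samples on a typical ray are never shaded, which is where the speed-up comes from in this simplified picture.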
3. Results
3.1. Denoising of OCT stripe artifacts
In Sec. 2.2, a frequency-domain filter was designed to attenuate nonstandard stripe artifacts, which can be seamlessly integrated into the GPU-based processing pipeline. Representative OCT images acquired during in situ high-throughput imaging of 3D bioprinted biological tissues were selected to assess the performance of the proposed filter in attenuating artifacts, as shown in Figs. 4(a1)–4(d1). For method comparison, standard stripe noise was artificially added to an OCT image, as shown in Fig. 4(e1). For each image, the SNS values of both the BGR and ROI are provided as insets. Figures 4(a2)–4(e2) are the filtering results of the original OCT images using the classic Gaussian bandpass filter (GPB).44,48 The deep-unfolding for iterative noise removal (DINR) algorithm49 is the best-performing known model that employs neural networks for stripe noise removal, and its filtering results are shown in Figs. 4(a3)–4(e3). Figures 4(a4)–4(e4) show the denoising results of the filter designed in this study (OURS).

Fig. 4. Nonstandard stripe artifact attenuation results of 3D bioprinted scaffolds and biological tissues. (a1)–(e1): OCT images with stripe artifacts for different samples; (a2)–(e2): results of GPB; (a3)–(e3): results of DINR neural network; (a4)–(e4): results of the nonstandard stripe filter proposed in this paper (OURS). SNSROI and SNSBGR indicate Signal-to-Noise Strength (SNS) values in the ROI and the BGR, respectively.
Comparing the GPB, DINR and OURS methods in Fig. 4, the stripe artifacts in the BGR were suppressed effectively by all three, and the SNS values met the noise-intensity requirement. When the artifact noise in the ROI is weak (Fig. 4(b)) or when the stripe noise is standard (Fig. 4(e)), all three methods meet the requirements of artifact attenuation. However, when the signal region contains artifact noise of relatively high intensity, as seen in Figs. 4(a), 4(c), and 4(d), the GPB and DINR methods leave dark stripes, and SNSROI differs significantly from its value before filtering. In contrast, with our proposed filtering method the stripe artifacts were effectively attenuated: SNSBGR decreased, and SNSROI fluctuated within ±1% of its initial value. Our method attenuates stripe artifacts while effectively preserving the target signal under the artifacts in the ROI, outperforming both the GPB and DINR methods.
3.2. Data processing speed and visualization
The customized GPU parallel processing program was developed using CUDA version 11.6 on a 64-bit Windows 10 operating system, equipped with an NVIDIA GeForce RTX 2060 GPU, an Intel(R) W-2235 CPU @3.8GHz and 32.0GB RAM. To demonstrate the real-time performance of the GPU-accelerated algorithm proposed in Sec. 2.3, NVIDIA's official Nsight Systems software was used for program analysis. As shown in Fig. 5(a), a 3D spectral dataset with a size of 1280×1024×1024 was processed. The CUDA data transfer stream and processing stream correspond to Stream14 and Stream13, and the OpenGL volume rendering pipeline corresponds to OPENGL HW. These three processing streams run in parallel as designed. In Stream14, the data are divided into 8 blocks and transferred to the GPU for processing sequentially, corresponding to 8 data transfers. The overall execution time was 1.242s (corresponding to 824k A-line/s), as labeled by the red bidirectional arrow in Fig. 5(a).

Fig. 5. The RTX2060 timeline view. (a) Overall volume data processing flowchart; (b) processing flowchart for each module.
In the first block, labeled by the left red bidirectional arrow in Fig. 5(b), the raw spectral data are transferred over the PCIe bus of the graphics card, which takes approximately 46ms. For all blocks except the first, the transfer overlaps with the execution of kernel functions, as labeled by the second red bidirectional arrow in Fig. 5(b); this takes approximately 136ms. In OPENGL HW, the OpenGL rendering pipeline is executed after the completion of each processing stream. As indicated by the purple arrow in Fig. 5(a), the rendering pipeline is triggered during the execution of the second data stream.
The processing rates of GPU-accelerated 3D OCT systems are presented in Table 1 (3D-OCT mode), including our results and those of other studies.34,47,50,51 The floating-point performance of the GPUs is ranked as follows: RTX 4090-24G>RTX Titan X-12G>GTX 1080 Ti-11G>RTX 2060-12G>RTX 2060-6G>FX 5800-4G. Except for the RTX 4090, the GPU models are mid-range graphics cards. Our algorithm performs well on mid-range graphics cards and is suitable for OCT systems operating at 750–850k A-line/s. With high-end graphics cards such as the RTX4090, a speed of up to 2 million A-line/s can be achieved without further algorithm optimization, and even higher speeds are anticipated with further optimization. In conclusion, our algorithm achieves higher processing speeds on GPUs with lower computational power, which is of significant practical value.
Source | Depth scan size [Z] | Frame size [X×Y] | Compute device [NVIDIA] | Compute capability [TFLOPS] | Processing rate [A-line/s] |
---|---|---|---|---|---|
Zhang et al.34 | 512 | 250×160 | FX5800-4G | 0.60 | 100K |
Wei et al.50 | 512 | 2048×3 | GTX1080Ti-11G | 10.69 | 128K |
Chen et al.51 | 2048 | 1000×1 | RTX2060-12G | 7.81 | 500K |
Jeong et al.47 | 1024 | 256×100 | Titan X-12G | 6.10 | 540K |
Present paper | 1280 | 1024×1024 | RTX2060-6G | 6.45 | 820K |
Present paper | 1280 | 1024×1024 | RTX2060-12G | 7.18 | 900K |
Present paper | 1280 | 1024×1024 | RTX4090-24G | 82.58 | 2.6M |
When the OCTA module is activated, the OCTA data size is 1280(z)×512(x)×4(repeat number)×1024(y), maintaining consistency with the overall data size of 3D OCT. Table 2 lists the time consumption of the individual modules, including the OCT Algorithm, Rendering Pipeline, Denoising Module and OCTA Module. The results indicate that on the RTX2060-6G, a mid-to-low-range graphics card, the proposed data processing pipeline can process and render 3D-OCT data within 1.242s, with a maximum OCT data-processing rate of 800k A-line/s. Furthermore, when the OCTA module is engaged, it completes both OCT and OCTA data processing and rendering in 1.665s, attaining a processing rate of 600k A-line/s for OCTA data. The proposed data processing pipeline thus achieves higher speeds than current research with relatively lower computing requirements.
Compute device | OCT algorithm | Rendering pipeline | Denoising module | OCTA module |
---|---|---|---|---|
RTX2060-6G | 1.171s | +0.071s | +0.112s | +0.423s |
RTX3060-6G | 1.085s | +0.062s | +0.092s | +0.378s |
RTX4000-8G | 1.474s | +0.158s | +0.100s | +0.427s |
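The totals quoted in the text can be reproduced from the table with a few lines of Python, under the assumption that the entries marked "+" are increments added on top of the OCT algorithm time. The dictionary below simply copies the tabulated values; the function name is illustrative.

```python
# Per-module times in seconds, copied from the table above.
TIMES = {
    "RTX2060-6G": {"oct": 1.171, "render": 0.071, "denoise": 0.112, "octa": 0.423},
    "RTX3060-6G": {"oct": 1.085, "render": 0.062, "denoise": 0.092, "octa": 0.378},
    "RTX4000-8G": {"oct": 1.474, "render": 0.158, "denoise": 0.100, "octa": 0.427},
}


def total_time(device, denoise=False, octa=False):
    """OCT algorithm plus rendering, with optional module increments (assumed additive)."""
    t = TIMES[device]
    total = t["oct"] + t["render"]
    if denoise:
        total += t["denoise"]
    if octa:
        total += t["octa"]
    return round(total, 3)


for device in TIMES:
    print(device, total_time(device), total_time(device, octa=True))
```

For the RTX2060-6G this additive reading gives 1.242s for OCT with rendering and 1.665s once the OCTA module is included, matching the figures stated above.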
Besides high-speed imaging, high-quality images and 3D volume renderings are also a necessary requirement. The proposed processing pipeline achieves high-quality rendering while maintaining high-speed processing, as shown in Fig. 6.

Fig. 6. 3D volume rendering results of different samples. (a) A multicellular tumor sphere (without denoising module); (b) the same multicellular tumor sphere (with denoising module); (c) an artificial skin (without denoising module); (d) the same artificial skin (with denoising module); (e) a cell-laden hydrogel scaffold sample; (f) the organoids in matrigel; (g) a spiral flow channel structure of OCT; and (h) OCTA result of the spiral flow channel.
In Fig. 6(a), a multicellular tumor spheroid (MTS) was monitored in situ in a U-shaped 96-well plate, and the U-shaped bottom leads to a strong columnar artifact at the edge of the MTS (see the red arrow). With the proposed artifact attenuation method (θ=25%), the columnar artifact was attenuated significantly, as shown in Fig. 6(b). In Fig. 6(c), an artificial skin was monitored in situ in a gas-liquid culture environment, and its flat surface produces many artifacts. After artifact attenuation (θ=5%), almost all artifacts were attenuated effectively, as shown in Fig. 6(d). Further, a cell-laden scaffold in Fig. 6(e) and organoids in matrigel in Fig. 6(f) illustrate the rendering quality for online monitoring of the printing process and in situ high-throughput monitoring during long-term culture. A spiral flow channel phantom was prepared and imaged to illustrate the OCTA algorithm and rendering effects. Figure 6(g) displays the OCT structural result; the channel was a silicone tube with an inner diameter of 300μm and an outer diameter of 500μm. Inside the silicone tube, 5% intralipid was used as the flow phantom, and the inner fluid was extracted with the OCTA algorithm, as shown in Fig. 6(h).
The results above demonstrate the performance of the entire proposed processing pipeline for rapid visualization of large-volume OCT spectral data. The overall performance of the algorithm compares favorably with that of current studies.
3.3. Vascular-like network constructing and monitoring
Microfluidic techniques combine the vascular-like networks of microfluidic chips with micro-pumps for perfusion culture of cells, aiming to improve cell proliferation and development.52,53 The construction quality of the vascular-like network is a crucial indicator for assessing the quality of a microfluidic chip. In this section, using the GPU-accelerated CC-OCTA algorithm proposed in Sec. 2.3, the formation of the vascular-like network was monitored in real time during the removal of the fugitive material.
As shown in Fig. 7, CC-OCTA data were acquired, processed and rendered at three timepoints during the removal of the fugitive material: before removal, after partial removal and after removal. Figures 7(a)–7(c) are the cross-sections (XZ) of the flow channel at the three timepoints, corresponding to the red lines in the en face (XY) images of the flow channel in Figs. 7(d)–7(f). Figures 7(g)–7(i) are 3D renderings of the flow channel. The blue channel indicates the static material (i.e., the fugitive material before removal), and the orange channel indicates the dynamic liquid (i.e., the perfused culture medium with contrast particles). To our knowledge, this is the first time that the formation process of microchannels has been monitored in real time using GPU acceleration. The high-quality OCTA images provide an accurate and effective tool for monitoring, assessing and guiding the construction of vascular-like networks.

Fig. 7. OCTA images of a microfluidic chip flow channel at three timepoints: before removal, after partial removal and after removal of the fugitive material. (a)–(c) Cross-sections (XZ) of the flow channel; (d)–(f) en face (XY) images of the flow channel; (g)–(i) 3D renderings of the flow channel. The blue channel indicates the static material, and the orange channel indicates the dynamic liquid.
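The static/dynamic labeling shown in Fig. 7 can be illustrated with a minimal sketch. It assumes that CC-OCTA scores each pixel neighborhood by the cross-correlation between repeated B-scans acquired at the same position (Sec. 2.3 describes the actual algorithm). The function names, the toy intensity windows and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length intensity windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # flat window: no evidence of motion, treat as fully correlated
    return cov / math.sqrt(var_a * var_b)

def decorrelation(win_t0, win_t1):
    """Flow contrast: near 0 for unchanged speckle, larger when speckle changes."""
    return 1.0 - pearson(win_t0, win_t1)

def label(win_t0, win_t1, threshold=0.5):
    """Hypothetical two-class labeling (blue = static, orange = dynamic in Fig. 7)."""
    return "dynamic" if decorrelation(win_t0, win_t1) > threshold else "static"

# Toy windows (invented values): speckle barely changes in static material,
# but is reshuffled between repeats where scatterers move with the liquid.
static_t0 = [10, 52, 31, 77, 45, 18, 63, 29]
static_t1 = [11, 50, 33, 75, 46, 17, 65, 28]
flow_t0 = [10, 52, 31, 77, 45, 18, 63, 29]
flow_t1 = [70, 15, 58, 22, 9, 66, 30, 41]

print(label(static_t0, static_t1), label(flow_t0, flow_t1))
```

On a GPU, each window would typically map to one thread or thread block, so the per-pixel scores can be computed independently.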
4. Discussion
To effectively attenuate stripe artifacts in OCT images, a frequency domain artifact filtering algorithm with anti-suppression parameters was proposed. In this algorithm, the bandpass component attenuates periodic stripe artifacts, while the stripe suppression and anti-suppression components reduce nonstandard stripe artifacts. The filtered results exhibited a significant reduction in both periodic and nonstandard stripe artifacts, with little noticeable degradation of the ROI signal underlying the artifact noise. Importantly, this algorithm requires no modifications to the OCT system, and it can be efficiently implemented on GPU-CUDA with low memory usage and high speed. In our experience, setting the background reference value BGRREF to 35% and the fluctuation range of ROISNS to ±1% gave acceptable noise attenuation on different systems. We recommend these settings as the initial denoising parameters for further parameter optimization. Additionally, stray reflections from optical surfaces54,55 produce coherence artifacts that appear as horizontal stripes. To address such artifact noise, simply rotating the filter's stripe suppression and anti-suppression components clockwise by 90° is sufficient.
It should be noted that the filtering method proposed for stripe artifact attenuation is semi-automatic. Firstly, the optimal parameters are selected based on user-specified background and target regions. In this paper, representative background and target regions were manually selected by the user, as indicated by the blue and red rectangular boxes. However, methods based on morphology or deep learning could also be employed for target segmentation to obtain global background and target regions. Secondly, for the same type of sample, the filter parameters are generally consistent. The optimal parameter determination algorithm therefore needs to be executed only once; it iterates through the filter parameters and determines the optimal ones using the conditions described in Sec. 2.2. The resulting filter can then be used to denoise subsequent similar samples, reducing time overhead.
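Why a 90° rotation of the suppression component is enough for horizontal stripes can be seen in a small frequency-domain sketch. This is not the filter of Sec. 2.2: it replaces the bandpass, stripe suppression and anti-suppression components with a plain notch on one frequency axis, and all function names and toy values are assumptions made for illustration. Stripes that are constant along one image axis concentrate their energy on a single axis of the 2D spectrum, so rotating the mask moves the notch onto the other axis.

```python
import cmath

def dft2(img, inverse=False):
    """Naive 2D DFT of a small square image (list of rows). O(N^4), demo only."""
    n = len(img)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0j
            for r in range(n):
                for c in range(n):
                    phase = sign * 2j * cmath.pi * (u * r + v * c) / n
                    acc += img[r][c] * cmath.exp(phase)
            out[u][v] = acc / (n * n) if inverse else acc
    return out

def vertical_stripe_mask(n):
    """1 = pass, 0 = reject. Rejects zero vertical frequency (row u = 0) except DC."""
    mask = [[1] * n for _ in range(n)]
    for v in range(1, n):
        mask[0][v] = 0
    return mask

def rotate90(mask):
    """Rotate a mask by 90 degrees about the zero-frequency origin (indices modulo n)."""
    n = len(mask)
    return [[mask[(-v) % n][u] for v in range(n)] for u in range(n)]

def apply_mask(img, mask):
    """Filter an image by multiplying its spectrum with the mask."""
    n = len(img)
    spec = dft2(img)
    kept = [[spec[u][v] * mask[u][v] for v in range(n)] for u in range(n)]
    return [[value.real for value in row] for row in dft2(kept, inverse=True)]

n = 8
profile = [40, 35, 60, 20, 15, 30, 10, 5]  # toy sample signal (invented values)
stripe = [0, 0, 50, 0, 0, 0, 30, 0]        # toy stripe artifact (invented values)

# Columnar (vertical) stripes: the artifact depends on the column only.
vertical = [[profile[r] + stripe[c] for c in range(n)] for r in range(n)]
cleaned_v = apply_mask(vertical, vertical_stripe_mask(n))

# Horizontal stripes: the artifact depends on the row only, so the mask is rotated.
horizontal = [[profile[c] + stripe[r] for c in range(n)] for r in range(n)]
cleaned_h = apply_mask(horizontal, rotate90(vertical_stripe_mask(n)))
```

In both cases the stripe pattern is removed and only its mean value remains as a uniform offset, because the DC term is kept. Note that such a hard notch also removes any genuine sample content lying on the same frequency axis, which is presumably why a practical filter needs the anti-suppression parameters described in the text.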
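The "search once, reuse for similar samples" workflow can be sketched as follows. The selection criterion here is an assumption and not the condition set of Sec. 2.2: a candidate parameter is accepted only if the ROI signal changes by no more than a tolerance (loosely mirroring the ±1% range mentioned above), and among accepted candidates the one with the largest background reduction is kept. The `evaluate` callback, the sample-type key and every number in `toy_metrics` are invented for illustration.

```python
def choose_theta(candidates, evaluate, roi_tolerance=0.01):
    """Pick the candidate with the largest background reduction whose ROI change
    stays within the tolerance.

    evaluate(theta) -> (background_reduction, roi_relative_change); hypothetical callback.
    Returns None if no candidate satisfies the ROI constraint.
    """
    best_theta, best_reduction = None, -1.0
    for theta in candidates:
        background_reduction, roi_change = evaluate(theta)
        if abs(roi_change) > roi_tolerance:
            continue  # this setting disturbs the target region too much
        if background_reduction > best_reduction:
            best_theta, best_reduction = theta, background_reduction
    return best_theta

_theta_by_sample_type = {}

def theta_for(sample_type, candidates, evaluate):
    """Run the search once per sample type and reuse the result afterwards."""
    if sample_type not in _theta_by_sample_type:
        _theta_by_sample_type[sample_type] = choose_theta(candidates, evaluate)
    return _theta_by_sample_type[sample_type]

# Invented metrics per candidate: (background reduction, relative ROI change).
toy_metrics = {0.1: (0.40, 0.002), 0.2: (0.62, 0.006), 0.3: (0.81, 0.009), 0.4: (0.90, 0.031)}
evaluations = []

def toy_evaluate(theta):
    evaluations.append(theta)  # record how often the costly evaluation runs
    return toy_metrics[theta]

first = theta_for("spheroid", sorted(toy_metrics), toy_evaluate)
second = theta_for("spheroid", sorted(toy_metrics), toy_evaluate)
print(first, second, len(evaluations))
```

The second call returns the cached parameter without re-evaluating any candidate, which is the source of the reduced time overhead for subsequent samples of the same type.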
To achieve high-speed, real-time 3D visualization for OCT systems, a 3D-OCT algorithm and a CC-OCTA algorithm were designed using GPU-CUDA. The 3D-OCT algorithm extracts the OCT signals, while the CC-OCTA algorithm extracts the dynamic signals. Additionally, an improved ray-casting volume rendering algorithm was developed using GPU-OpenGL for final visualization. The entire pipeline is implemented on the GPU, utilizing dual overlapping CUDA streams and cache reuse to enhance efficiency. Phase compensation and the filter denoising algorithm are used to improve image quality. The processing speed of the 3D-OCT algorithm was analyzed on mid-range GPUs, such as the RTX2060-6G, and a real-time processing rate of 800k A-lines/s was achieved, including reconstruction, denoising and rendering. Even higher speeds are anticipated when the volume of raw spectral data is reduced or a higher-grade GPU is used. Even without further optimization of the algorithm, high-performance GPUs such as the RTX4090 are expected to accommodate MHz OCT systems, which have broad applications in high-speed reconstruction of large-volume OCT data.
As a noninvasive tissue imaging technique, OCTA provides an effective means of monitoring the preparation of vascular-like networks. Vascular-like networks can be constructed using methods such as biomimetic materials and 3D printing, and are of significant importance in biomedical research.56,57 Based on 3D bioprinting and multi-material printing, tissues containing complex vascular networks can be prepared. As the fugitive material is removed, the designed vascular-like network is formed, and the formation process can be monitored to assess the 3D network formation and to guide the network design. Furthermore, the fundamental algorithm for spectral domain OCT (SD-OCT) was also implemented. The difference is that the accelerated algorithm for SD-OCT includes additional steps such as spline interpolation. The CC-OCTA algorithm, denoising, and volume rendering are the same as for SS-OCT.
It should be pointed out that the system in our laboratory uses a 50kHz swept source (Santec HSL-20-50-S), and we also purchased a 200kHz swept source (Thorlabs SL131161 MEMS-VCSEL) a few months ago. Moreover, SS-OCT systems with A-line rates of 325kHz,58 400kHz59 and 9.4MHz60 have been proposed in recent years. In addition to functional imaging techniques such as the OCTA used in this paper, improvements in the performance of optoelectronic devices also place new demands on data processing speed. Therefore, the high-speed signal processing pipeline proposed in this paper aims to realize real-time imaging on higher-speed OCT systems and is not limited to our existing 50kHz swept-source OCT system.
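The benefit of two overlapping streams can be estimated with a simple timing model. The sketch below assumes an idealized two-stage pipeline in which the host-to-device copy of the next chunk runs concurrently with the processing of the current chunk; it ignores launch overhead and any device-to-host transfer. The function names and the chunk timings are hypothetical and are not measurements from this work.

```python
def serial_time(n_chunks, t_copy, t_compute):
    """Total time when each chunk is copied and then processed, one after another."""
    return n_chunks * (t_copy + t_compute)

def overlapped_time(n_chunks, t_copy, t_compute):
    """Total time for an ideal two-stage pipeline.

    Only the first copy and the last compute are exposed; in between, each step
    costs as much as the slower of the two stages.
    """
    return t_copy + (n_chunks - 1) * max(t_copy, t_compute) + t_compute

# Hypothetical example: 10 chunks, 2 ms copy and 3 ms compute per chunk.
n_chunks, t_copy_ms, t_compute_ms = 10, 2, 3
print(serial_time(n_chunks, t_copy_ms, t_compute_ms))      # 50
print(overlapped_time(n_chunks, t_copy_ms, t_compute_ms))  # 32
```

Under this model the transfer cost is almost fully hidden whenever the compute stage is the slower one, so throughput is then bounded by the kernels rather than by the data transfer.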
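The extra resampling step needed for SD-OCT can be sketched as follows, assuming the spectrometer samples the spectrum on a grid of increasing wavelength, whereas the Fourier transform expects samples evenly spaced in wavenumber k = 2π/λ. For brevity the sketch uses linear interpolation in place of the spline interpolation mentioned above; the function name and the wavelength grid are hypothetical.

```python
import math

def resample_to_linear_k(wavelengths_nm, spectrum):
    """Resample a spectrum sampled at increasing wavelengths onto an even k grid.

    Linear interpolation is used here as a stand-in for spline interpolation.
    Returns (k_lin, resampled_spectrum) with k in rad/nm, ascending.
    """
    # k = 2*pi/lambda decreases as wavelength increases, so reverse for ascending k.
    k = [2 * math.pi / w for w in reversed(wavelengths_nm)]
    s = list(reversed(spectrum))
    n = len(k)
    step = (k[-1] - k[0]) / (n - 1)
    k_lin = [k[0] + i * step for i in range(n)]
    out = []
    j = 0
    for kq in k_lin:
        # Advance to the interval [k[j], k[j+1]] that contains the query point.
        while j < n - 2 and k[j + 1] < kq:
            j += 1
        t = (kq - k[j]) / (k[j + 1] - k[j])
        out.append(s[j] + t * (s[j + 1] - s[j]))
    return k_lin, out

# Hypothetical grid: 9 samples from 800 nm to 880 nm, evenly spaced in wavelength.
wavelengths = [800 + 10 * i for i in range(9)]
# Test signal that is exactly linear in k, so linear interpolation reproduces it.
spectrum = [3 * (2 * math.pi / w) + 1 for w in wavelengths]
k_lin, resampled = resample_to_linear_k(wavelengths, spectrum)
```

Each output sample depends only on its two neighboring input samples (four for a cubic spline), so the resampling parallelizes naturally over A-lines and over samples within an A-line.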
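The relation between the reported processing rate and the source sweep rates quoted above can be made explicit with a short calculation. The sketch assumes one processed A-line per sweep and takes the 800k A-lines/s figure at face value for every source, which ignores differences in samples per A-line and in GPU model; the function and variable names are illustrative.

```python
PROCESSING_RATE_HZ = 800_000  # A-lines/s reported for the full pipeline

# A-line rates of the swept sources mentioned in the text.
SOURCE_RATES_HZ = {
    "50 kHz (current system)": 50_000,
    "200 kHz": 200_000,
    "325 kHz": 325_000,
    "400 kHz": 400_000,
    "9.4 MHz": 9_400_000,
}

def headroom(source_rate_hz, processing_rate_hz=PROCESSING_RATE_HZ):
    """Ratio of processing rate to acquisition rate; values >= 1 allow real-time operation."""
    return processing_rate_hz / source_rate_hz

for name, rate in SOURCE_RATES_HZ.items():
    ratio = headroom(rate)
    status = "real-time" if ratio >= 1 else "not real-time"
    print(f"{name}: headroom {ratio:.2f}x ({status})")
```

Under these assumptions the pipeline has a 16-fold margin on the 50kHz system and a 2-fold margin at 400kHz, while a 9.4MHz source would exceed the reported rate by more than an order of magnitude.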
5. Conclusion
In summary, this study presented a comprehensive GPU-based algorithm for rapid 3D visualization in SS-OCT. It consists of four main components: spectral reconstruction, artifact denoising, volume rendering, and an OCTA module. The algorithm provides a reliable, high-quality, and flexible software framework for high-speed OCT systems, and the OCTA module offers a monitoring tool for the preparation of vascular-like networks.
Acknowledgments
This work was supported by the National Key Research and Development Program of China (Nos. 2022YFA1104600 and 2022YFA1200208), the National Natural Science Foundation of China (No. 31927801), and the Key Research and Development Foundation of Zhejiang Province (No. 2022C01123). Shanshan Yang and Jinhao Zhou contributed equally to this work.
Supplemental Materials
The Supplemental Materials are available at: https://www.worldscientific.com/doi/suppl/10.1142/S1793545824500135.
Conflicts of Interest
The authors declare no conflicts of interest.
ORCID
Shanshan Yang https://orcid.org/0000-0002-8147-9337