World Scientific

Classification of hyperspectral images for detection of hepatic carcinoma cells based on spectral–spatial features of nucleus

    https://doi.org/10.1142/S1793545820500029

    Abstract

    A distinguishing characteristic of normal and cancer cells is the difference in their nuclear chromatin content and distribution. This difference can be revealed by the transmission spectra of nuclei stained with a pH-sensitive stain. Here, we used hematoxylin–eosin (HE) to stain hepatic carcinoma tissues and obtained spectral–spatial data from their nuclei using hyperspectral microscopy. The transmission spectra of the nuclei were then used to train a support vector machine (SVM) model for cell classification. In particular, we found that the chromatin distribution in cancer cells is more uniform, so the correlation coefficients for the spectra at different points in their nuclei are higher. We exploited this feature to improve the SVM model, which increased the sensitivity and specificity for the identification of cancer cells to 99% and 98%, respectively. We also designed an image-processing method for the extraction of information from cell nuclei to automate the identification process.

    1. Introduction

    Existing data suggest that, on average, approximately 14.1 million new cancer cases are registered each year [1] and that more than 90 million individuals worldwide have been diagnosed with cancer [2]. Because of its high metastasis rate, cancer is very difficult to prevent and cure. As of 2017, the mortality rate of cancer was greater than 17.5% [3]. Various methods have been developed to image tumors in the human body, such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET). Moreover, biopsy samples and specific microscopic technologies are needed to analyze the characteristics of tumors at the cellular level and to determine whether they are cancerous. Usually, researchers or doctors identify cancer cells based on their morphologic features using a microscope. Though this is the most convenient and intuitive way of classifying normal and cancer cells, its accuracy depends to a very large extent on the personal experience of the classifier. In recent years, increasingly advanced technologies have been used for cell classification, such as laser scanning confocal microscopy [4], electron microscopy [5], and coherent anti-Stokes Raman scattering [6]. These methods can yield significant quantitative information from biopsy samples. However, they remain to be optimized in terms of cost and complexity.

    Hyperspectral imaging (HSI) is an alternative method for biological detection [7] and has the advantage of simultaneously providing spectral and spatial data related to the imaged sample. Hence, it is already being used for the identification and analysis of cancer cells [8–11]. Usually, the spectral information of normal cells and cancer cells is collected independently with an HSI system, and the differences between them are then used for classification. However, one major limitation of HSI is the difficulty of obtaining effective information from biopsy samples. In most cases, the slides of human tissues used for imaging do not exhibit strong self-fluorescence and are so thin that they can barely absorb any of the light propagating through them. Therefore, it is difficult to obtain their fluorescence or transmission spectra. In most previous studies on the imaging of cancer cells using HSI, a staining treatment had to be performed on the samples [12–14]. However, the specificity of the spectra of stained samples is relatively low. When stained under the same conditions, normal and cancer cells exhibit similar spectral responses, making them hard to distinguish. Even if one uses the immunofluorescence technique to stain only the cancer cells, some normal cells are also inevitably stained and may end up being labeled as cancer cells [15]. Thus, the differences in their spectra are too small to allow for classification using a parametric method. Hence, an adaptive algorithm, such as the Fisher algorithm [16], a back propagation neural network [17], or a support vector machine (SVM) [18], must be used to determine the optimal criterion for automatic classification of the hyperspectral data. Being nonparametric methods, adaptive algorithms can sometimes wrongly treat the individual differences within samples as differences between normal and cancer cells during the training process. This can adversely affect the classification process when the individual cell differences are more significant than those between normal and cancer cells.

    Selecting a stain that induces spectral characteristics reflecting the actual differences between normal and cancer cells is an effective way of limiting the effects of these individual differences. Medical research has shown that the chromatin content of cancer cells is significantly higher and that the chromatin is distributed uniformly within the entire nucleus [19]. Given that the pH of the nucleus is sensitive to its chromatin content, the spectra of the nuclei of normal and cancer cells would be different when the samples are stained with a pH-sensitive stain. Relying on this recognized distinction between normal and cancer cells, researchers have used hematoxylin–eosin (HE) to stain tissue samples and investigated the spectral properties of their nuclei [20–22]. Yet, almost all related studies focused on the spectral differences between the nuclei of normal and cancer cells; few considered the spectral features at different points within the same nucleus. In this research, we investigate the spectral properties within the nucleus. First, we found that there is a stable and significant difference in the transmission spectra of HE-stained normal and cancer cells. Second, we found that the chromatin distribution in cancer cells is more uniform than that in normal cells. Thus, the correlation between the spectra at different points in the nucleus of a cancer cell is higher than that in the case of a normal cell. In this study, we trained an SVM model on the transmission spectra of the nuclei of normal and cancer cells and used it to classify the cells. The correlation coefficients for the spectra at different points within the same nucleus were used to correct the classification results. Slides of hepatic carcinoma tissue were used to evaluate the classification system. The results revealed that the sensitivity and specificity of the proposed method for classifying cancer cells were as high as 99% and 98%, respectively. Furthermore, we also designed an image-processing method to automatically extract information from the nuclei for the identification process.

    2. System and Samples Used

    2.1. System

    As shown in Fig. 1(a), the main component of the hyperspectral microscopy system is an inverted microscope (ECLIPSE Ti-U, Nikon). A halogen lamp with a color-temperature-balancing daylight filter for 3200 K is used as the light source, and a liquid crystal tunable filter (LCTF) (VariSpec VIS, CRI Inc.) combined with a 16-bit complementary metal oxide semiconductor (CMOS) camera (ORCA-Flash 4.0 LT C11440-42U, HAMAMATSU) is used to capture the spectral images. When the broadband light from the source reaches the test sample, a portion of it is absorbed, while the rest passes through the sample, is filtered by the LCTF, and is subsequently imaged by the CMOS camera. Since the LCTF is a two-dimensional bandpass filter, a whisk broom (point-scanning) or push broom (line-scanning) instrument is not needed to acquire spatial information from the sample.

    Fig. 1. HSI system and the data obtained using it. (a) Hyperspectral microscopy system. (b) Data cubes obtained using the system shown. x and y represent the spatial dimensions while λ represents the spectral dimension. (c) Transmission spectrum at a random pixel in the image.

    A control program is used to coordinate the CMOS camera with the LCTF and allow wavelength scanning. By changing the voltage of the LCTF continually, one can obtain grayscale images at each wavelength over the entire spectral range of 420–720 nm with a maximum spectral resolution of 2 nm. These images are stored in a band-sequential format and are transformed into a spectral cube containing both spectral and spatial information (Fig. 1(b)). From these spectral cubes, we can extract the spectral information (Fig. 1(c)) at any point in the region of interest (ROI) of the sample.
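
    As a minimal sketch of how a band-sequential stack can be read as a spectral cube, the following assumes that each band is stored as a plain 2D list indexed [y][x]. The function names and the tiny synthetic cube are illustrative assumptions and are not part of the control program described above.

```python
from typing import List, Sequence

Image = List[List[float]]  # one grayscale band, indexed [y][x]


def band_wavelengths(start_nm: int = 420, stop_nm: int = 720, step_nm: int = 2) -> List[int]:
    """Wavelengths of one scan, inclusive of both ends of the range."""
    return list(range(start_nm, stop_nm + 1, step_nm))


def pixel_spectrum(cube: Sequence[Image], x: int, y: int) -> List[float]:
    """Grayscale value of pixel (x, y) in every band of a band-sequential cube."""
    return [band[y][x] for band in cube]


# Tiny synthetic cube: 3 bands of 2x2 pixels (values are made up).
cube = [
    [[10.0, 11.0], [12.0, 13.0]],
    [[20.0, 21.0], [22.0, 23.0]],
    [[30.0, 31.0], [32.0, 33.0]],
]

print(len(band_wavelengths()))         # 151 bands for 420-720 nm at 2 nm steps
print(pixel_spectrum(cube, x=1, y=0))  # [11.0, 21.0, 31.0]
```

    Storing the bands as the outermost index mirrors the band-sequential layout, so extracting one spectrum touches one pixel per band.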

    2.2. Samples

    Ten slides of hepatic carcinoma tissue samples extracted from different patients and 10 slides of normal hepatic tissue samples were used in the study for the classification of normal and cancer cells using HSI. All the samples were provided by Wuhan Servicebio Technology Co., Ltd. The samples were processed as per the following steps: first, they were sequentially immersed in dimethylbenzene (20 min), absolute ethyl alcohol (20 min), 75% alcohol (5 min), and deionized water (5 min) to remove the paraffin that was used to fix them. Then, they were stained with hematoxylin (5 min) and eosin (5 min). Finally, they were dehydrated and fixed onto the slides. A quick microscopic examination confirmed that all the slides contained abundant normal and cancer cells in the field of view (FOV).

    3. Method and Results

    3.1. Acquisition of transmission spectra

    HE stains are sensitive to the chromatin content, and even a small change in the chromatin content can lead to variations in the transmission characteristics of the nucleus. Considering that there are significant differences in the chromatin contents of normal and cancer cells, their transmission spectra will also be different once the cells have been stained with HE. Therefore, the transmission spectra of the nuclei of cells stained with HE can be used to classify the cells as normal or cancer cells. In this study, we obtained the spectral data in the form of data cubes corresponding to wavelengths of 460–720 nm and a spectral resolution of 2 nm and extracted the transmission spectra of the nuclei for classification.

    To obtain the transmission spectrum corresponding to any given pixel, one needs to not only extract the spectral information of that pixel but also obtain the spectral information of the background (see Fig. 2(a)). The transmittance of a pixel at a certain wavelength is defined as follows:

    $$T(\lambda)=\frac{I(\lambda)_{\mathrm{sample}}}{I(\lambda)_{\mathrm{background}}},\tag{1}$$
    where $T(\lambda)$ represents the transmittance of the pixel, which depends on the wavelength, λ. Because the grayscale level of the pixel is proportional to the light intensity, $I(\lambda)_{\mathrm{sample}}$ and $I(\lambda)_{\mathrm{background}}$ are the grayscale values (intensities) of the target point (e.g., P1 and P2 in Fig. 2(a)) and the background point (e.g., background in Fig. 2(a)), respectively, at wavelength λ. By calculating the transmittance at each wavelength, we could obtain the transmission spectrum of any given pixel (Fig. 2(b)).
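
    The band-by-band ratio in Eq. (1) can be sketched as follows. The function name, the input checks, and the example intensities are assumptions introduced for illustration; only the ratio itself comes from the definition above.

```python
from typing import List, Sequence


def transmittance_spectrum(sample: Sequence[float], background: Sequence[float]) -> List[float]:
    """Eq. (1): T(lambda) = I_sample(lambda) / I_background(lambda), band by band."""
    if len(sample) != len(background):
        raise ValueError("sample and background must cover the same bands")
    if any(b <= 0 for b in background):
        raise ValueError("background intensity must be positive in every band")
    return [s / b for s, b in zip(sample, background)]


# Made-up grayscale values at three wavelengths.
i_sample = [12000.0, 9000.0, 15000.0]
i_background = [40000.0, 45000.0, 50000.0]

# Approximately [0.3, 0.2, 0.3]
print(transmittance_spectrum(i_sample, i_background))
```

    Dividing by a background spectrum taken from the same cube cancels the wavelength dependence of the lamp, the filter, and the camera, which is why the background point is measured alongside the target points.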

    Fig. 2. Acquisition of transmittance spectrum. (a) Intensities of background point and target points as functions of wavelength. The inset depicts the target points (P1 and P2) and background point in image. (b) Transmittance spectra of target points.

    To collect the spectral data from the cancer cells for the adaptive algorithm with precision, we captured hyperspectral images of the slides using a relatively large FOV; this was accomplished by using a 20× infinity-corrected microscope objective while making sure that there was at least one cancer nest in the FOV. A cancer nest is a mass of cancer cells expanding from a common center, as shown in Fig. 3; the boundary between the normal tissue and the cancer nest can be seen clearly in the figure. This helped us to locate the cancer cells and extract the transmission spectra of their nuclei with precision. Moreover, all selected cancer cells were pre-identified by pathological examination. The transmission spectra of the nuclei of normal cells were extracted from the slides of the normal tissue samples.

    Fig. 3. Diagrammatic sketch of tissue with normal and cancer cells and cancer nest captured using HSI system.

    By imaging 100 normal cells and 100 cancer cells, the average transmission spectra of the HE-stained nuclei could be computed, as illustrated in Fig. 4(a). One can see that the standard deviation (SD) of the transmission spectra was relatively high; this was because of the significant differences between the individual cells. Though this deviation resulted in the overlapping of the transmission spectra of the normal and cancer cells, significant differences in the spectra can still be observed in the 570–660 nm range.

    Fig. 4. (a) Average transmission spectra of cancer and normal cells. (b) Points in nucleus of cancer cell (P1 and P2) and normal cell (P3 and P4) whose transmission spectra were measured. (c) Transmission spectra at P1 and P2 and their correlation coefficient. (d) Transmission spectra at P3 and P4 and their correlation coefficient.

    To take full advantage of the spectral and spatial information obtained using the HSI system, the transmission properties of different points within the same nucleus were investigated. We extracted the transmission spectra at two points within the same nucleus and calculated their correlation coefficient using the following formula:

    $$r(X,Y)=\frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}},\tag{2}$$
    where $r$ represents the correlation coefficient of the transmission spectra and $X$ and $Y$ represent the transmission spectra at the two points within the same nucleus.

    As shown in Fig. 4(b), we selected two points randomly in the nuclei of the normal and cancer cells while ensuring that their distance, d, was greater than rmin, half the minor axis of the nucleus. Next, we obtained their transmission spectra and calculated the correlation coefficients (Figs. 4(c) and 4(d)). Based on the calculation results for 500 normal cells and 500 cancer cells, we found that the correlation coefficients for the cancer cells were all higher than 0.994 while those for the normal cells were all lower than 0.958. This was because of the differences in the chromatin distributions within the nuclei of normal and cancer cells. Cancer cells often contain excess chromatin that is spread throughout the nucleus. As a result, in cancer cells, different points within the same nucleus exhibit similar transmission characteristics. On the other hand, the distribution of chromatin within the nuclei of normal cells is uneven, which results in significant differences in the transmission spectra of different points within the same nucleus. This difference between normal and cancer cells was exploited in the proposed classification system.
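
    The two steps described above, picking two sufficiently distant points in a nucleus and correlating their spectra with Eq. (2), can be sketched as follows. The helper names, the rejection-sampling strategy, and the example values are assumptions made for illustration; the source specifies only the distance constraint and the correlation formula.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (2): r = Cov(X, Y) / sqrt(Var[X] * Var[Y])."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("spectra must have the same length (at least 2 bands)")
    mx, my = sum(x) / n, sum(y) / n
    # The common 1/n factors of Cov and Var cancel, so plain sums are enough.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    return cov / math.sqrt(vx * vy)


def pick_two_points(pixels: Sequence[Point], r_min: float,
                    rng: random.Random, max_tries: int = 1000) -> Tuple[Point, Point]:
    """Draw two nucleus pixels at random until their distance exceeds r_min."""
    candidates: List[Point] = list(pixels)
    for _ in range(max_tries):
        p, q = rng.sample(candidates, 2)
        if math.dist(p, q) > r_min:
            return p, q
    raise RuntimeError("no pair of points farther apart than r_min was found")


# Made-up spectra: the second is the first shifted by a constant, so r is 1.
print(round(pearson_r([0.30, 0.25, 0.20, 0.35], [0.31, 0.26, 0.21, 0.36]), 6))  # 1.0
print(pick_two_points([(0, 0), (0, 1), (5, 5)], r_min=3.0, rng=random.Random(0)))
```

    Because the correlation coefficient ignores constant offsets and overall scaling, it compares the shapes of the two spectra rather than their absolute transmittance.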

    3.2. Model training and testing

    An SVM model was used to classify the normal and cancer cells based on the transmission spectra of their nuclei. The SVM is a kernel-based machine learning technique whose advantages include a strong theoretical foundation, a high degree of generalization, and low sensitivity to the curse of dimensionality [23]. As a supervised classification method, the SVM model requires a large number of standard samples for model training. In this study, 1000 nuclei of diagnosed cancer cells were selected from five slides of hepatic carcinoma tissue samples, and their transmission spectra at two random points were obtained. Similarly, the transmission spectra at two points in 1000 nuclei of diagnosed normal cells were extracted from five slides of normal hepatic tissue samples. These data were used for training the model.

    Next, a dataset consisting of the transmission spectra of nuclei of 100 cancer cells and 100 normal cells was employed to evaluate the SVM model; this dataset was obtained from the other five slides of hepatic carcinoma tissue samples and the other five slides of normal hepatic tissue samples and contained no training data. The obtained classification results are listed in Table 1, which shows the values of two key parameters: the sensitivity (SEN) and the specificity (SPEC). SEN is a measure of the proportion of positives identified correctly, while SPEC is a measure of the proportion of negatives identified correctly. They are defined by the formulas shown below [12]:

    $$\mathrm{SEN}=\frac{TP}{TP+FN},\tag{3}$$
    $$\mathrm{SPEC}=\frac{TN}{TN+FP},\tag{4}$$
    where $TP$, $TN$, $FN$, and $FP$ represent the number of true positives, number of true negatives, number of false negatives, and number of false positives, respectively.

    According to Table 1, SEN for the cancer cells and SPEC for the normal cells were 99% and 98.85%, respectively, which were much higher than SPEC for the cancer cells and SEN for the normal cells. This is because many normal cells were wrongly identified as cancer cells. Considering that the chromatin distribution within the nuclei of normal cells is uneven, the chromatin content at several points within the nuclei of normal cells can be as high as that in the nuclei of cancer cells. This is what led to the inaccurate classifications.

    Table 1. Classification results obtained using SVM model.

    Actual \ Predicted    Cancer cells    Normal cells    SEN (%)
    Cancer cells          99              1               99%
    Normal cells          14              86              86%
    SPEC (%)              87.61%          98.85%

    Note: SEN: sensitivity and SPEC: specificity.
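
    As a worked check of Eqs. (3) and (4) against the counts in Table 1, the sketch below treats cancer as the positive class. Reading the rows as actual classes and the columns as predicted classes is an interpretation of the table layout, and the variable names are illustrative. Under that reading, the row-wise ratios reproduce the SEN column, while the bottom row of the table is reproduced by the column-wise ratios.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Eq. (3): SEN = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Eq. (4): SPEC = TN / (TN + FP)."""
    return tn / (tn + fp)


# Table 1 counts with cancer as the positive class (assumed layout:
# rows are actual classes, columns are predicted classes).
tp, fn = 99, 1    # actual cancer cells: predicted cancer / predicted normal
fp, tn = 14, 86   # actual normal cells: predicted cancer / predicted normal

sen = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"SEN  = {sen:.2%}")   # 99.00%
print(f"SPEC = {spec:.2%}")  # 86.00%

# Column-wise ratios, which match the bottom row of Table 1.
col_cancer = tp / (tp + fp)  # share of predicted cancer cells that are cancer
col_normal = tn / (tn + fn)  # share of predicted normal cells that are normal
print(f"{col_cancer:.2%}, {col_normal:.2%}")  # 87.61%, 98.85%
```

    Either way of reading the table leads to the same conclusion as the text: the 14 normal cells predicted as cancer are the main source of error.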

    To solve this problem, the differences in the chromatin distributions of normal and cancer cells were used to correct the model. This was done using the program shown in Fig. 5. To begin with, the program extracted the transmission spectrum at one point within the nucleus of a cell and classified that cell based on the SVM model. If this point was identified as belonging to a normal cell, the program ended. Otherwise, the program extracted the transmission spectrum at a different point within the same nucleus (d > rmin), calculated the correlation coefficient of the transmission spectra of the two points, and performed classification based on the value of the correlation coefficient. If the correlation coefficient was higher than 0.994, a threshold set based on the results for 500 samples, the cell was identified as a cancer cell; otherwise, it was identified as a normal cell.

    Fig. 5. Flowchart of program used to correct classification model.
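
    The correction flow can be sketched as a two-stage decision. In the sketch below, `svm_predict` and `get_second_spectrum` are hypothetical callables standing in for the trained SVM model and for the extraction of a second spectrum with d > rmin; neither is specified in this form in the text. Only the order of the steps and the 0.994 threshold are taken from the description above.

```python
import math
from typing import Callable, Sequence

CANCER, NORMAL = "cancer", "normal"
R_THRESHOLD = 0.994  # correlation threshold quoted in the text


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation coefficient of two spectra, as in Eq. (2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def classify_nucleus(first_spectrum: Sequence[float],
                     get_second_spectrum: Callable[[], Sequence[float]],
                     svm_predict: Callable[[Sequence[float]], str],
                     r_threshold: float = R_THRESHOLD) -> str:
    """Stage 1: SVM on one point. Stage 2: correlation check for cancer calls only."""
    if svm_predict(first_spectrum) == NORMAL:
        return NORMAL  # the program ends here for normal calls
    second_spectrum = get_second_spectrum()  # second point, with d > r_min
    r = pearson_r(first_spectrum, second_spectrum)
    return CANCER if r > r_threshold else NORMAL


# Stand-in classifier that always calls "cancer", to exercise stage 2.
always_cancer = lambda spectrum: CANCER
s1 = [0.30, 0.25, 0.20, 0.35]           # made-up spectrum
s2_similar = [0.31, 0.26, 0.21, 0.36]   # same shape as s1, r is 1
s2_different = [0.20, 0.35, 0.30, 0.25]  # different shape, r is -0.6

print(classify_nucleus(s1, lambda: s2_similar, always_cancer))    # cancer
print(classify_nucleus(s1, lambda: s2_different, always_cancer))  # normal
```

    Passing the second spectrum as a callable means that it is extracted only when the first stage returns a cancer call, which matches the early exit in the flowchart.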

    We used 100 cancer cells and 100 normal cells that were not included in the training dataset to evaluate the corrected classification system; the results are shown in Table 2. It can be seen that the classification accuracy improved significantly with this program. For instance, SEN for the cancer cells remained at 99%, while SPEC for the cancer cells increased from 87.61% to 98.01%.

    Table 2. Classification results obtained using SVM model corrected based on differences in chromatin distributions of normal and cancer cells.

    Actual \ Predicted    Cancer cell    Normal cell    SEN (%)
    Cancer cell           99             1              99%
    Normal cell           2              98             98%
    SPEC (%)              98.01%         98.99%

    Note: SEN: sensitivity and SPEC: specificity.

    In 2008, Siddiqi et al. presented a method for identifying normal, precancerous, and cancerous squamous cells via HSI [9]. The sensitivity for cancerous squamous cells based on the spectral signature-based method was 98.6%. In 2012, Akbari et al. used an HSI system and the SVM model to identify the cancer cells in lung tissue and lymph node tissue [24]. Without any correction for the SVM model, they obtained a sensitivity and specificity of 92.6% and 97.7% for lung tissue, and 96.2% and 98.3% for lymph node tissue, respectively. Compared with these SVM-based classifications for HE-stained cancer cells, our method showed a relatively high sensitivity and specificity. Moreover, the identification of hepatic carcinoma cells with our method was better than that with the convolutional neural network algorithm [25].

    However, this classification method has some limitations. The threshold for the correlation coefficient was set based on the results for 500 samples, and the precision would be higher if more samples were considered. In addition, only the transmission spectra at two points within the nucleus were used for the classification process. Hence, they may not be representative of the entire nucleus.

    3.3. Automatic classification

    For practical data analysis, one must be able to classify all the cells in the FOV. This cannot be done manually. In this study, each spectral cube contained 130 single-band images with a resolution of 2048×2048 pixels. Hence, the total number of data points exceeded 545 million. Even if one were to scan all the points automatically, the computational cost would be very high. Since the proposed classification method only focuses on the nuclei, it does not need to identify all the points in the FOV. To reduce the computational cost and classify all the cells rapidly, a binarization algorithm was used to separate the nuclei from the background. The program then classifies only selected points within the nuclei and discards the background as noneffective points.
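
    The size of one spectral cube follows directly from the band count and image resolution quoted above. The short sketch below repeats that arithmetic; the variable names are illustrative.

```python
bands = 130            # single-band images per spectral cube
width = height = 2048  # pixels per side of each image

points_per_cube = bands * width * height
print(f"{points_per_cube:,}")  # 545,259,520
```

    Restricting the analysis to two points per nucleus replaces this full scan with a number of spectra that scales with the number of nuclei in the FOV.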

    The binarization process is based on the difference in the grayscale levels of the nuclei and the background: the greater the difference is, the more effective the binarization process will be. Therefore, we looked for an appropriate single-band image that exhibited the greatest difference in the grayscale levels of the nuclei and the interstitial areas. As can be seen from Fig. 6, the largest difference in the grayscale levels was observed when the wavelength was 580 nm. Hence, the single-band image corresponding to the wavelength of 580 nm was used for the binarization process.

    Fig. 6. (a) Points in cancer cell (P1), normal cell (P2), interstitial area in cancer nest (P3), and interstitial area in normal tissue (P4). (b) Transmission spectra at P1, P2, P3, and P4.
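
    Choosing the band for binarization amounts to finding the wavelength at which the nucleus and interstitial spectra are farthest apart. The sketch below assumes that one representative spectrum per region is available; the function name and the example values are made up for illustration and are not measurements from Fig. 6.

```python
from typing import Sequence, Tuple


def best_contrast_band(wavelengths: Sequence[int],
                       nucleus: Sequence[float],
                       interstitial: Sequence[float]) -> Tuple[int, float]:
    """Return the wavelength with the largest |nucleus - interstitial| gap."""
    if not (len(wavelengths) == len(nucleus) == len(interstitial)):
        raise ValueError("all inputs must cover the same bands")
    gaps = [abs(n - i) for n, i in zip(nucleus, interstitial)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    return wavelengths[k], gaps[k]


# Synthetic spectra at five bands, chosen only to exercise the function.
wavelengths_nm = [540, 560, 580, 600, 620]
nucleus_t = [0.50, 0.42, 0.35, 0.45, 0.60]
interstitial_t = [0.70, 0.72, 0.75, 0.78, 0.80]

band, gap = best_contrast_band(wavelengths_nm, nucleus_t, interstitial_t)
print(band)  # 580 for these synthetic values
```

    In practice one might average the spectra of several nuclei and several interstitial points before comparing them, so that a single atypical pixel does not decide the band.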

    The original image for the wavelength of 580 nm is shown in Fig. 7(a). Before the binarization process, we enhanced the contrast of the image to increase the difference between the nuclei and interstitial areas (Fig. 7(b)). Then, all the points in the image were classified into two types based on a certain threshold. If the grayscale value of a point was higher than the threshold, the point was labeled as 0; otherwise, it was labeled as 1. The post-binarization image is shown in Fig. 7(c). Next, we used the connected-region algorithm [26] to label each nucleus. After that, the above-described program was used to select two points within each nucleus (Fig. 7(c)) and obtain their transmission spectra, which were then used for classification using the SVM model and the differences in the chromatin distributions of normal and cancer cells. For more intuitive visualization, the classified pixels were filled with different pseudocolors. As shown in Fig. 7(d), the nuclei of the cancer cells were labeled in red while those of the normal cells were labeled in green.

    Fig. 7. Automatic classification of normal and cancer cells. (a) Single-band image for wavelength of 580 nm. (b) Single-band image for wavelength of 580 nm after contrast enhancement. (c) Post-binarization image. Selected points in each nucleus are labeled in green and blue. (d) Pseudocolor image after classification. Nuclei of normal and cancer cells are labeled in green and red, respectively.
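
    The binarization and nucleus-labeling steps can be sketched with a fixed threshold and a breadth-first flood fill. This is a generic stand-in and not the specific connected-region algorithm cited in the text; the use of 4-connectivity, the threshold of 128, and the small synthetic image are assumptions made for illustration.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]


def binarize(image: Grid, threshold: int) -> Grid:
    """Label pixels brighter than the threshold 0 and all other pixels 1."""
    return [[0 if value > threshold else 1 for value in row] for row in image]


def label_regions(mask: Grid) -> Tuple[Grid, int]:
    """Give each 4-connected region of 1-pixels its own label (1, 2, ...)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or labels[y][x] != 0:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Synthetic 4x5 grayscale image: dark pixels stand in for stained nuclei.
image = [
    [200,  60,  55, 210, 205],
    [198,  58, 220, 215,  70],
    [202, 199, 201,  65,  62],
    [ 50, 204, 203, 207, 206],
]
labels, n_regions = label_regions(binarize(image, threshold=128))
print(n_regions)  # 3
for row in labels:
    print(row)
```

    Each labeled region then supplies the pixel coordinates from which the two points of a nucleus are drawn. A real image would likely also need small regions to be filtered out as noise, which this sketch does not do.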

    The boundary between the cancer nest and normal tissue can be seen clearly in Fig. 7(d). Further, only a few nuclei were incorrectly identified. When the transmission spectra of only two points within the nucleus are used, classification is performed based on limited information, resulting in errors. However, most of the normal and cancer cells could be labeled automatically and precisely using the proposed classification method.

    4. Conclusion

    In this study, normal and cancer tissue samples were stained with HE, so that the spectral and spatial features of their nuclei could be determined in order to elucidate the differences in their chromatin content and distribution. The classification of hepatic carcinoma cells based on the spectral and spatial features of the nuclei could be performed successfully using the proposed method which exhibited high sensitivity (99%) and specificity (98%). Given that this classification method is based on the known differences between most types of normal and cancer cells, we believe that it has great potential for diagnosing various types of cancers.

    Conflict of Interest

    The authors declare that there are no conflicts of interest related to this article.

    Acknowledgments

    This paper was supported by the National Key Research and Development Program of China (2017YFB1104500), National Natural Science Foundation of China (61605062, 61735005 and 11704155), Science and Technology Planning Project of Guangdong Province (2018B030323017), Research Project of Scientific Research Cultivation and Innovation Fund of Jinan University (11617329), and Guangzhou Science and Technology Project (201903010042 and 201904010294).