World Scientific

Spectra selection methods: A novel optimization way for treating dynamic spectra and in-line near infrared modeling

    https://doi.org/10.1142/S1793545820500157

    Abstract

    Near infrared (NIR) spectroscopy is now widely used in fluidized bed granulation. However, some drawbacks still need to be overcome in practice; in particular, selecting valid spectra during the modeling process remains a difficult problem. In this study, a novel NIR sensor and a cosine distance method were introduced to solve this problem and make the fluidization process "visible". An NIR sensor was fixed on the side of the expansion chamber to acquire NIR spectra. Valid spectra were then selected with the cosine distance method to reduce the influence of dynamic disturbances. Finally, spectral pretreatment and wavelength selection methods were investigated to establish partial least squares (PLS) models for monitoring the moisture content. The root mean square error of prediction (RMSEP) of the moisture content model was 0.124%, much lower than that obtained without valid spectra selection. These results demonstrate that, with valid spectra selection, the NIR sensor can determine critical quality attributes (CQAs) in real time more accurately. This makes the manufacturing process easier to understand than control based on process parameters alone.

    1. Introduction

    Fluidized bed granulation has been widely applied in the manufacture of solid dosage forms in recent years,1,2 because it gives the final product better flowability and homogeneity and reduces the chance of cross-contamination during manufacture.3,4 However, fluidized bed granulation is a closed process in which the fluidization of the materials is vigorous and cannot be observed, so the physicochemical attributes of the materials are hard to know. Only at the end of the process can the analyst test the products to decide whether they meet specifications. It is therefore difficult to understand the fluidized bed granulation process in a timely manner.

    Process analytical technology (PAT) was proposed by the U.S. Food and Drug Administration (FDA) in 2004. Its main idea is to use process analytical tools to measure critical quality attributes (CQAs) so that process parameters can be adjusted dynamically to guarantee the quality of the final product.5,6,7 Many process analytical tools have been employed to make the process visible, such as imaging,8 microwave resonance (MR),9 spatial filtering velocimetry (SFV),10 optical coherence tomography (OCT),11 and acoustic emission (AE) methods.12 Near infrared spectroscopy (NIRS) is one of the most widely used PAT tools for process monitoring in fields such as agriculture, food, soil, medical diagnostics, and pharmaceuticals,13,14,15,16,17,18 and researchers have made many efforts to apply NIRS to fluidized beds. Markovic and co-workers established partial least squares (PLS) models with diffuse reflectance spectra to predict the residual solvent content, pellet size sieve fractions and amount of coating layer.19 Hudovornik et al. applied NIRS to a coating process20 and accurately predicted two CQAs, moisture content and coating thickness, in-line. Otsuka et al. used NIRS to predict the water content and D50 of acetaminophen formulation granules during the mixing, granulation and drying processes in a laboratory-sized fluid-bed granulator.1 These successful applications place higher demands on NIRS than ever before.

    As NIR technology is increasingly applied in practical manufacturing, dynamic spectra selection becomes an important issue. In fluidized bed granulation, stable spectroscopic measurement is particularly challenging because of the highly dynamic nature of the sample (i.e., granules), including changes in working distance, measurement angle, and granule physical properties (e.g., density).21,22,23 This dynamic nature makes in-line spectra unstable at times, so that they contain information related to air, vibration, material density, etc. Several efforts have been made to improve NIR monitoring of dynamic processes. Lee and co-workers used an averaging method to optimize the in-line NIR model,24 based on the theory that the influence of dynamic disturbances can be greatly reduced by averaging and clustering.25 In Manel's study, the spectra corresponding to each sampling time were recorded and averaged as the corresponding standard spectrum to reduce the influence of abnormal spectra.26 Margot and co-workers identified abnormal spectra by limiting their absorbance to a specific region: in-line spectra were rejected if their absorbance fell outside the defined range.27 These methods reduce the influence of interfering information to some extent, but they do not specifically eliminate abnormal spectra. Dedicated spectra selection therefore remains necessary.

    Therefore, in this study, a novel micro smart NIR spectrometer was introduced to monitor fluidized bed granulation in real time. With the help of chemometrics, five spectra selection methods were investigated and applied to real-time NIR spectra, in combination with pretreatment and variable selection methods, to improve the predictive capability of the NIRS model. This research may provide a new way to treat dynamic spectra and to construct more accurate PLS models.

    2. Materials and Methods

    2.1. Materials

    2.1.1. Samples

    The active pharmaceutical ingredient (API), acetaminophen, was purchased from Shandong Changda Biotechnology Co., Ltd. Corn starch was provided by Shandong Liaocheng Huayang Pharmaceutical Auxiliary Material Co., Ltd. Microcrystalline cellulose and the binder, hydroxypropyl methyl cellulose, were obtained from Anhui Shanhe Pharmaceutical Excipient Co., Ltd. Lactose was supplied by Zhenjiang Fukang Bioengineering Co., Ltd.

    2.1.2. Software

    NIR spectra were processed with MATLAB R2016b (MathWorks Inc., USA) and The Unscrambler 10.1 (CAMO, Norway). Micro NIR Pro v2.3 (VIAVI, USA) was used for in-line spectra collection and monitoring.

    2.2. Granulation process

    All batches were manufactured in a lab-scale fluidized bed (LGL 002, Shandong SMA Pharmatech Co., Ltd.). The standard formulation was as follows: acetaminophen (100 g, 5% w/w), corn starch (500 g, 26% w/w), microcrystalline cellulose (740 g, 38% w/w), lactose (600 g, 31% w/w), and hydroxypropyl methyl cellulose binder (3% w/v).
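    The stated w/w percentages appear to be fractions of the dry powder blend. A quick arithmetic check, assuming the HPMC binder solution is excluded from the basis, is sketched below; the variable names are illustrative only.

```python
# Powder masses (g) from the standard formulation; the HPMC binder solution is
# excluded on the assumption that w/w refers to the dry powder blend only.
formulation_g = {
    "acetaminophen": 100,
    "corn starch": 500,
    "microcrystalline cellulose": 740,
    "lactose": 600,
}

total_g = sum(formulation_g.values())  # total powder mass in grams (1940 g)

# Mass fraction of each component as a percentage of the powder blend
percent_ww = {name: 100.0 * mass / total_g for name, mass in formulation_g.items()}

for name, pct in percent_ww.items():
    print(f"{name}: {pct:.1f}% w/w")
```

    Rounded to whole numbers these give 5%, 26%, 38% and 31%, matching the formulation above.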

    Five batches, yielding 75 in-line dynamic calibration samples, were run under exactly the same conditions to develop the quantitative NIR model. In addition, two separate granulation runs were carried out to validate the calibration model, and another two test runs were performed to evaluate its in-line monitoring performance. The equipment was first preheated to 35 °C for 15 min, and the whole process took about 60 min. The experimental conditions of each granulation run were as follows: inlet air temperature 50–60 °C, atomization pressure 0.4 bar, spray rate 14 mL/min, and exhaust fan opening ratio 50%. The powder was agitated and mixed in the expansion chamber for 5 min, the binder solution was then sprayed onto the powder mixture for 40 min, and finally the granules were dried for 15 min.

    2.3. Spectra acquisition

    The raw spectra were collected directly by a micro portable NIR spectrometer (Micro NIR PAT-U Spectrometer, VIAVI, USA) with a diffuse reflectance module, as shown in Fig. 1. To protect the spectrometer from the effect of temperature, the PAT-U spectrometer was connected to a heat-resistant metal probe. The probe was fixed on the side of the expansion chamber, 15 cm above the bottom screen and perpendicular to the sampling port. Detailed information is shown in Fig. 1.

    Fig. 1.

    Fig. 1. (a) The system of the fluid bed set-up used for granulating. (b) denotes the position of the NIR probe.

    NIR spectra were acquired in-line automatically every 2 s over the range 908.1–1676.0 nm, each as an average of 100 scans. During the granulation and drying process, about 6 g of sample was thieved with a glass sampling bottle every 4 min for reference analysis.

    2.4. Determination of moisture

    The thieved samples were used for moisture content determination by loss on drying (LOD) method.28

    2.5. Spectra selection

    In this study, an NIR spectrum was acquired automatically every 2 s, and the whole process took about 60 min. The calibration spectral matrix collected during each manufacturing batch was therefore 1800 × 125, where 1800 is the number of spectra obtained and 125 is the number of wavelength channels of the NIR spectrometer. The spectra corresponding to each sampling time were recorded for the spectra selection investigation as follows.
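    The matrix size follows directly from the acquisition settings, and sampling every 4 min over 60 min gives 15 thieved samples per batch, consistent with 75 samples from five calibration batches. The sketch below reproduces this arithmetic and shows one way to extract the cluster of spectra around a sampling time. The helper `spectra_near` and its ±10 s window are hypothetical; the source does not state how many spectra were grouped per sampling time.

```python
import numpy as np

ACQ_INTERVAL_S = 2          # one spectrum every 2 s (Section 2.3)
PROCESS_MIN = 60            # whole process takes about 60 min
SAMPLING_INTERVAL_MIN = 4   # one thieved sample every 4 min
N_CHANNELS = 125            # wavelength channels of the spectrometer
N_CAL_BATCHES = 5           # calibration batches

n_spectra = PROCESS_MIN * 60 // ACQ_INTERVAL_S                       # spectra per batch
n_samples = PROCESS_MIN // SAMPLING_INTERVAL_MIN * N_CAL_BATCHES      # calibration samples

def spectra_near(X, t_sample_s, half_window_s=10):
    """Hypothetical helper: spectra acquired within +/- half_window_s of a sampling time.

    X is one batch's (n_spectra, n_channels) matrix, with row i acquired at i * 2 s.
    The window width is an assumption, not a value from the source.
    """
    times_s = np.arange(X.shape[0]) * ACQ_INTERVAL_S
    return X[np.abs(times_s - t_sample_s) <= half_window_s]

batch = np.zeros((n_spectra, N_CHANNELS))            # placeholder for one batch of spectra
cluster_10min = spectra_near(batch, t_sample_s=10 * 60)
```

    With the assumed window, the 10 min sampling point collects 11 spectra (590 s to 610 s).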

    2.5.1. Principal component analysis method

    Principal component analysis (PCA) was applied to the spectra of each time point, i.e., the cluster of spectra corresponding to each sample (e.g., Fig. 3(a)). Spectra lying inside the 95% confidence ellipse of the PCA scores plot were averaged to give the corresponding standard spectrum, and spectra outside the ellipse were identified as abnormal spectra.
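    A minimal NumPy sketch of this step is given below. It assumes the ellipse is built from the PC1 and PC2 scores and that its 95% boundary is the chi-square quantile with two degrees of freedom (closed form −2 ln 0.05 ≈ 5.99); commercial software may instead use an F-based Hotelling limit. The function name and the synthetic cluster are illustrative.

```python
import numpy as np

# Chi-square quantile with 2 degrees of freedom has the closed form -2*ln(1 - p)
CHI2_2DOF_95 = -2.0 * np.log(1.0 - 0.95)   # about 5.99

def pca_ellipse_select(cluster):
    """Average the spectra inside the 95% ellipse of the PC1-PC2 scores plot.

    cluster: (n_spectra, n_channels) spectra recorded around one sampling time.
    Returns (standard_spectrum, abnormal_mask).
    """
    centered = cluster - cluster.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :2] * s[:2]                                   # PC1 and PC2 scores
    t2 = np.sum(scores**2 / scores.var(axis=0, ddof=1), axis=1)  # squared distance in score space
    abnormal = t2 > CHI2_2DOF_95                                # outside the 95% ellipse
    return cluster[~abnormal].mean(axis=0), abnormal

# Synthetic demo: 11 similar spectra plus one strongly offset spectrum (index 11)
rng = np.random.default_rng(0)
base = np.linspace(0.2, 0.8, 125)
cluster = base + 0.01 * rng.standard_normal((12, 125))
cluster[11] += 1.0
standard_spectrum, abnormal = pca_ellipse_select(cluster)
```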

    2.5.2. Influence plot method

    F-residuals and Hotelling's T² statistics were used to identify abnormal spectra. The confidence limit of the Hotelling's T² distribution was set at 95%, and the α-error of the F-test was set to 0.05. Samples outside both the F-residual and Hotelling's T² thresholds were identified as abnormal spectra.
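    The influence plot pairs a score-space statistic (T²) with a residual-space statistic. The sketch below is a NumPy-only approximation, not the software's F-residual test: it swaps in the Q residual (squared prediction error) with Box's scaled chi-square limit, using the Wilson–Hilferty approximation for the chi-square quantile, and fixes the model at two PCs so the T² limit has a closed form. All names are illustrative.

```python
import numpy as np
from statistics import NormalDist

N_PC = 2  # fixed at two PCs so the chi-square(2) T^2 limit has a closed form

def influence_plot_flags(cluster, conf=0.95):
    """Hotelling's T^2 and Q-residual outlier flags for one cluster of spectra.

    Assumed substitutions: Q residual instead of the F-residual, Box's approximation
    Q ~ g * chi2(h) for its limit, Wilson-Hilferty for the chi-square quantile.
    """
    n = cluster.shape[0]
    centered = cluster - cluster.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    eig = s**2 / (n - 1)                              # variance carried by each PC
    scores = U[:, :N_PC] * s[:N_PC]
    t2 = np.sum(scores**2 / eig[:N_PC], axis=1)        # Hotelling's T^2
    t2_limit = -2.0 * np.log(1.0 - conf)              # chi-square(2) quantile

    residual = centered - scores @ Vt[:N_PC]
    q = np.sum(residual**2, axis=1)                   # Q residual per spectrum
    theta1, theta2 = eig[N_PC:].sum(), (eig[N_PC:] ** 2).sum()
    h = theta1**2 / theta2                            # effective degrees of freedom
    z = NormalDist().inv_cdf(conf)
    a = np.sqrt(2.0 / (9.0 * h))
    q_limit = theta1 * (1.0 - a**2 + z * a) ** 3      # g * h * chi2 quantile, g * h = theta1

    abnormal = (t2 > t2_limit) & (q > q_limit)        # outside both limits, as in the text
    return t2, q, abnormal

rng = np.random.default_rng(5)
cluster = np.linspace(0.2, 0.8, 125) + 0.01 * rng.standard_normal((12, 125))
t2, q, abnormal = influence_plot_flags(cluster)
```

    Requiring both limits to be exceeded mirrors the rule above; flagging on either limit would be the more conservative alternative.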

    2.5.3. Baseline variation method

    A wavelength representing the baseline information was chosen by calculating the standard deviation (SD) of each wavelength over all spectra. A threshold of ±1-fold SD was then used to eliminate abnormal spectra.
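    The source does not say which wavelength was taken as the baseline indicator. The sketch below assumes it is the channel with the largest SD across the cluster and rejects spectra whose absorbance there lies outside mean ± 1 SD; the function name is hypothetical.

```python
import numpy as np

def baseline_variation_select(cluster, n_sd=1.0):
    """Flag spectra whose absorbance at a 'baseline' channel lies outside mean +/- n_sd * SD.

    Assumption: the baseline channel is the one with the largest SD across the cluster.
    """
    sd_per_channel = cluster.std(axis=0, ddof=1)
    j = int(np.argmax(sd_per_channel))                # assumed baseline channel
    values = cluster[:, j]
    centre, spread = values.mean(), values.std(ddof=1)
    abnormal = np.abs(values - centre) > n_sd * spread
    return j, abnormal

# Toy cluster: channel 0 drifts in the last spectrum, channel 1 is constant
demo = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [10.0, 1.0]])
channel, abnormal = baseline_variation_select(demo)
```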

    2.5.4. Standard deviation of spectra

    All spectra at each time point were arranged as a matrix (X), and the SD of X was calculated as STD. One spectrum at a time was removed from X, and the SD of the remaining spectra was calculated as S. If S < STD, the removed spectrum was considered abnormal; this was repeated until all spectra had been processed.
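    The leave-one-out rule can be sketched as follows. How the SD of a whole matrix was summarized is not stated; this sketch assumes the mean of the per-wavelength SDs. A spectrum far from the others inflates the spread, so removing it lowers S below STD.

```python
import numpy as np

def sd_leave_one_out(cluster):
    """Flag a spectrum as abnormal if removing it lowers the spread of the cluster.

    Assumption: the 'SD of matrix X' is summarized as the mean of the per-wavelength SDs.
    """
    std_all = cluster.std(axis=0, ddof=1).mean()      # STD of the full cluster
    abnormal = np.zeros(cluster.shape[0], dtype=bool)
    for i in range(cluster.shape[0]):
        rest = np.delete(cluster, i, axis=0)          # strip one spectrum
        s = rest.std(axis=0, ddof=1).mean()           # S of the remaining spectra
        abnormal[i] = s < std_all
    return abnormal

rng = np.random.default_rng(1)
cluster = np.linspace(0.2, 0.8, 125) + 0.001 * rng.standard_normal((6, 125))
cluster[5] += 5.0                                     # one strongly shifted spectrum
abnormal = sd_leave_one_out(cluster)
```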

    2.5.5. Cosine distance method

    Abnormal spectra were eliminated on the basis of spectral similarity, measured as the cosine distance29,30 between each spectrum and the mean spectrum. The characteristic spectral region 1390–1420 nm was used for the similarity calculation. Each spectrum was treated as a multi-dimensional vector, and the cosine between each spectrum in X and the mean spectrum was calculated as M. Spectra were rejected if their cosine distance exceeded a threshold of one-fold SD.
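    A sketch of the cosine distance rule, under two stated assumptions: the 125 channels are evenly spaced between 908.1 and 1676.0 nm, and "one-fold SD" means rejecting spectra whose distance (1 − cosine) exceeds the mean distance plus one SD of the distances in the cluster. Function and variable names are illustrative.

```python
import numpy as np

# Assumption: 125 evenly spaced channels between 908.1 nm and 1676.0 nm
WAVELENGTHS_NM = np.linspace(908.1, 1676.0, 125)

def cosine_distance_select(cluster, wavelengths=WAVELENGTHS_NM,
                           region=(1390.0, 1420.0), n_sd=1.0):
    """Reject spectra whose cosine distance to the mean spectrum (within `region`)
    exceeds mean + n_sd * SD of the distances (assumed threshold rule)."""
    band = (wavelengths >= region[0]) & (wavelengths <= region[1])
    X = cluster[:, band]
    mean_spectrum = X.mean(axis=0)
    cosine = X @ mean_spectrum / (np.linalg.norm(X, axis=1) * np.linalg.norm(mean_spectrum))
    distance = 1.0 - cosine                           # 0 for identical direction
    threshold = distance.mean() + n_sd * distance.std(ddof=1)
    return distance, distance > threshold

rng = np.random.default_rng(2)
base = 0.5 + 0.3 * np.exp(-((WAVELENGTHS_NM - 1450.0) / 60.0) ** 2)
cluster = base + 0.001 * rng.standard_normal((20, 125))
band = (WAVELENGTHS_NM >= 1390.0) & (WAVELENGTHS_NM <= 1420.0)
cluster[7, band] *= 1.0 + (np.arange(band.sum()) % 2)   # distort the band shape of spectrum 7
distance, abnormal = cosine_distance_select(cluster)
```

    Because cosine similarity ignores vector length, a uniform intensity change leaves the distance unchanged; only changes in spectral shape within the band are penalized.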

    2.6. PLS model

    PLS models were built for moisture content, and spectral outliers were first identified by PCA. Different pretreatment methods were then investigated to eliminate irrelevant information. To further improve the models' predictive capacity, the wavelength selection methods correlation coefficient (CC),31 competitive adaptive reweighted sampling (CARS)32 and reverse interval partial least squares (RiPLS)33 were introduced and compared. The models were evaluated by the root mean square error of cross validation (RMSECV), the root mean square error of calibration (RMSEC) and the coefficient of determination of calibration (R²c). Finally, samples from two external independent batches were used to validate the predictive ability of the models in terms of the root mean square error of prediction (RMSEP). The established models were also applied to monitor the process in-line. The equations used in this section were as follows:

    $$R_c^2 = 1 - \frac{\sum_{k=1}^{m}\left(y_{k,\mathrm{actual}} - y_{k,\mathrm{predicted}}\right)^2}{\sum_{k=1}^{m}\left(y_{k,\mathrm{actual}} - \bar{y}_{\mathrm{actual}}\right)^2} \tag{1}$$
    $$\mathrm{RMSEC} = \sqrt{\frac{\sum_{k=1}^{m}\left(y_{k,\mathrm{actual}} - y_{k,\mathrm{predicted}}\right)^2}{m-1}} \tag{2}$$
    $$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_{i,\mathrm{actual}} - y_{i,\mathrm{predicted}}\right)^2}{n-1}} \tag{3}$$
    $$\mathrm{RMSECV} = \sqrt{\frac{\sum_{k=1}^{m}\left(y_{k,\mathrm{actual}} - y_{k,\mathrm{cv}}\right)^2}{m-1}} \tag{4}$$
    In Eqs. (1)–(4), $y_{k,\mathrm{actual}}$ is the reference value of the kth calibration sample, $y_{k,\mathrm{predicted}}$ is the NIR-predicted value of that sample, and $y_{k,\mathrm{cv}}$ is its cross-validated estimate. $y_{i,\mathrm{actual}}$ is the reference value of the ith validation sample and $y_{i,\mathrm{predicted}}$ is its NIR-predicted value. $\bar{y}_{\mathrm{actual}}$ is the mean reference value of the calibration set, and m and n denote the numbers of samples in the calibration and validation sets, respectively.
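    The figures of merit translate directly into code. The sketch below implements R² and the root mean square errors with the "number of samples minus one" divisor used in Eqs. (2)–(4); one helper serves RMSEC, RMSEP and RMSECV depending on which reference and estimated values are passed in. Names are illustrative.

```python
import numpy as np

def r2(y_actual, y_predicted):
    """Coefficient of determination, Eq. (1)."""
    y_actual = np.asarray(y_actual, float)
    y_predicted = np.asarray(y_predicted, float)
    ss_res = np.sum((y_actual - y_predicted) ** 2)
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse_minus_one(y_actual, y_estimated):
    """RMSEC / RMSEP / RMSECV: sum of squared errors divided by (N - 1), then square root."""
    y_actual = np.asarray(y_actual, float)
    y_estimated = np.asarray(y_estimated, float)
    return float(np.sqrt(np.sum((y_actual - y_estimated) ** 2) / (y_actual.size - 1)))

r2_demo = r2([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])                 # 0.5
rmse_demo = rmse_minus_one([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])   # sqrt(0.5)
```

    Many chemometrics packages divide RMSEP by N rather than N − 1, so values may differ slightly between tools for small validation sets.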

    3. Results and Discussion

    3.1. NIR spectral analysis

    First, five batches were prepared and used to build the moisture content calibration model during the granulation and drying process. The raw spectra collected are shown in Fig. 2(a). Two main peaks, around 1200 nm and 1500 nm, can be seen. The weaker peak around 1200 nm can be attributed to the second overtone of C–H stretching,34,35 and the strong peak around 1500 nm was assigned to the first overtone of O–H stretching.36 Although the O–H stretching band provides valuable chemical information for moisture content determination, chemometric methods are needed to extract the hidden information from the overlapping spectral bands.

    Fig. 2.

    Fig. 2. (a) The raw spectra of the five granulation batches to establish the moisture content calibration model. (b) The raw spectra to establish the moisture content calibration model and the raw spectra of two granulation batches to validate the calibration model.

    Two further granulation batches were manufactured for the validation study; their raw spectra are shown in Fig. 2(b). At most wavelengths the validation spectra lay within the range of the calibration spectra, which indicates that the moisture-related variability in the validation set was covered by the calibration set.

    3.2. Spectral selection

    The dynamic state of the granules in the expansion chamber causes some spectra to carry information unrelated to the moisture of the material, which should be avoided. To reduce the influence of dynamic disturbances, obtain a more accurate mean spectrum, and improve the predictive capability of the NIRS model, spectra selection is necessary. Five spectra selection methods were investigated, and the results are shown in Fig. 3. The sampling point at 10 min in the granulation process is taken as an example; at this point the thieved sample already consisted of fine granules rather than powder.

    Fig. 3.

    Fig. 3. (a)–(e) Results of different spectral selection methods for the sampling point corresponding to 10 min in the granulation process. (f) RMSEP values of NIR models with different spectral selection methods.

    The PCA result in Fig. 3(a) shows that two spectra (3 and 6), lying outside the 95% confidence ellipse far from the center, were recognized as abnormal and removed. Removing these outliers prevents an obvious deviation of the mean spectrum, which reduces the impact of interference factors.

    With the influence plot, three abnormal spectra were identified, as shown in Fig. 3(b). The influence plot shows the F-residuals versus Hotelling's T² statistics, which represent two different kinds of outlier detection. Spectra lying outside the red confidence lines were identified as abnormal. Such spectra are "dangerous" in calibration because of their strong influence on the model, which would make the model unreliable.

    Figure 3(c) shows the results of the baseline variation method, which eliminates abnormal spectra caused by irregular baseline drift. Six abnormal spectra were identified with this method. Compared with the method of Margot and co-workers,27 the method proposed here is more reasonable because both the baseline wavelength and the absorbance threshold were determined statistically.

    The results of the standard deviation method are shown in Fig. 3(d); six abnormal spectra were observed. This way of determining the homogeneity of spectra is similar to the moving block standard deviation calculation.

    The cosine distance method was also applied, and the results are shown in Fig. 3(e). Six abnormal spectra were found by this method. Cosine distance measures the similarity in direction of two numerical vectors and is widely used in the similarity evaluation of chromatographic fingerprints. With each spectrum treated as a multi-dimensional vector, the method removes spectra whose vector direction has been changed by interference factors.

    All samples were then treated by the same procedure, and PLS models were established to evaluate the spectra selection methods. The results are shown in Fig. 3(f). The predictive capability of all models with spectral selection improved to some degree compared with the raw data. The results of the PCA and influence plot methods were less satisfactory than those of the other methods, probably because they failed to eliminate all abnormal spectra. The remaining three methods significantly reduced the RMSEP and improved model quality by eliminating anomalous spectra caused by interference factors. The cosine distance method gave the lowest RMSEP, an improvement of 15.4% over the raw spectra without spectra selection.

    3.3. Reference data analysis

    Seventy-five thieved samples were collected from five different batches in total, and their moisture contents were determined by the LOD method as reference values. The results are shown in Fig. 4. The whole process consisted of two stages. During the granulation stage (i.e., 6–45 min), the moisture content changed slightly, whereas during the drying stage (i.e., 46–60 min) it decreased significantly. The process reached its end-point at about 60 min, when the moisture content fell below 2.5%. The moisture content varied considerably between batches, which indicates the need for in-line monitoring. The purpose of in-line monitoring is to keep CQAs such as moisture within a reasonable range and thereby avoid bed collapse or granulation failure.

    Fig. 4.

    Fig. 4. The trend of the reference data in the granulation process corresponds to the green part of the figure, and the trend of the reference data in the drying process corresponds to the red part of the figure.

    3.4. Establishment and evaluation of moisture content model

    3.4.1. Identification of outliers

    Principal component analysis (PCA) is a classical method for outlier detection that sets a critical limit and derives a confidence ellipse. In the scores plot, the closer two samples lie, the more similar their spectra are. Samples are generally identified as abnormal if they lie outside the 95% confidence ellipse of the PCA scores plot. PCA was able to remove samples that differed from the rest of the calibration set, and removing these outliers avoids obvious deviation of the predicted values, which improves the accuracy and robustness of the model. In this study, after spectral selection, outliers were detected with PCA by calculating leverage values (see Fig. 5). Figure 5 shows the PCA scores plot of the first two principal components. Samples 4, 5 and 69 were identified as abnormal because they lie outside the 95% confidence ellipse.

    Fig. 5.

    Fig. 5. Score plot of PCA for outlier detection. The ellipse is corresponding to 95% confidence.

    3.4.2. Spectral pretreatment

    To minimize the distorting effects of light scattering and unwanted systematic variations due to baseline shifts and stochastic noise, different pretreatments were adopted in this study.32 The raw spectra used to establish the moisture content model were transformed with several preprocessing methods: Savitzky–Golay (SG) smoothing, mean centering, standard normal variate (SNV), multiplicative scatter correction (MSC), first-order derivative (FD), and second-order derivative (SD). PLS models were then constructed to evaluate the preprocessing methods, with leave-one-out cross-validation (LOOCV) used to select the number of latent variables (LVs). The model parameters are summarized in Table 1. The combination of SNV, FD, SG smoothing (window width 7, polynomial order 2) and mean centering gave the best predictive capacity, with RMSEC, RMSECV, R²c and R²cv of 0.1562%, 0.1731%, 0.965 and 0.957, respectively.
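    The best-performing pretreatment chain can be sketched in NumPy as below. Two assumptions are made: the FD and SG steps are combined into a single Savitzky–Golay first-derivative filter (window 7, polynomial order 2), and edge channels are dropped ('valid' convolution) rather than padded, so 125 channels become 119. This is an illustrative sketch, not the software's implementation; `scipy.signal.savgol_filter` offers an equivalent filter with edge handling.

```python
import math
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def sg_kernel(window=7, polyorder=2, deriv=1):
    """Savitzky-Golay weights from a local least-squares polynomial fit."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, polyorder + 1, increasing=True)   # columns: 1, x, x^2, ...
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv)

def sg_derivative(X, window=7, polyorder=2, deriv=1):
    """SG derivative (per channel) of each row; 'valid' mode drops window//2 channels per end."""
    k = sg_kernel(window, polyorder, deriv)
    return np.array([np.convolve(row, k[::-1], mode="valid") for row in np.atleast_2d(X)])

def pretreat(X):
    """SNV -> SG first derivative (width 7, order 2) -> mean centering across samples."""
    Z = sg_derivative(snv(X))
    return Z - Z.mean(axis=0)

rng = np.random.default_rng(3)
spectra = rng.random((10, 125)) + np.linspace(0.0, 1.0, 125)
processed = pretreat(spectra)
ramp_slope = sg_derivative(3.0 * np.arange(20.0))   # derivative of a straight line
```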

    Table 1. Results of different spectral pretreatment methods for in-line NIR quantitative model.

    Pre-method | RMSEC (%) | RMSECV (%) | R²c | R²cv | LVs
    SNV | 0.1601 | 0.1822 | 0.966 | 0.960 | 7
    MSC + Mean center | 0.1606 | 0.1973 | 0.963 | 0.946 | 7
    SNV + Mean center | 0.1665 | 0.2013 | 0.960 | 0.945 | 6
    FD + SG7 + Mean center | 0.1504 | 0.1815 | 0.968 | 0.956 | 5
    SD + SG3 + Mean center | 0.1671 | 0.1878 | 0.960 | 0.943 | 5
    MSC + FD + SG7 + Mean center | 0.1557 | 0.1736 | 0.966 | 0.959 | 4
    SNV + FD + SG7 + Mean center | 0.1562 | 0.1731 | 0.965 | 0.957 | 4
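    The LV counts in Table 1 were chosen by LOOCV. A self-contained sketch of that loop is shown below, using a NumPy-only NIPALS PLS1 (a swapped-in implementation, not the one in The Unscrambler or MATLAB) and the RMSECV divisor (number of calibration samples − 1). The demo uses noise-free synthetic data, where three LVs on three variables reproduce ordinary least squares and the cross-validated error drops to zero.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """NIPALS PLS1; returns (x_mean, y_mean, regression_vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # y already fully explained; stop early
            break
        w /= norm                        # weight vector
        t = Xr @ w                       # scores
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        q = (yr @ t) / tt                # y loading
        Xr = Xr - np.outer(t, p)         # deflate X
        yr = yr - q * t                  # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)
    return x_mean, y_mean, coef

def loocv_rmsecv(X, y, max_lv):
    """RMSECV for 1..max_lv latent variables by leave-one-out cross-validation."""
    m = len(y)
    press = np.zeros(max_lv)
    for a in range(1, max_lv + 1):
        for k in range(m):
            keep = np.arange(m) != k                  # leave sample k out
            x_mean, y_mean, coef = pls1_fit(X[keep], y[keep], a)
            y_cv = (X[k] - x_mean) @ coef + y_mean    # predict the left-out sample
            press[a - 1] += (y[k] - y_cv) ** 2
    return np.sqrt(press / (m - 1))

rng = np.random.default_rng(4)
X = rng.standard_normal((20, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 1.5
rmsecv = loocv_rmsecv(X, y, max_lv=3)
best_lv = int(np.argmin(rmsecv)) + 1
```

    With real spectra, RMSECV usually flattens or rises after a few LVs, and the smallest count near the minimum is preferred to avoid overfitting.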

    3.4.3. Wavelength selection

    CC and RiPLS methods were applied to select wavelengths to improve the accuracy and robustness of the models, with RMSECV used to evaluate the PLS models during selection. The results of PLS models with different wavelength selection methods are shown in Table 2. RiPLS (interval 25) gave the best result, selecting the range 1063–1676 nm, which could be attributed to the first O–H overtone and the deformation of the O–H group in water.
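    The CC method keeps channels whose correlation with the reference moisture values passes a threshold (0.15 to 0.35 in Table 2). A minimal sketch follows, assuming the absolute Pearson correlation is compared with the threshold; the function name and toy data are illustrative.

```python
import numpy as np

def cc_select(X, y, threshold=0.35):
    """Correlation coefficient wavelength selection.

    Keeps channels whose |Pearson r| with the reference values is >= threshold
    (using the absolute value is an assumption).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return r, np.abs(r) >= threshold

# Toy data: channel 0 tracks y exactly, channel 1 is uncorrelated with y
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
X_demo = np.column_stack([y_demo, [1.0, -1.0, -1.0, 1.0]])
r, keep = cc_select(X_demo, y_demo)
```

    Raising the threshold shrinks the selected set, which matches the narrowing wavelength ranges from CC-0.15 to CC-0.35 in Table 2.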

    Table 2. The results of PLS models with different wavelength selection methods.

    Wavelength selection | RMSEC (%) | RMSECV (%) | R²c | R²cv | Wavelength (nm) | LVs
    RiPLS10 | 0.1578 | 0.1780 | 0.965 | 0.957 | 908–1274; 1342–1676 | 4
    RiPLS15 | 0.1419 | 0.1675 | 0.971 | 0.960 | 908–1366; 1466–1676 | 6
    RiPLS20 | 0.1510 | 0.1680 | 0.968 | 0.960 | 1032–1676 | 4
    RiPLS25 | 0.1510 | 0.1671 | 0.968 | 0.959 | 1063–1676 | 4
    RiPLS30 | 0.1514 | 0.1727 | 0.968 | 0.958 | 908–1274; 1466–1676 | 6
    CC-0.15 | 0.1556 | 0.1703 | 0.966 | 0.958 | 908–1453; 1521–1676 | 4
    CC-0.20 | 0.1557 | 0.1724 | 0.966 | 0.960 | 908–1416; 1614–1676 | 4
    CC-0.25 | 0.1715 | 0.1877 | 0.958 | 0.955 | 908–1404; 1658–1676 | 4
    CC-0.30 | 0.1502 | 0.1727 | 0.968 | 0.958 | 908–1391; 1670–1676 | 5
    CC-0.35 | 0.1504 | 0.1716 | 0.968 | 0.960 | 908–1385 | 5

    Because the absorption signal near the upper end of the PAT-U spectrometer's range is weak, variables near 1700 nm could compromise the robustness of the PLS model, so the range 1663–1676 nm was eliminated manually. The results of models after eliminating different variables are shown in Table 3. The range 1032–1658 nm was selected to build the model, and its RMSEC, RMSECV, R²c and R²cv were 0.1431%, 0.1584%, 0.971 and 0.958, respectively.

    Table 3. The results of PLS models after eliminating different variables.

    Variables range | RMSEC (%) | RMSECV (%) | R²c | R²cv | LVs
    1032–1651 nm | 0.1561 | 0.1719 | 0.966 | 0.954 | 3
    1032–1658 nm | 0.1431 | 0.1584 | 0.971 | 0.966 | 4
    1032–1664 nm | 0.1430 | 0.1584 | 0.971 | 0.958 | 4
    1032–1670 nm | 0.1429 | 0.1600 | 0.971 | 0.966 | 4

    3.4.4. Validation of the calibration models

    An external dataset (28 samples) was used to validate the established PLS model, and the result is shown in Fig. 6. The R²p and RMSEP were 0.907 and 0.123%, respectively, demonstrating the good predictive ability of the PLS model. A paired t-test was further performed on the NIR predicted values and the reference values; the results are shown in Table 4. There was no significant difference between the results of the two methods, demonstrating the feasibility of in-line NIRS monitoring of moisture content.

    Table 4. The results of paired t-test.

    t test (α = 0.05)
    Methods | Samples | Mean (%) | SD (%) | H | P
    LOD | 28 | 2.43 | 0.41 | 0 | 0.532
    NIRS | 28 | 2.41 | 0.45 | |
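    The paired t-test works on the per-sample differences between reference and NIR values, so it cannot be recomputed from the per-method SDs in Table 4. The sketch below computes the paired t statistic with NumPy only; the critical value 2.052 is the tabulated two-sided t quantile for α = 0.05 and 27 degrees of freedom (28 pairs). Names and demo values are illustrative; `scipy.stats.ttest_rel` would also return the p-value.

```python
import numpy as np

T_CRIT_TWO_SIDED_005_DF27 = 2.052   # tabulated t(0.975, 27) for n = 28 pairs

def paired_t(reference, predicted):
    """Paired t statistic on the differences; returns (t, degrees_of_freedom)."""
    d = np.asarray(reference, float) - np.asarray(predicted, float)
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))   # mean difference over its standard error
    return t, n - 1

t_demo, df_demo = paired_t([2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
```

    For the 28 validation samples, |t| below about 2.052 corresponds to no significant difference at α = 0.05.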
    Fig. 6.

    Fig. 6. The validation for the models of moisture content.

    3.5. In-line monitoring of moisture content

    The moisture content of the granules was monitored in real time with the corresponding in-line model. The PLS models were imported into Micro NIR Pro v2.3 and used to monitor the two external batches. Thieved samples (about 6 g) were collected every 4 min during the granulation and drying process to assess the accuracy of the real-time monitoring results. The prediction capacity of the model was assessed by comparing the in-line NIRS values with the reference method values. The results are presented in Fig. 7, and good agreement between the two methods was observed. The blue points at the end of each plot change dramatically because the manufacturing process had stopped while the NIR sensor continued collecting data. The novel NIR sensor showed great potential for in-line monitoring.

    Fig. 7.

    Fig. 7. Monitoring results of in-line quantitative model of water content.

    4. Conclusions

    A novel micro NIR sensor was employed to demonstrate the feasibility of process monitoring of moisture content. Different spectral selection methods were investigated to improve predictive capacity, and the cosine distance method gave the lowest RMSEP, suggesting it can serve as a general tool for spectra selection. Spectral pretreatment and variable selection methods were also important for improving the models' predictive ability. According to the paired t-test, there was no significant difference between the NIR method and the reference method. This study can serve as a reference for in-line monitoring of granulation in pilot-scale and industrial fluidized beds with the portable micro NIR sensor, and it demonstrates that NIRS has the potential to replace traditional measurement methods for real-time monitoring of the granulation process. Furthermore, CQAs could be linked with critical process parameters to realize intelligent production of solid preparations.

    Conflicts of Interest

    The authors declare that they have no conflict of interest.

    Acknowledgments

    The authors are grateful for the financial support of the Natural Science Foundation of Shandong Province of China (No. ZR2017MB012) and the Major Innovation Project of Shandong Province of China (2018CXGC1405), and to SMA Pharmatech Co., Ltd. (Zibo, China) for supplying the fluidized bed.