Development of calibration models for rapid determination of moisture content in rubber sheets using portable near-infrared spectrometers
Abstract
Rubber sheets are one of the primary products of natural rubber and are the main raw material in various rubber industries. The quality of a rubber sheet can be examined visually by holding it against clear light to inspect for specks and impurities, but its moisture content is difficult to evaluate visually, which might lead to unfair trading. Herein, we developed a rapid, robust and nondestructive near-infrared spectroscopy (NIRS)-based method for determining the moisture content of rubber sheets. A set of 300 rubber sheets was divided into a calibration set (200 samples) and a prediction set (100 samples). The calibration set was used to develop NIRS calibration equations using three calibration models: Partial Least Squares Regression (PLSR), Least Squares Support Vector Machine (LS-SVM) and Artificial Neural Network (ANN). Among the models investigated, the ANN model with first-derivative spectral preprocessing gave the best prediction, with a coefficient of determination of prediction ($R^2_P$) of 0.993, a root mean square error of calibration (RMSEC) of 0.126% and a root mean square error of prediction (RMSEP) of 0.179%. The results indicated that the proposed NIRS-ANN model can reduce human error and provide a highly accurate estimate of the moisture content in a rubber sheet compared to traditional wet chemistry estimation methods according to AOAC standards.
1. Introduction
Natural rubber is an important economic crop in Southeast Asia and is used extensively by manufacturers throughout the rubber industry, with applications including tires, tank liners and automobile parts. Natural rubber is often collected from smallholders in various forms and separated into specific grades based on a visual inspection. One early-processed form is the rubber sheet, which comes in several types, including ribbed smoked sheet (RSS), unsmoked sheet (USS) and air-dried sheet (ADS), depending on the drying method used. Among them, USS has been the more popular form with growers in Thailand. Practically the entire volume of this grade is produced by small-scale and medium-scale rubber growers scattered throughout the rubber-growing districts of the country. The popularity of this grade is mainly due to the simplicity and low cost of the processing machinery and to a processing technology that is easily adopted for any amount of latex. Rubber sheets are generally graded according to their color, moisture content and consistency, as well as observed impurities.
Near-infrared spectroscopy (NIRS) is a type of high-energy vibrational spectroscopy performed in the wavelength range 750–2500 nm (13,333 to 4000 cm⁻¹), which is the region between visible light and classical mid-infrared.1 NIRS is a fast and nondestructive analytical method. It has proven its effectiveness for both qualitative and quantitative analyses in several fields. NIRS was first used in agricultural applications to measure moisture in grain.2 Since then, it has been used for rapid analysis of the moisture, protein and fat contents of a wide variety of agricultural and food products.3 Recent studies involving the NIR region have shown that NIRS is a suitable method for quantifying trace amounts of moisture in a rubber sheet, owing to the strong combination absorption band of water at around 1940 nm and its first, second and third overtones at 1450, 970 and 760 nm, respectively.4 Another study investigated the prediction of the dry rubber content in concentrated latex using a portable NIRS (Avantes, the Netherlands) in the wavelength range 370–1085 nm.5 The predictions were accurate, with a high coefficient of determination ($R^2 = 0.9741$) and a root mean square error of calibration (RMSEC) of 1.09%. In 2015, the NIR System 6500 was proposed for determining the moisture content of natural rubber in the form of cup lump rubber using the wavelength range 400–1100 nm.6 Partial Least Squares Regression (PLSR) was selected to develop the calibration model, which gave good statistical results ($R^2 = 0.98$, RMSEC = 1.68% and RMSEP = 1.48%). However, although this previous research showed that NIRS was suitable for determining the dry rubber and moisture contents of natural rubber, the technique has not been implemented for field use.
Interest in NIRS has increased due to the emergence of new mathematical approaches such as artificial neural network (ANN) models, as well as the development of fiber optics that allow delocalization of the measurement. In recent years, the use of ANN models has spread to various applications in different fields. In the field of NIRS in particular, ANN models have been shown to perform well on nonlinear problems, resulting in clear improvements in the models developed in a large number of applications.7,8 Therefore, the objective of this paper was to evaluate the potential of three different types of calibration models (Partial Least Squares Regression (PLSR), Least Squares Support Vector Machine (LS-SVM) and ANN) using NIR spectroscopy in the wavelength range 900–1700 nm for the prediction of the moisture content in a rubber sheet.
2. Materials and Methods
2.1. Rubber sheet process
The rubber sheets were sampled from a rubber plantation in eastern Thailand. They were prepared from latex by diluting the latex with clean and pure water prior to adding acid to promote coagulation. This dilution helped in achieving quality consistency in the final product. Latex coagulation was promoted by adding 1% diluted formic acid to the already diluted latex as developed by the Rubber Research Institute of Thailand. The addition of acid in diluted form assisted in achieving uniform acid distribution in the latex and thereby ensured complete coagulation and a soft coagulum to produce sheets free of air bubbles and stickiness. Water was squeezed out of the rubber coagulum using a series of rollers. This process was continued by pressing down on the marking roller to thin down the coagulum to form a sheet with even thickness (3mm). The rubber sheets were then hung up to allow the remaining water to drip off for about 1–2 days and to remove the surface moisture. Drying was completed after the rubber sheets had been continually dried in a chamber for about 10–15 days.
2.2. Portable NIRS design
The portable NIRS is designed for use in reflectance mode in the range 900–1700 nm with 3.5 nm resolution, as shown in Fig. 1. Its dimensions are 6.5 × 17.0 × 4.5 cm. Inside, it contains the DLP NIR Scan Nano EVM (evaluation module) from Texas Instruments, Inc., USA and a light source for generating the near-infrared light. The portable NIRS is powered by a battery (Li-Polymer, 3.7 V, 1800 mAh) and is connected to a computer via a USB port or to a smartphone using Bluetooth Low Energy (BLE).

Fig. 1. Components of portable NIRS.
In this paper, the spectral data of the rubber sheets were collected using the DLP NIR Scan_Nano_GUI software. Each measurement point was the average of 10 individual readings. The spectral data were then transferred to the MATLAB software package for spectral preprocessing and multivariate analysis.
2.3. Spectrum acquisition and correction
In the experiment, a rubber sheet was placed on an aluminum plate. The rubber sheet spectrum was acquired using the portable NIRS in reflectance mode. Due to the nonsmooth surface of the rubber sheet, the reflected light traveled different distances in different directions from the sample surface to the spectrometer detector, resulting in different spectral results from different locations. There are other factors affecting the spectral characteristics that may not be easily defined and these can cause a spectral shift in terms of linear and nonlinear translation and thus influence the performance of the calibration model. Consequently, spectral pretreatment is a necessary part of the spectral analysis and can improve the accuracy of the analysis results. For this reason, the spectral transformations were applied to the rubber sheet spectra to reduce the problems associated with noise, light-scattering and external effects prior to implementing the regression analysis. Presently, there are several chemometric pretreatment methods available such as moving average smoothing (MAS), multiplicative scatter correction (MSC), standard normal variate transformation (SNV) and first and second derivative transformation9 that could be applied to reduce the noise and normalize the spectra.10 In this paper, SNV and the first and second derivatives were selected to compare their effects on the proposed calibration models.
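The derivative transforms selected above can be sketched as follows. This is a minimal illustration that uses plain finite differences from NumPy; the paper does not state which derivative algorithm was used in MATLAB, so `spectral_derivative`, the wavelength grid and the synthetic band are assumptions made only for this example.

```python
import numpy as np

def spectral_derivative(spectra, wavelengths, order=1):
    """Finite-difference derivative of each spectrum (last axis) w.r.t. wavelength."""
    result = np.asarray(spectra, dtype=float)
    for _ in range(order):
        # Central differences inside the range, one-sided at the two ends.
        result = np.gradient(result, wavelengths, axis=-1)
    return result

# Hypothetical 900-1700 nm grid and a synthetic absorption band (not real data).
wavelengths = np.linspace(900.0, 1700.0, 229)
band = np.exp(-((wavelengths - 1450.0) / 40.0) ** 2)

# Two "measurements" of the same band that differ only by a baseline offset.
spectra = np.vstack([band, band + 0.3])

d1 = spectral_derivative(spectra, wavelengths, order=1)
d2 = spectral_derivative(spectra, wavelengths, order=2)
```

In this toy case the two first-derivative spectra coincide, which illustrates why derivative preprocessing removes constant baseline shifts between measurements.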
2.4. Calibration models
2.4.1. Partial Least Squares Regression (PLSR)
The PLSR method is the most commonly used regression algorithm in the field of chemometric spectroscopy.11 Algorithmically, PLSR searches for a set of components or latent variables (LV) that performs a simultaneous decomposition of the spectral data ($X$) and the reference values ($Y$) as a product of a common set of orthogonal factors and a set of specific loadings, with the constraint that these components explain as much as possible of the covariance between $X$ and $Y$. Using too many LVs in the model causes over-fitting, which degrades model performance. Therefore, the optimal number of latent variables is selected to avoid over-fitting while maximizing the covariance between the $X$ and $Y$ spaces.12 In this paper, the optimal number of latent variables was found using a cross-validation technique.13 The number of latent variables that gave the minimum mean square error (MSE) was considered optimal for the model. PLSR is a good alternative to classical multiple linear regression and principal component regression methods because it is more robust, in that the model parameters do not change very much when new calibration samples are taken from the total population. In this paper, the PLSR algorithm in MATLAB R2017a (The MathWorks, Natick, MA, USA) was used to build the calibration models for moisture content prediction in rubber sheets.
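The latent-variable selection described above can be illustrated with a small NumPy sketch. It is not the MATLAB routine used in the paper: `pls1_fit`, `pls1_predict` and `cv_mse` are hypothetical helper names, the NIPALS variant for a single response is one common way to compute PLSR, and the data are synthetic.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit single-response PLS by NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centred copies, deflated below
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr                          # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                             # scores of this latent variable
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        qk = yr @ t / tt                       # y loading
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - qk * t                       # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # regression vector in the original X space
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def cv_mse(X, y, max_lv, n_folds=5, seed=0):
    """k-fold cross-validated MSE for 1..max_lv latent variables."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    mse = np.zeros(max_lv)
    for n_lv in range(1, max_lv + 1):
        sq_err = 0.0
        for test in folds:
            train = np.setdiff1d(idx, test)
            model = pls1_fit(X[train], y[train], n_lv)
            sq_err += np.sum((pls1_predict(model, X[test]) - y[test]) ** 2)
        mse[n_lv - 1] = sq_err / len(y)
    return mse

# Synthetic stand-in for spectra (rows) and reference moisture values.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))
y = X @ rng.normal(size=10) + 0.05 * rng.normal(size=40)

mse = cv_mse(X, y, max_lv=10)
best_lv = int(np.argmin(mse)) + 1              # LV count with the minimum CV error
```

The fold count and random seed are arbitrary choices for the sketch; the paper does not report the cross-validation scheme in this section.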
2.4.2. Least Square Support Vector Machine (LS-SVM)
The support vector machine (SVM) was initially developed for classification problems and was then expanded to treat regression problems. An SVM is a supervised classification and regression method capable of dealing well with both linear and nonlinear data.14 SVM has important roles in classification tasks in pattern recognition and machine learning14 and has proven to be a reliable and efficient method in NIR spectroscopy.15,16,17 LS-SVM is an alternative formulation of SVM that simplifies the training process and significantly reduces the computation time while maintaining similar performance to SVM.18 The implementation of SVM requires three tuning parameters ($\gamma$, $\sigma$, $\epsilon$), while LS-SVM requires only two ($\gamma$, $\sigma$).19 In this paper, LS-SVM was implemented to establish a calibration model to correlate the mean spectra and the moisture content of the rubber sheets. In this model, the radial basis function (RBF) shown in Eq. (1) was selected as the kernel function. A grid-search technique using the LS-SVMlab toolbox was used to tune the regularization parameter ($\gamma$), which determines the trade-off between training error minimization and smoothness,20 and the RBF kernel function parameter ($\sigma^2$), which is the squared bandwidth of the Gaussian curve.20
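For readers who want to see how the two tuning parameters enter the model, the following is a minimal NumPy sketch of LS-SVM regression in its standard dual form. It is not the LS-SVMlab toolbox code. The kernel is written as $\exp(-\lVert x - z\rVert^2/\sigma^2)$, which is one common convention (others divide by $2\sigma^2$), and the function names, data and parameter values are assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """RBF kernel exp(-||a - b||^2 / sigma2) between the rows of A and B."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None]
               + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma2)

def lssvm_fit(X, y, gamma, sigma2):
    """Solve the LS-SVM linear system for the bias b and dual weights alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                   # constraint: sum(alpha) = 0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]

def lssvm_predict(X_train, bias, alpha, sigma2, X_new):
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + bias

# Synthetic one-dimensional example (not spectral data).
X_train = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y_train = np.sin(X_train[:, 0])

gamma, sigma2 = 1000.0, 1.0                          # illustrative values, not tuned
bias, alpha = lssvm_fit(X_train, y_train, gamma, sigma2)
y_fit = lssvm_predict(X_train, bias, alpha, sigma2, X_train)
```

Training reduces to one linear solve, which is the simplification over standard SVM mentioned above; a larger $\gamma$ penalizes training errors more strongly, while $\sigma^2$ controls how quickly the kernel decays with distance.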
2.4.3. Artificial neural network (ANN)
An ANN is a set of algorithms whose connection weights are analogous to synaptic strengths in biological neurons.21 It consists of interconnected neurons arranged in input, hidden and output layers. Each node combines its inputs with a set of weights. An ANN has high processing speed, robustness and generalization capabilities and is able to deal with high-dimensional data spaces.22 In particular, a feed-forward back-propagation network is capable of distinguishing interesting features in voluminous and noisy datasets with distorted patterns.23 This paper proposed a multilayer feed-forward neural network with a topology of one input layer, one hidden layer and one output layer, as shown in Fig. 2. The number of inputs was fixed by the number of spectral wavelengths. Each neuron in a layer was connected to every neuron of the previous layer. The connections were strictly feed-forward; no backward connections were allowed. At the neuron level, a bias was added to the weighted sum of the inputs and the tan-sigmoid transfer function was applied. The output of a single neuron was calculated using Eq. (2):

Fig. 2. A neural network with feed-forward architecture and one hidden layer.
where $m$ is the number of inputs, $i$ is the index of the current neuron in layer $j$ and $w_{jk,i}$ is the synaptic weight factor for the connection between neuron $i_j$ and neuron $k_{j-1}$. Training is supervised. The basic idea is to present the input vector to the network and to calculate, in the forward direction, the output of each layer and the final output of the network. In the output layer, the desired values are known and therefore the weights can be adjusted using the Levenberg–Marquardt (LM) algorithm. The LM algorithm is one of the most powerful and rapid methods used in the training of feed-forward multilayer networks.24 It is a second-order optimization method based on Jacobian matrices of partial derivatives of the cost function, and it descends the error surface using information about how the slope changes in addition to the slope itself. The LM algorithm is an effective modification of the Gauss–Newton method, combining the excellent local convergence properties of that method with the consistent error decrease provided by the gradient descent method.25
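The neuron computation described above (a bias added to the weighted sum of the inputs, followed by the tan-sigmoid function) can be sketched for a one-hidden-layer network as follows. The array names, the toy weights and the linear output neuron are assumptions for illustration; the paper does not state the output transfer function.

```python
import numpy as np

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """Forward pass: m inputs -> tanh hidden layer -> one output value."""
    # Each hidden neuron i computes tanh(sum_k w[i, k] * x[k] + b[i]).
    hidden = np.tanh(W_hidden @ x + b_hidden)
    # Assumed linear output neuron combining the hidden activations.
    output = float(w_out @ hidden + b_out)
    return output, hidden

# Toy example: m = 4 inputs (stand-ins for absorbance values), two hidden neurons.
x = np.array([0.2, -0.1, 0.4, 0.0])
W_hidden = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])
b_hidden = np.zeros(2)
w_out = np.array([1.0, 1.0])
b_out = 0.5

y_hat, hidden = forward(x, W_hidden, b_hidden, w_out, b_out)
```

With these weights the first hidden neuron sees only the first input and the second only the third, so the output is `0.5 + tanh(0.2) + tanh(0.4)`.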
2.5. Model evaluation
The predictive performances of PLSR, LS-SVM and ANN were compared using various parameters. The coefficient of determination defined in Eq. (3) gives information about the goodness of fit of a model in calibration ($R^2_C$) and prediction ($R^2_P$). The root mean square errors (RMSE) of calibration (RMSEC) and prediction (RMSEP), as shown in Eq. (4), quantify the typical deviation between the reference value $y_i$ and the value $\hat{y}_i$ predicted by the model.26 The residual predictive deviations of calibration (RPDC) and prediction (RPDP) were calculated by dividing the standard deviation (SD) of the reference values by RMSEC or RMSEP, respectively,27 and are described by Eq. (5). In general, a good model should yield high values of RPDC, RPDP, $R^2_C$ and $R^2_P$, while also producing low values of RMSEC and RMSEP with only a small difference between them.
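The three statistics can be computed as in the sketch below, which uses their common textbook definitions. The paper's Eqs. (3) to (5) may differ in detail, for example in the divisor used for the RMSE or the SD, so the function and the toy numbers are illustrative assumptions.

```python
import numpy as np

def regression_metrics(y_ref, y_pred):
    """Return (R2, RMSE, RPD) for reference and predicted values."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_ref - y_pred
    ss_res = np.sum(residuals ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals ** 2))
    rpd = np.std(y_ref, ddof=1) / rmse               # sample SD of references / RMSE
    return r2, rmse, rpd

# Toy moisture values in %db (made up for the example).
r2, rmse, rpd = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Applied to the calibration set these give $R^2_C$, RMSEC and RPDC; applied to the prediction set they give $R^2_P$, RMSEP and RPDP.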
3. Results and Discussion
3.1. Sample preparation
In total, 300 rubber sheets were divided into two groups of 200 samples and 100 samples. The first group (calibration set) was used for developing the calibration model and the second group (prediction set) was used for testing the model. The spectral data of the samples were collected using a portable NIRS. Then, the samples were weighed to obtain the initial mass and the moisture content was calculated using the gravimetric method with hot-air oven drying defined by AOAC standards.28 Table 1 shows the moisture content statistics of the rubber sheets, namely the range, mean and standard deviation, for both the calibration and prediction sets. These values indicated that the sample range covered the moisture content of rubber sheets available in the market and was almost the same for calibration and prediction. This indicated good conditions for model testing because the moisture content of the prediction set was within the range of the calibration set.
Table 1. Moisture content statistics of the calibration and prediction sets.
Sample set | No. of samples | Moisture content range (%db) | Mean ± SD (%db) |
---|---|---|---|
Calibration set | 200 | 0.24–4.87 | 2.06 ± 1.48 |
Prediction set | 100 | 0.24–4.81 | 2.00 ± 1.50 |
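The unit %db denotes moisture content on a dry basis. As a worked illustration, the usual gravimetric dry-basis calculation is sketched below; the exact AOAC procedure and formula are not reproduced in this paper, so the function name and the masses are assumptions.

```python
def moisture_content_db(initial_mass_g, dry_mass_g):
    """Dry-basis moisture content (%db): water mass relative to oven-dry mass."""
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    return (initial_mass_g - dry_mass_g) / dry_mass_g * 100.0

# Hypothetical sheet sample: 10.2 g before drying, 10.0 g after oven drying.
example_mc = moisture_content_db(10.2, 10.0)
```

For these made-up masses the sample lost 0.2 g of water against 10.0 g of dry matter, that is, about 2%db.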
During sheet forming, a pattern was imprinted on the surface of the rubber sheet using an engraved roller, which roughened the surface. Thus, the spectra acquired from different locations (Fig. 3(a)) on the rubber sheet surface were not exactly the same due to the light scattering associated with each location. For this reason, if only one spectrum was acquired from the sample, the calibration model might perform poorly. Therefore, four spectra were acquired at different locations on the surface of each rubber sheet. These spectra were processed using SNV (Fig. 3(b)) to reduce the light scattering effect at different locations. Then, the processed spectra were averaged to obtain a mean spectrum of the sample, and all samples (both calibration and prediction sets) were treated in the same way. Finally, the mean spectrum of each sample was preprocessed using different techniques (SNV, the first derivative or the second derivative) before being passed to the model algorithm to compare their effects on model performance.

Fig. 3. SNV spectral processing based on different locations.
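The per-location SNV correction and averaging described above can be sketched as follows. The function names and the synthetic "locations" are assumptions for illustration; each spectrum is centred on its own mean and scaled by its own standard deviation before the four corrected spectra are averaged.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=-1, keepdims=True)
    std = spectra.std(axis=-1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def sample_mean_spectrum(location_spectra):
    """SNV-correct the spectra from several locations, then average them."""
    return snv(location_spectra).mean(axis=0)

# Synthetic base spectrum and four locations that differ only by a
# multiplicative gain and an additive offset (a crude scatter model).
wavelengths = np.linspace(900.0, 1700.0, 229)
base = 0.4 + 0.2 * np.exp(-((wavelengths - 1450.0) / 60.0) ** 2)
gains = np.array([0.9, 1.0, 1.1, 1.2])
offsets = np.array([0.05, 0.00, -0.02, 0.03])
locations = gains[:, None] * base[None, :] + offsets[:, None]

corrected = snv(locations)
mean_spectrum = sample_mean_spectrum(locations)
```

Under this simplified scatter model the four corrected spectra collapse onto one curve, so their average equals the SNV-corrected base spectrum. Real spectra will only approximately behave this way.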
3.2. Spectra characteristics of rubber sheets
The mean absorbance spectra of rubber sheets in the region 900–1700 nm with different moisture contents (4.72%db, 2.43%db, 1.70%db and 0.53%db) are shown in Fig. 4. In the region 925–1200 nm in Fig. 4, the absorbance values of the spectra differ clearly, with the highest spectrum line corresponding to the highest rubber content in the sample. The functional groups of the rubber polymer (CH, CH3) in the sample vibrated strongly in this region,6 especially at 925, 1140, 1200, 1351 and 1659 nm, as shown in Fig. 5, where peaks were visible in the spectra preprocessed using the first derivative. The CH functional groups vibrated in the second and third overtone regions, around 1200 and 925 nm,5 respectively, while the CH3 functional groups vibrated in the second overtone region at around 1140 and 1351 nm and in the first overtone region at around 1659 nm.29 In the region 1400–1600 nm in Fig. 4, the lowest line (0.53%db) has the lowest absorbance, corresponding to the lowest moisture content, whereas the higher lines have higher absorbance associated with the higher moisture content of the samples. The spectra were clearly different in this region due to the first overtone of water (OH functional groups), especially around 1410 nm29 (Fig. 5), where a peak was visible. When the spectra were preprocessed using the second derivative, as shown in Fig. 6, peaks were visible near the peaks found in the first-derivative spectrum. These peaks (933, 1125, 1185, 1211, 1333, 1365, 1444 and 1649 nm) were related to the CH, CH3, CH, CH, CH3, CH3, OH and CH3 functional groups,30 respectively. The difference in the spectral characteristics indicated that NIR spectroscopy has the potential to discriminate among rubber sheets with different moisture contents.

Fig. 4. Mean spectra of the rubber sheets with different moisture contents.

Fig. 5. Mean spectra of rubber sheets with different moisture contents preprocessed by the first derivative.

Fig. 6. Mean spectra of rubber sheets with different moisture contents preprocessed by the second derivative.
3.3. Calibration model analysis
PLSR was used to build regression models of the moisture content in rubber sheets using the mean absorbance spectra in the calibration set. The optimal number of LVs was determined for each spectral preprocessing option used (none, SNV, the first derivative and the second derivative) based on cross-validation of each model. Table 2 shows the optimal number of LVs as well as the performance of the PLSR models. The best PLSR model for predicting the moisture content in rubber sheets was the model preprocessed using the first derivative, with $R^2_P$, RMSEP and RPDP values of 0.977, 0.227%db and 6.576, respectively. In general, for regression models, values of $R^2$ in the range 0.82–0.90 usually indicate good performance, while values higher than 0.90 indicate excellent performance.14 Similarly for RPD, a value greater than 2.0 indicates a good quantitative model, while a value greater than 3.0 indicates that the model is excellent.31 It should be noted that the optimal numbers of LVs of the PLSR models were rather high because they were chosen at the lowest MSE in the cross-validation process. Alternatively, the number of latent variables could be reduced to 12, 13, 11 and 12 for no pretreatment, SNV, first derivative and second derivative pretreatment, respectively, with only a slight deterioration in performance, because the MSE at these values differed very little from that at the optimal number of latent variables. The established PLSR model showed excellent performance, with high values of $R^2_P$ and RPDP for predicting the moisture content in rubber sheets, indicating that the model could be used in the field.
Table 2. Performance of the PLSR, LS-SVM and ANN calibration models with different spectral pretreatments (subscript C: calibration set; subscript P: prediction set).
Calibration model | Spectra pretreatment | LV | $R^2_C$ | RMSEC (%db) | RPDC | $R^2_P$ | RMSEP (%db) | RPDP |
---|---|---|---|---|---|---|---|---|
PLSR | None | 20 | 0.986 | 0.176 | 8.373 | 0.977 | 0.229 | 6.555 |
PLSR | SNV | 20 | 0.986 | 0.175 | 8.405 | 0.976 | 0.230 | 6.508 |
PLSR | 1st derivative | 16 | 0.986 | 0.177 | 8.312 | 0.977 | 0.227 | 6.576 |
PLSR | 2nd derivative | 14 | 0.981 | 0.206 | 7.131 | 0.969 | 0.263 | 5.559 |
LS-SVM | None | — | 0.992 | 0.134 | 10.945 | 0.983 | 0.192 | 7.597 |
LS-SVM | SNV | — | 0.991 | 0.141 | 10.453 | 0.983 | 0.193 | 7.588 |
LS-SVM | 1st derivative | — | 0.991 | 0.142 | 10.321 | 0.983 | 0.193 | 7.451 |
LS-SVM | 2nd derivative | — | 0.993 | 0.120 | 12.214 | 0.980 | 0.212 | 6.709 |
ANN | None | — | 0.996 | 0.132 | 11.008 | 0.992 | 0.221 | 6.575 |
ANN | SNV | — | 0.995 | 0.165 | 8.647 | 0.988 | 0.223 | 6.398 |
ANN | 1st derivative | — | 0.997 | 0.126 | 11.757 | 0.993 | 0.179 | 8.276 |
ANN | 2nd derivative | — | 0.997 | 0.114 | 12.862 | 0.989 | 0.220 | 6.665 |
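The trade-off noted in the PLSR discussion, between the number of latent variables at the minimum cross-validation MSE and a smaller number with nearly the same MSE, can be expressed as a simple selection rule. The sketch below is one possible rule and not the procedure used in the paper; the 5% tolerance and the MSE curve are hypothetical.

```python
import numpy as np

def parsimonious_lv(cv_mse, tolerance=0.05):
    """Smallest number of LVs whose CV MSE is within `tolerance` of the minimum."""
    cv_mse = np.asarray(cv_mse, dtype=float)
    threshold = cv_mse.min() * (1.0 + tolerance)
    # argmax on a boolean array returns the index of the first True entry.
    return int(np.argmax(cv_mse <= threshold)) + 1

# Hypothetical cross-validation MSE for 1..7 latent variables.
cv_curve = [1.0, 0.5, 0.2, 0.104, 0.102, 0.100, 0.101]

lv_at_minimum = int(np.argmin(cv_curve)) + 1
lv_parsimonious = parsimonious_lv(cv_curve)
```

In this made-up curve the minimum occurs at six latent variables, while four already give an MSE within 5% of it, so the simpler model would be preferred under this rule.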
The regression coefficients of the best PLSR model (using first-derivative spectral preprocessing) in Fig. 7 show the impact of the spectral wavebands on the predicted moisture content of a rubber sheet. The wavebands at 1125 and 1389 nm had a high positive correlation, while those at 1156 and 1430 nm had a high negative correlation with the model prediction of moisture content. The wavebands around 1125, 1156 and 1389 nm were related to the CH3 functional groups29 of the rubber polymer (polyisoprene), which is the primary chemical constituent of natural rubber, and the waveband at 1430 nm was related to the OH functional groups of water32 in the rubber sheet. These significant wavebands of the model were thus closely related to the moisture content and dry rubber content of the rubber sheet. Based on these observations, NIR spectroscopy using the PLSR model based on the spectral range 900–1700 nm was feasible for rapid, nondestructive determination of the moisture content in a rubber sheet.

Fig. 7. Regression coefficients of NIR-based PLSR model.
Regression analysis using the LS-SVM algorithm was also used to build a calibration model based on the mean spectra of samples in the calibration set for predicting the moisture content in a rubber sheet. In this procedure, the model parameters (the regularization parameter $\gamma$ and the squared bandwidth of the Gaussian curve $\sigma^2$) were tuned to find their optimal values using coupled simulated annealing (CSA) for the initial estimation and the simplex method for fine-tuning. The performance of the LS-SVM models with the optimal parameters for each spectral preprocessing option is shown in Table 2 and indicates that LS-SVM was slightly superior to PLSR in both the calibration and prediction sets, regardless of the preprocessing technique used. The best LS-SVM model for predicting the moisture content in a rubber sheet was the model with no spectral preprocessing, which gave $R^2_P$, RMSEP and RPDP values of 0.983, 0.192%db and 7.597, respectively, for the prediction set. LS-SVM has performed better than PLSR in many spectroscopic applications. Nevertheless, recent studies have reported that artificial intelligence methods such as neural network algorithms have also been successful in regression tasks and can improve model performance in many applications. Therefore, in this research, a neural network was used to build a regression model to compare its predictive performance with the two previous algorithms.
To compare the ANN model with the prediction models developed using PLSR and LS-SVM, the same dataset was imported into the MATLAB R2017a software environment. In this paper, the ANN was designed with only one hidden layer because of the high computational cost of the Jacobian matrix of the error function and the need to invert matrices whose size equals the number of network weights.26 To avoid over-fitting with the small number of training samples (200 samples in the calibration set), the number of hidden neurons was varied from 1 to 16 to find an optimal solution. A model with two units in the hidden layer was selected because it gave the minimum error (MSE). This is consistent with the expectation that if the number of hidden nodes is too large for the number of training samples, the network converges more easily and fits the training data well, but does not generalize well to other data.22 Table 2 shows that the ANN model with the first derivative gave the best result, with $R^2_P$, RMSEP and RPDP values of 0.993, 0.179%db and 8.276, respectively. Figures 8(e) and 8(f) present the scatter plots of the best ANN model for the calibration and prediction sets. In addition, it can be observed that the first-derivative pretreatment clearly separated the wavelengths at about 925, 1200 and 1410 nm. These wavelengths corresponded to the functional groups of the rubber polymer (CH, CH3) and the first overtone of water (OH).4 Importantly, in this case the RMSEP decreased from around 0.22%db with the other spectral preprocessing methods to 0.179%db. The ANN model also had the best RMSEP compared with the PLSR and LS-SVM models. The LM algorithm thus showed excellent convergence properties with few hidden nodes, which allowed a more precise estimation of the prediction error.26

Fig. 8. Scatter plot of predicted and reference moisture content for PLSR, LS-SVM, and ANN models using calibration and prediction sets.
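The Levenberg–Marquardt training referred to above requires a Jacobian and a linear solve whose size equals the number of weights, which is the cost that motivated the small network. The sketch below shows a basic damped Gauss–Newton (LM) loop on a single tanh neuron with two weights. It is an illustrative assumption and not MATLAB's training routine; the damping schedule, function names and synthetic data are all made up for the example.

```python
import numpy as np

def levenberg_marquardt(residual_fn, jacobian_fn, w0, n_iter=50, mu=1e-2):
    """Minimise sum(residual^2) with a basic Levenberg-Marquardt loop."""
    w = np.asarray(w0, dtype=float)
    e = residual_fn(w)
    cost = e @ e
    for _ in range(n_iter):
        J = jacobian_fn(w)
        # Damped Gauss-Newton step: (J^T J + mu I) step = -J^T e
        step = np.linalg.solve(J.T @ J + mu * np.eye(w.size), -J.T @ e)
        e_new = residual_fn(w + step)
        cost_new = e_new @ e_new
        if cost_new < cost:
            # Accept the step and move towards Gauss-Newton behaviour.
            w, e, cost = w + step, e_new, cost_new
            mu /= 10.0
        else:
            # Reject the step and move towards gradient-descent behaviour.
            mu *= 10.0
    return w, cost

# Synthetic target generated by a single tanh neuron with weights (1.5, -0.5).
x = np.linspace(-2.0, 2.0, 25)
y = np.tanh(1.5 * x - 0.5)

def residuals(w):
    return np.tanh(w[0] * x + w[1]) - y

def jacobian(w):
    slope = 1.0 - np.tanh(w[0] * x + w[1]) ** 2     # derivative of tanh
    return np.column_stack([slope * x, slope])

w_fit, final_cost = levenberg_marquardt(residuals, jacobian, w0=[1.0, 0.0])
```

The matrix solved at each iteration is only 2 × 2 here; for a network with many weights it grows accordingly, which is why the number of hidden neurons matters for training cost.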
4. Conclusion
The results indicated that NIRS had high potential as a tool for predicting the moisture content in a rubber sheet because it could identify the strong absorption bands of the functional groups of the rubber polymer and water in the near-infrared region. The current results show a strong correlation between moisture content and NIR spectral data. Regarding spectral preprocessing, the first derivative method gave the best outcomes for the PLSR and ANN models because of its ability to resolve the wavelengths corresponding to the functional groups of the rubber polymer and water. This paper developed and compared different prediction models (PLSR, LS-SVM and ANN). The results showed that the ANN model provided the best outcome in predicting the moisture content compared with the other models in terms of RMSEP. As a nonlinear model, the ANN was able to exploit the nonlinearity that existed in the data. In terms of network complexity, it should be noted that the network topology should be kept simple enough to avoid over-fitting, as stated in the previous section. This case confirms that classical regression methods such as PLSR do not always provide the optimum option when dealing with spectroscopy.
Acknowledgment
This research was supported by the Faculty of Engineering at Kamphaeng Saen, Kasetsart University, Thailand.