Descriptor selection based on variable stability for predicting inhibitor activity
Abstract
Quantitative structure-activity relationship (QSAR) modeling is a technique for studying the relationship between chemical structures and their properties, and variable selection is an important step in identifying informative variables and building reliable models. A variable selection method based on variable stability is proposed and applied to select informative descriptors for a QSAR model of inhibitors. In the method, a series of models is built by leave-one-out cross validation (LOOCV), and the stability of a descriptor is defined as the ratio of the absolute value of the mean of its regression coefficients across these models to their standard deviation. Descriptors with larger stability are therefore considered more informative to the model. To further enhance the differences among descriptors, an exponential parameter is applied to the standard deviation. The results show that 13 of 1217 descriptors are selected as informative for the QSAR model of inhibitors, and an effective prediction model can be constructed with them.
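The stability measure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the regression method, so a ridge regression with a tiny penalty on mean-centred data stands in for it, and the exponential parameter is assumed to enter as s_j = |mean(b_j)| / std(b_j)^alpha, where b_j collects the coefficients of descriptor j over the n LOOCV models. The helper names (`ridge_fit`, `loocv_coefficients`, `variable_stability`, `select_descriptors`), the parameter `alpha`, and the toy data are all hypothetical.

```python
import random
from statistics import mean, stdev


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam=1e-6):
    """Hypothetical stand-in regression: near-unpenalised ridge on mean-centred data."""
    p = len(X[0])
    col_means = [mean(row[j] for row in X) for j in range(p)]
    y_mean = mean(y)
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    xtx = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    return solve(xtx, xty)


def loocv_coefficients(X, y, fit=ridge_fit):
    """Refit the model n times, each time leaving one sample out; one coefficient vector per fit."""
    return [fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:]) for i in range(len(y))]


def variable_stability(coefs, alpha=1.0, eps=1e-12):
    """Assumed form: |mean(b_j)| / std(b_j)**alpha for each descriptor j across the LOOCV models."""
    p = len(coefs[0])
    scores = []
    for j in range(p):
        b = [c[j] for c in coefs]
        scores.append(abs(mean(b)) / (stdev(b) + eps) ** alpha)
    return scores


def select_descriptors(scores, k):
    """Indices of the k descriptors with the largest stability."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: y depends on descriptor 0 only; descriptors 1 and 2 are noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]
y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]

scores = variable_stability(loocv_coefficients(X, y), alpha=1.5)
print(select_descriptors(scores, 1))
```

When the standard deviations are below 1, an exponent alpha > 1 shrinks the denominator of low-variance coefficients more than that of high-variance ones, which widens the gap between stable and unstable descriptors; this is one plausible reading of the "exponential parameter", not a confirmed detail of the method.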
