World Scientific

LOGIC FORMULAS BASED KNOWLEDGE DISCOVERY AND ITS APPLICATION TO THE CLASSIFICATION OF BIOLOGICAL DATA

    https://doi.org/10.1142/9789814271820_0017
    Cited by: 3 (Source: Crossref)
    Abstract:

    Classifiers built through supervised learning techniques are widely used in computational biology. Examples are neural networks, decision trees and support vector machines. Recently, an extension of the Regularized Generalized Eigenvalues Classifier (ReGEC) has been proposed, in which prior knowledge is included. When the knowledge is formalized as a set of linear constraints on ReGEC, the resulting nonlinear classifier has lower complexity and halves the misclassification error with respect to the original method. In this work, we show how logic programming can extract knowledge from data to enhance the classification models produced by ReGEC. The knowledge extraction method consists of two phases: a feature selection phase and a rule extraction phase. Feature selection is formulated as an integer programming problem that extends a set covering problem. The extraction phase is performed through the iterative solution of different instances of the same minimum cost satisfiability problem, which models the logic separation rules used for classification. The overall method, which we call LF-ReGEC, guarantees that the number of points in the training set is not increased and that the resulting model does not overfit the problem. Furthermore, the overall accuracy of the method is increased. Finally, the method is compared with other methods using genomic and proteomic data sets taken from the literature.
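
    To make the set covering view of feature selection concrete, the following is a minimal sketch and not the chapter's actual formulation. It assumes binarized features and treats each pair of samples from opposite classes as an element that must be "covered" by at least one selected feature on which the two samples differ. A greedy heuristic stands in for the integer program here, and the function name `select_features_greedy` and the toy data are hypothetical.

```python
from itertools import product


def select_features_greedy(pos, neg):
    """Greedy set-cover heuristic for feature selection (illustrative only).

    pos, neg: lists of equal-length tuples of 0/1 feature values, one list
    per class. Returns indices of features chosen so that every separable
    (positive, negative) pair differs on at least one chosen feature.
    """
    n_features = len(pos[0])
    # Every pair of samples from opposite classes is an element to cover.
    uncovered = set(product(range(len(pos)), range(len(neg))))
    selected = []
    while uncovered:
        best, best_cover = None, set()
        for f in range(n_features):
            if f in selected:
                continue
            # Pairs that feature f separates among those still uncovered.
            cover = {(i, j) for (i, j) in uncovered if pos[i][f] != neg[j][f]}
            if len(cover) > len(best_cover):
                best, best_cover = f, cover
        if best is None:
            # Remaining pairs are identical on all unused features.
            break
        selected.append(best)
        uncovered -= best_cover
    return selected


# Hypothetical toy data: feature 0 alone separates the two classes.
pos = [(1, 0, 1, 0), (1, 1, 0, 0)]
neg = [(0, 0, 1, 0), (0, 1, 0, 1)]
chosen = select_features_greedy(pos, neg)
print(chosen)
```

    A greedy pass like this gives no optimality guarantee. An exact integer programming solver would minimize the number (or cost) of selected features subject to the same covering constraints.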