World Scientific
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at customercare@wspc.com for any enquiries.

An Ensemble Cost-Sensitive One-Class Learning Framework for Malware Detection

    https://doi.org/10.1142/S0218001415500184Cited by:3 (Source: Crossref)

    Machine learning is among the most popular methods in designing unknown and variant malware detection algorithms. However, most of the existing methods take a single type of features to build binary classifiers. In practice, these methods have limited ability in depicting malware characteristics and the binary classification suffers from inadequate sampling of benign samples and extremely imbalanced training samples when detecting malware. In this paper, we present a malware detection Framework based on ENsemble One-Class Learning, namely FENOC. It uses hybrid features at different semantic layers to ensure a comprehensive insight of the program to be analyzed. We construct the malware detector by a novel learning algorithm called Cost-sensitive Twin One-class Classifier (CosTOC), which uses a pair of one-class classifiers to describe malware and benign programs respectively. CosTOC is more flexible and robust in comparison to conventional binary classifiers when training samples are extremely imbalanced or the benign programs are inadequately sampled. Finally, random subspace method and clustering-based ensemble method are developed to enhance the generalization ability of CosTOC. Experimental results show that FENOC gives a comparative detection rate and a lower false positive rate than many other binary classification algorithms, especially when the detector are trained with imbalanced data, or evaluated in terms of false positive rate.