This work presents a novel spam detection system based on an artificial immune system. The Artificial Immune System (AIS) is a relatively new machine learning method inspired by the human immune system, and it is currently undergoing intense investigation and demonstration. Core modifications were applied to the standard AIS with the aid of the Genetic Algorithm (GA). The SpamAssassin corpus is used in all our simulations. Spam is a serious, universal problem that affects almost all computer users. It troubles not only ordinary internet users but also companies and organizations, through the high cost of lost productivity, wasted user time, and wasted network bandwidth. Many studies on spam indicate that it costs organizations billions of dollars annually. We introduce a GA-assisted AIS for spam detection and compare two methods. Encouraging results were achieved in comparison with commercially available anti-spam software.
Nowadays, text is one of the most prevalent forms of data, and text classification is a widely used data mining task with various application fields. One mass-produced instance of text is email. As a communication medium, email has many advantages but suffers from a serious problem: the number of spam emails has steadily increased in recent years, leading to considerable irritation. Spam detection has therefore emerged as a separate field of text classification. A primary challenge of text classification, which is more severe in spam detection and impedes the process, is the high dimensionality of the feature space. Various dimension reduction methods have been proposed that produce a lower-dimensional space than the original. These methods fall mainly into two groups: feature selection and feature extraction. This research deals with dimension reduction in the text classification task and, in particular, performs experiments in the spam detection field. We employ Information Gain (IG) and the Chi-square Statistic (CHI) as well-known feature selection methods. We also propose a new feature extraction method called Sprinkled Semantic Feature Space (SSFS). Furthermore, this paper presents a new hybrid method called IG_SSFS, in which we combine the selection and extraction processes to reap the benefits of both. To evaluate these methods in the spam detection field, experiments are conducted on several well-known email datasets. According to the results, SSFS was more effective than the basic selection methods at improving classifiers’ performance, and IG_SSFS further enhanced performance while consuming less processing time.
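As a rough illustration of the Information Gain criterion named in this abstract, the sketch below scores a single term by how much knowing its presence or absence reduces the entropy of the class label. It is a minimal sketch under stated assumptions, not the paper's implementation: documents are assumed to be sets of tokens, the feature is binary (term present or not), and the names `entropy` and `information_gain` are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document' w.r.t. the label.

    Assumption: each document is a set of tokens; labels are hashable class names.
    """
    classes = sorted(set(labels))

    def class_counts(subset):
        return [sum(1 for label in subset if label == c) for c in classes]

    with_term = [l for d, l in zip(docs, labels) if term in d]
    without_term = [l for d, l in zip(docs, labels) if term not in d]

    n = len(labels)
    gain = entropy(class_counts(labels))
    # Subtract the entropy that remains after splitting on the term.
    for subset in (with_term, without_term):
        if subset:
            gain -= len(subset) / n * entropy(class_counts(subset))
    return gain


# Toy corpus (made up for illustration only).
docs = [{"free", "offer"}, {"free", "win"}, {"meeting", "notes"}, {"lunch", "notes"}]
labels = ["spam", "spam", "ham", "ham"]
ranked = sorted({t for d in docs for t in d},
                key=lambda t: information_gain(docs, labels, t), reverse=True)
```

Feature selection then keeps the top-ranked terms from `ranked`; how many to keep is a tuning choice that the abstract does not specify.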
Electronic mail (email) is nowadays an essential communication service, widely used by most Internet users. One of the main problems affecting this service is the proliferation of unsolicited messages (usually called spam), which, despite the efforts of the research community, remains an inherent problem. From this perspective, this work proposes and explores a novel symbiotic feature selection approach that allows distinct collaborating users to exchange relevant features in order to improve the behavior of anti-spam filters. For this purpose, several Evolutionary Algorithms (EA) are explored as optimization engines able to enhance feature selection strategies within the anti-spam area. The proposed mechanisms are tested using a realistic incremental retraining evaluation procedure and a novel corpus based on the well-known Enron datasets mixed with recent spam data. The results show that the proposed symbiotic approach is competitive and has the added advantage of preserving end users' privacy.
Email spam is a security problem that has been addressed with a variety of machine learning techniques. The rise of this issue makes organisational email services unreliable and directly exposes clients to threats delivered through unexpected spam mail, such as ransomware. There are several methods for identifying spam emails. Most of these methods focus on feature selection; however, these models decrease detection accuracy. This paper proposes a novel spam detection method that not only avoids decreasing accuracy but also eliminates unsuitable features with less processing. The features are content-based and very numerous, so eliminating unsuitable ones can also reduce memory complexity. We use sample text emails from Hewlett-Packard (HP) Laboratories. First, a GA is employed to select features, with no limit on the number of features selected, using Bayesian theory as the fitness function, and is checked with different numbers of repetitions. The GA results improved as the number of repetitions increased, and were tested with two distinct selection methods, random selection and tournament selection. In the second stage, Naive Bayes classifies the emails in the dataset as spam or ham. The results show that Naive Bayes and the hybrid GA-Naive Bayes perform almost identically, but GA-Naive Bayes performs better.
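The two-stage idea in this abstract (a GA searching over feature subsets, scored by a Naive Bayes fitness function) can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' method: features are assumed binary, the fitness is held-out accuracy of a Laplace-smoothed Bernoulli Naive Bayes, and every name and parameter value (`nb_accuracy`, `tournament`, `ga_select`, `pop_size`, `p_mut`, and so on) is hypothetical. Only tournament selection is shown.

```python
import math
import random


def nb_accuracy(X_train, y_train, X_test, y_test, mask):
    """Accuracy of a Bernoulli Naive Bayes classifier using only masked features."""
    feats = [j for j, keep in enumerate(mask) if keep]
    if not feats:
        return 0.0  # an empty feature subset is treated as worthless
    classes = sorted(set(y_train))
    prior, cond = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X_train, y_train) if label == c]
        prior[c] = math.log(len(rows) / len(X_train))
        # Laplace-smoothed estimate of P(feature = 1 | class)
        cond[c] = {j: (sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in feats}

    def score(x, c):
        return prior[c] + sum(
            math.log(cond[c][j] if x[j] else 1 - cond[c][j]) for j in feats)

    correct = sum(max(classes, key=lambda c: score(x, c)) == label
                  for x, label in zip(X_test, y_test))
    return correct / len(y_test)


def tournament(pop, fits, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def ga_select(fitness, n_features, rng, pop_size=20, generations=30, p_mut=0.05):
    """Evolve bit masks over features; returns the best evaluated mask and its fitness."""
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        for ind, f in zip(pop, fits):
            if f > best_fit:
                best, best_fit = ind[:], f
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            a, b = tournament(pop, fits, rng), tournament(pop, fits, rng)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = children
    return best, best_fit
```

A caller would wrap the data in a closure, for example `ga_select(lambda m: nb_accuracy(Xtr, ytr, Xte, yte, m), n_features, random.Random(0))`, where `Xtr`, `ytr`, `Xte`, and `yte` are placeholder names for a train/test split. Raising `generations` corresponds loosely to the "number of repetitions" the abstract varies.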