In this paper, we investigate the impact of Random Undersampling (RUS) on a supervised Machine Learning task involving highly imbalanced Big Data. We present the results of experiments in Medicare Fraud detection. To the best of our knowledge, these experiments are conducted with the largest insurance claims datasets ever used for Medicare Fraud detection. We obtain two datasets from two Big Data repositories provided by the United States government’s Centers for Medicare and Medicaid Services. The larger of the two datasets contains nearly 174 million instances, with a minority-to-majority class ratio of approximately 0.0039. Our contribution is to show that RUS has a detrimental effect on a Medicare Fraud detection task when performed on large-scale, imbalanced data. The effect of RUS is apparent in the Area Under the Precision-Recall Curve (AUPRC) scores recorded in our experiments. We use four popular, open-source classifiers in our experiments to confirm the negative impact of RUS on their AUPRC scores.
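As a rough illustration of what RUS does to a class ratio like the one quoted above, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `random_undersample`, the `target_ratio` parameter, and the toy labels (39 positives and 10,000 negatives, chosen only to reproduce a 0.0039 ratio) are assumptions made for this example.

```python
import random


def random_undersample(labels, target_ratio, seed=0):
    """Hypothetical helper: return the sorted indices kept after RUS.

    labels       -- list of 0/1 class labels (1 = minority, 0 = majority)
    target_ratio -- desired minority:majority ratio after sampling (1.0 = 1:1)
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Keep every minority instance; discard majority instances at random
    # until the requested ratio is reached (never keep more than exist).
    n_keep = min(len(majority), int(round(len(minority) / target_ratio)))
    kept_majority = rng.sample(majority, n_keep)
    return sorted(minority + kept_majority)


# Toy data (assumed for illustration): 39 positives, 10,000 negatives.
labels = [1] * 39 + [0] * 10_000
original_ratio = 39 / 10_000  # 0.0039, the ratio quoted in the abstract

kept = random_undersample(labels, target_ratio=1.0)
n_pos = sum(labels[i] for i in kept)
n_neg = len(kept) - n_pos
print(original_ratio, n_pos, n_neg)
```

In this toy run, reaching a 1:1 ratio discards all but 39 of the 10,000 majority instances, which shows how much majority-class data RUS throws away when the imbalance is this severe.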
Android is the most popular mobile operating system, with billions of active users worldwide, which has attracted advertisers, hackers, and cybercriminals who develop malware for various purposes. In recent years, wide-ranging research has been conducted on malware analysis and detection for Android devices, while Android has also implemented various security controls to deal with malware, including a User ID (UID) for each application and system permissions. In this paper, we propose and evaluate several machine learning (ML) approaches, applying ensemble-based learning methods for detecting Android malware in conjunction with a substring-based feature selection (SBFS) method for the classifiers. In this study, we extend our previous work and observe that the ensemble-based learning techniques achieve better results than those previously reported on the DREBIN dataset, and thus provide a strong basis for building effective tools for Android malware detection.