k-Means Clustering with Optimal Centroid: An Optimization Insisted Model for Removing Outliers
Abstract
In data cleaning, the process of detecting and correcting corrupt, inaccurate or irrelevant records from the record set is a tedious task. Particularly, the process of “outlier detection” occupies a significant role in data cleaning that removes or eliminates the outlier’s that exist in data. Traditionally, more efforts have been taken to remove the outliers, and one of the promising ways is customizing clustering models. In this manner, this paper intends to propose a new outlier detection model via enhanced k-means with outlier removal (E-KMOR), which assigns all outliers into a group naturally during the clustering process. For assigning the point to be outliers, a new intra-cluster based distance evaluation is employed. The main contribution of this paper is to select cluster centroid optimally through a newly proposed hybrid optimization algorithm termed particle updated lion algorithm (PU-LA), which hybrids the concepts of LA and particle swarm optimization (PSO), respectively. Thereby, the proposed work is named as E-KMOR-PU-LA. Finally, the efficacy of the proposed E-KMOR-PU-LA model is proved through a comparative analysis over conventional models by concerning runtime and accuracy.