This paper presents an obstacle detection and tracking framework for mobile robots and autonomous vehicles equipped with LiDAR (Light Detection and Ranging) sensors. The framework comprises a detection module (DM) that clusters the point cloud and models the obstacles, and a tracking module (TM) that recognizes the obstacles and estimates their velocities. To detect obstacles, DM segments the point cloud by finding the gaps in it. Detected obstacles are modeled by one or two line segments, depending on their geometry. TM receives the line segments returned by DM as the obstacles to track. To this end, the obstacles are first labeled and their features stored. Thereafter, a set of equations is solved to recognize the labeled obstacles. Finally, a Kalman filter is used to estimate the translational and rotational velocities of the obstacles. The framework is evaluated in experiments on a robot platform and on the KITTI dataset. The results indicate satisfactory performance in terms of effectiveness and speed and confirm that DM and TM can run in real time as ancillary modules of mobile robots and autonomous vehicles. In particular, owing to robust and accurate obstacle modeling, the velocity diagrams are smooth and coherent, a quality not seen in similar studies.
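The abstract does not spell out the gap-finding step; as an illustration only, the sketch below segments an ordered 2D LiDAR scan wherever consecutive points are farther apart than a threshold (the function name and the 0.5 m default are assumptions, not the paper's values).

```python
import numpy as np

def segment_scan_by_gaps(points, gap_threshold=0.5):
    """Split an ordered 2D LiDAR scan (N x 2 array) into clusters wherever
    the distance between consecutive points exceeds the threshold.
    Returns a list of point-index arrays, one per candidate obstacle."""
    diffs = np.linalg.norm(np.diff(points, axis=0), axis=1)
    breaks = np.where(diffs > gap_threshold)[0] + 1
    return np.split(np.arange(len(points)), breaks)
```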
This paper focuses on the automatic fuzzy clustering problem and proposes a novel automatic fuzzy clustering method that employs an extended membrane system with active membranes as its computing framework. The extended membrane system has a dynamic membrane structure; because membranes can evolve, it is particularly well suited to the automatic fuzzy clustering problem. A modified differential evolution (DE) mechanism serves as the evolution rule for objects, in accordance with the membrane structure and object communication mechanisms. Under the control of both the objects' evolution-communication mechanism and the membrane evolution mechanism, the extended membrane system can effectively determine the most appropriate number of clusters as well as the corresponding optimal cluster centers. The proposed method was evaluated on 13 benchmark problems and compared with four state-of-the-art automatic clustering methods, two recently developed clustering methods, and six classification techniques. The comparison results demonstrate the superiority of the proposed method in terms of effectiveness and robustness.
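The modified DE mechanism itself is not detailed in the abstract; for orientation, here is a minimal sketch of the classical DE/rand/1/bin step that such evolution rules typically build on (the F and CR values are assumptions).

```python
import numpy as np

def de_rand_1_bin(population, i, F=0.5, CR=0.9, rng=np.random.default_rng()):
    """Classical DE/rand/1/bin step for individual i: mutate with a scaled
    difference of two random peers, then binomially cross over."""
    n, dim = population.shape
    r1, r2, r3 = rng.choice([j for j in range(n) if j != i], size=3, replace=False)
    mutant = population[r1] + F * (population[r2] - population[r3])
    cross = rng.random(dim) < CR
    cross[rng.integers(dim)] = True   # guarantee at least one gene crosses over
    return np.where(cross, mutant, population[i])
```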
In this study, a novel quantum-based multi-objective clustering method is proposed using Schrödinger equations. Two new operations, namely weighted cluster centroid computation and threshold setting, are introduced to refine cluster centroids. A novel fitness function strategy is also proposed for efficient searching. The proposed technique is compared with various well-known approaches. Experimental outcomes show that the proposed quantum approach outperforms the existing approaches.
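The weighted cluster centroid computation is only named; one plausible reading, sketched below under the assumption that each point carries a membership or fitness weight, recomputes a centroid as a weighted mean.

```python
import numpy as np

def weighted_centroid(points, weights):
    """Refine a cluster centroid as the weighted mean of its points;
    treating `weights` as memberships or fitness values is an assumption."""
    w = np.asarray(weights, dtype=float)
    return (points * w[:, None]).sum(axis=0) / w.sum()
```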
Data clustering is widely used in many areas, such as data mining, statistics, and machine learning. A variety of clustering approaches have been proposed, but most are not equipped to quickly cluster large-scale, high-dimensional databases. This paper presents a novel data clustering approach based on a generalized particle model (GPM). The GPM transforms the data clustering process into a stochastic process over the configuration space of a GPM array. The proposed approach is characterized by self-organizing clustering and offers several advantages: insensitivity to noise, robustness to the quality of the clustered data, suitability for high-dimensional and massive data sets, learning ability, openness, and easier hardware implementation with VLSI systolic technology. Analysis and simulations show the effectiveness and good performance of the proposed GPM approach to data clustering.
Existing clustering algorithms, such as single-link clustering, k-means, CURE, and CSM, are designed to find clusters based on predefined parameters specified by users. These algorithms may fail if the choice of parameters is inappropriate for the data set being clustered, and most of them work well only for compact, hyper-spherical clusters. In this paper, a new hybrid clustering algorithm called Self-Partition and Self-Merging (SPSM) is proposed. The SPSM algorithm partitions the input data set into several subclusters in the first phase and removes noisy data in the second phase. In the third phase, the remaining subclusters are iteratively merged into larger clusters based on inter-cluster and intra-cluster distance criteria. Experimental results show that the SPSM algorithm handles noisy data sets efficiently and clusters data sets of arbitrary shape and varying density. Several color-image examples demonstrate the versatility of the proposed method, with comparisons against results reported in the literature for the same images. The computational complexity of the SPSM algorithm is O(N²), where N is the number of data points.
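SPSM's exact merge criterion is not given here; the sketch below illustrates one common centroid-based test that compares the inter-cluster gap against the clusters' internal spread (the ratio parameter is an assumption).

```python
import numpy as np

def should_merge(cluster_a, cluster_b, ratio=1.0):
    """Merge two subclusters when the gap between their centroids is no
    larger than `ratio` times their combined internal spread."""
    ca, cb = cluster_a.mean(axis=0), cluster_b.mean(axis=0)
    inter = np.linalg.norm(ca - cb)
    intra = (np.linalg.norm(cluster_a - ca, axis=1).mean()
             + np.linalg.norm(cluster_b - cb, axis=1).mean())
    return inter <= ratio * intra
```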
Genetic Algorithms (GAs) have been successfully applied to codebook design for vector quantization, with candidate solutions normally tuned by the LBG algorithm. In this paper, to counter the premature convergence of GA and its tendency to fall into local optima, a new Genetic Simulated Annealing-based Kernel Vector Quantization (GSAKVQ) is proposed from a different point of view. The simulated annealing (SA) method proposed here approaches the optimal solution faster than the other candidate approaches. Within the GA framework, a new crossover operator and a mutation operator are first designed for the partition-based coding scheme; an SA operation is then introduced to enlarge the exploration of the proposed algorithm; finally, a kernel-function-based fitness is introduced into the GA to cluster datasets with complex distributions. The proposed method has been extensively compared with other algorithms on 17 clustering datasets and four image compression problems. The experimental results show that the algorithm is superior in terms of clustering correctness rate and peak signal-to-noise ratio (PSNR), and that its robustness is also very good. In addition, taking "Lena" as an example, we added Gaussian noise to the original image and applied the proposed algorithm to compress the noisy image. Compared to the noisy original, the reconstructed image is more distinct, and as the noise parameter increases, the PSNR decreases.
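The SA operation's specifics are not given in the abstract; the core of any such step is the Metropolis acceptance rule, sketched minimally below.

```python
import math
import random

def sa_accept(curr_cost, cand_cost, temperature):
    """Metropolis acceptance rule: always take improvements, and accept
    worse candidates with probability exp(-delta / T) to escape local optima."""
    delta = cand_cost - curr_cost
    return delta <= 0 or random.random() < math.exp(-delta / temperature)
```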
Data clustering is the task of separating data samples into a set of clusters. K-means is a popular partitional clustering algorithm, but it has notable weaknesses, including sensitivity to initialization and a tendency to become stuck in local optima. Hence, nature-inspired optimization algorithms have been applied to the clustering problem to overcome the limitations of the K-means algorithm. However, due to the high dimensionality of the search space, nature-inspired optimization algorithms also suffer from local optima and poor convergence rates. To address these issues, this paper presents a simplex-method-based bacterial colony optimization (SMBCO) algorithm. The simplex method variant used here is stochastic; it improves population diversity while increasing the algorithm's local search ability. The potential and effectiveness of the proposed SMBCO clustering algorithm are assessed on a variety of benchmark machine learning datasets, and the generated groups are evaluated using different performance measures. Compared with several well-known nature-inspired algorithms, the experimental results reveal that the SMBCO model produces superior clustering efficiency and a faster convergence rate.
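The paper's simplex method is a stochastic variant; as background, here is a minimal sketch of the deterministic reflection step such methods build on (the function name and alpha coefficient are assumptions).

```python
import numpy as np

def simplex_reflect(simplex, objective, alpha=1.0):
    """One Nelder-Mead-style reflection: reflect the worst vertex through
    the centroid of the remaining vertices and keep it if it improves."""
    order = np.argsort([objective(v) for v in simplex])
    worst = order[-1]
    centroid = simplex[order[:-1]].mean(axis=0)
    reflected = centroid + alpha * (centroid - simplex[worst])
    if objective(reflected) < objective(simplex[worst]):
        simplex[worst] = reflected
    return simplex
```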
In many systems, smart functionality depends on an essential service such as detecting anomalies in complex, large-scale, and dynamic data. Ensuring the privacy and security of cloud data, however, is among the most crucial and challenging tasks today: sensitive data must be safeguarded from unauthorized parties attempting to access it. To this end, existing work has introduced several encryption, decryption, and key generation mechanisms for privacy preservation on cloud platforms, but open issues remain, such as increased communication overhead, reduced security, and increased time consumption. Moreover, these existing works follow symmetric-key cryptography for privacy preservation, so a single secret key is shared by several users to access the original data; this raises a high security risk and allows unauthorized parties to access the data. This work therefore introduces a cloud-based privacy-preserving model that offers a scalable and reliable anomaly detection service for sensor data by exploiting cloud resources. The paper applies a newly developed Elliptic Curve Cryptography-based Collective Decision Optimization (ECDO) approach within the proposed framework to improve the privacy and security of the data. Furthermore, the Gaussian kernel fuzzy c-means clustering (GKFCM) algorithm is used within the cloud platform for data partitioning and anomaly classification. Computational difficulties are limited by this privacy-preserving model, in which a private server and a set of public servers collaborate through a cloud data center; granular anomaly detection operations on encrypted data are performed by virtual nodes running on the public servers. Experimental validation was performed on four datasets derived from publicly available Intel Lab sensor data. The results demonstrate that the proposed framework provides high anomaly detection accuracy without any degradation of data privacy.
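The paper runs GKFCM over encrypted data; in the clear, one update of Gaussian-kernel fuzzy c-means looks roughly like the sketch below (the fuzzifier m and kernel width sigma are assumptions).

```python
import numpy as np

def gaussian_kernel(X, v, sigma):
    """Gaussian kernel between each row of X and a single center v."""
    return np.exp(-np.sum((X - v) ** 2, axis=-1) / (2 * sigma ** 2))

def gkfcm_step(X, V, m=2.0, sigma=1.0):
    """One update of kernelized fuzzy c-means: with a Gaussian kernel, the
    squared feature-space distance reduces to 2 * (1 - K(x, v))."""
    K = np.array([gaussian_kernel(X, v, sigma) for v in V])   # (c, n)
    d2 = np.maximum(2.0 * (1.0 - K), 1e-12)
    inv = d2 ** (-1.0 / (m - 1.0))
    U = inv / inv.sum(axis=0)                                 # memberships
    W = (U ** m) * K
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)            # new centers
    return U, V_new
```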
Web pages visited by users contain a variety of data resources, yet the clustering algorithms commonly applied to web data do not take this heterogeneous nature into account. This paper therefore proposes a new algorithm, IHPSOC, to cluster web log data on the basis of web log mining. Built on particle swarm optimization (PSO), the IHPSOC algorithm clusters web log data through particle swarm iteration. Based on the clustering results, the paper builds Markov chain models, one for the users in each category, to predict the web resources users will need. Experimental results show that the proposed model gives better predictions.
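The per-category Markov models are not specified further; a minimal first-order sketch, with maximum-likelihood transition estimates and argmax prediction, might look like this (function names are assumptions).

```python
from collections import defaultdict

def build_markov_chain(sessions):
    """Estimate first-order transition probabilities from page-visit
    sequences belonging to one user category."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

def predict_next(chain, page):
    """Predict the next resource as the most probable transition."""
    return max(chain[page], key=chain[page].get) if page in chain else None
```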
Data clustering is a common data mining technique used in many applications, such as data analysis and pattern recognition. K-means is a widely used clustering method; although it is fast, it tends to fall into local optima and does not always produce the best solution. The artificial bee colony (ABC) algorithm is a biologically inspired optimization algorithm with few control parameters, strong global optimization ability, and easy implementation. However, ABC still has problems, such as failing to find the best solution among all possible solutions: because of the large step of its search equation, the chance of skipping over the true solution is high. Therefore, to balance the diversity and convergence ability of ABC, this paper incorporates a Mantegna Lévy-distribution random walk into ABC. The new algorithm, ABCL, brings the power of the artificial bee colony algorithm to the K-means algorithm. The proposed algorithm benefits from the Mantegna Lévy distribution to reduce the number of function evaluations while achieving better convergence speed and high accuracy in a short time. We empirically evaluate the performance of the proposed method on nine standard datasets from the UCI Machine Learning Repository. The experimental results show that the proposed algorithm obtains better results in terms of convergence speed, accuracy, and number of function evaluations.
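Mantegna's algorithm for generating Lévy-distributed steps is standard; a sketch of the step generator ABCL presumably incorporates (the beta = 1.5 default is an assumption).

```python
import math
import numpy as np

def mantegna_levy_step(dim, beta=1.5, rng=np.random.default_rng()):
    """Draw a Levy-distributed step via Mantegna's algorithm:
    step = u / |v|^(1/beta), with u ~ N(0, sigma_u^2) and v ~ N(0, 1)."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta
                  * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)
```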
Data clustering is an unsupervised learning task with applications in many scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. A classic clustering algorithm is k-Means: it is very popular, but it cannot handle cases in which the clusters are not linearly separable. Kernel k-Means is a state-of-the-art clustering algorithm that employs the kernel trick to perform clustering in a higher-dimensional feature space, thus overcoming the limitations of classic k-Means regarding the non-linear separability of the input data. To meet the challenges of Big Data research, a field that has established itself in recent years and involves performing tasks on extremely large amounts of data, several adaptations of Kernel k-Means have been proposed, each with different requirements in processing power and running time and with different trade-offs in performance. In this paper, we present several issues and techniques involving the use of Kernel k-Means for Big Data clustering, and examine how each combination of components in a clustering framework fares in terms of resources, time, and performance. We use experimental results to evaluate several combinations and to recommend how to approach a Big Data clustering problem.
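The key identity behind Kernel k-Means is that squared feature-space distances to a cluster's implicit centroid can be computed from the kernel (Gram) matrix alone; a minimal sketch (the function name is an assumption):

```python
import numpy as np

def kernel_kmeans_dist2(K, members, i):
    """Squared feature-space distance from point i to the implicit centroid
    of a cluster, using only the Gram matrix K:
    K_ii - 2 * mean_j K_ij + mean_jl K_jl over the cluster members."""
    m = np.asarray(members)
    return K[i, i] - 2.0 * K[i, m].mean() + K[np.ix_(m, m)].mean()
```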
The widespread application of clustering in various fields has led to a range of techniques for partitioning multidimensional data into separable clusters. Although many clustering approaches appear in the literature, optimized clustering techniques with multi-objective consideration are rare. This paper proposes a novel data clustering algorithm, Enhanced Kernel-based Exponential Grey Wolf Optimization (EKEGWO), that handles two objectives. EKEGWO, an extension of KEGWO, adopts weighted exponential functions to improve the search process of clustering. Moreover, the fitness function of the algorithm takes both intra-cluster and inter-cluster distance as objectives, providing an optimal selection of cluster centroids. The performance of the proposed technique is evaluated against the existing approaches PSC, mPSC, GWO, and EGWO on two datasets: banknote authentication and iris. Four metrics, mean squared error (MSE), F-measure, Rand coefficient, and Jaccard coefficient, assess the clustering efficiency of the algorithm. The proposed EKEGWO algorithm attains an MSE of 837, an F-measure of 0.9657, a Rand coefficient of 0.8472, and a Jaccard coefficient of 0.7812 on the banknote dataset.
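The two objectives suggest a scalarized fitness of roughly the following shape; the weighting and exact aggregation are assumptions, not EKEGWO's actual formula.

```python
import numpy as np

def clustering_fitness(X, labels, centroids, w=0.5):
    """Combine intra-cluster compactness (to minimize) and inter-cluster
    separation (to maximize) into one fitness; lower is better."""
    intra = np.mean([np.linalg.norm(x - centroids[l])
                     for x, l in zip(X, labels)])
    c = len(centroids)
    inter = np.mean([np.linalg.norm(centroids[i] - centroids[j])
                     for i in range(c) for j in range(i + 1, c)])
    return w * intra - (1 - w) * inter
```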
Firefly and fuzzy firefly optimization algorithms have been widely applied in clustering techniques and are used extensively in applications such as image segmentation. In these algorithms, parameters such as the step factor and attractiveness are kept constant, which affects the convergence rate and accuracy of the clustering process. Although the fuzzy adaptive firefly algorithm tackles this problem by making those parameters adaptive, issues such as a low convergence rate and non-optimal solutions remain. To tackle these issues, this paper proposes a novel fuzzy adaptive fuzzy firefly algorithm that significantly improves accuracy and convergence rate compared with existing optimization algorithms. Further, fusing the proposed algorithm with existing hybrid clustering algorithms involving fuzzy sets, intuitionistic fuzzy sets, and rough sets yields eight novel hybrid clustering algorithms that perform better at optimizing the selection of initial centroids. To validate the proposal, experimental studies were conducted on datasets from benchmark data repositories such as UCI and Kaggle. The performance and accuracy of the proposed algorithms were evaluated with seven accuracy measures. The results clearly indicate the improved accuracy and convergence rate of the proposed algorithms.
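For reference, here is the standard firefly move that the adaptive variants modify; in the fuzzy adaptive algorithms, alpha and beta0 would be retuned each iteration rather than held constant (names are assumptions).

```python
import numpy as np

def firefly_move(xi, xj, alpha, beta0, gamma, rng=np.random.default_rng()):
    """Move firefly xi toward a brighter firefly xj: attraction decays with
    squared distance, plus a random step scaled by the step factor alpha."""
    r2 = np.sum((xi - xj) ** 2)
    beta = beta0 * np.exp(-gamma * r2)
    return xi + beta * (xj - xi) + alpha * (rng.random(xi.shape) - 0.5)
```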
Point-of-Sale (POS) data analysis is commonly used to explore sales performance in commerce. This manuscript combines unsupervised clustering and supervised classification in an integrated data analysis framework to analyze real-world POS data. Clustering, performed on the sales dataset, groups the stores into several clusters. The clustering results (data labels) are then combined with other information in a store-features dataset as inputs to a classification model, which classifies the clustering labels using the store features. Non-dominated sorting genetic algorithm-II (NSGA-II) is applied in the framework to handle the multiple objectives of clustering and classification. The experimental case study shows that clustering can reveal the hidden structure of the sales performance of retail stores, while classification can reveal the major factors that affect sales performance within each group of retail stores. The correlations between sales clusters and store information can thus be obtained through a sequential series of data analyses within the proposed framework.
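NSGA-II rests on Pareto dominance; a minimal check, assuming both objectives are minimized:

```python
def dominates(a, b):
    """Pareto dominance for minimization: a dominates b if it is no worse
    in every objective and strictly better in at least one. NSGA-II ranks
    solutions by repeatedly peeling off non-dominated fronts."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))
```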
This paper presents an approach for analyzing the relationships between data size, clusters, accuracy, and diversity in neural network ensembles. The main objective of this research is to determine the influence of data size, namely the number of patterns, number of inputs, and number of classes, on parameters such as the clusters, accuracy, and diversity of a neural network ensemble. The proposed approach is based on splitting data sets into groups by data size, clustering the data, and then training and testing neural network ensembles. The test data is the same for all groups and is used to test all trained ensembles. Experiments were conducted on 15 UCI machine learning benchmark datasets, and the results are presented in this paper.
The analysis of large-scale data sets via clustering techniques is utilized in a number of applications. Many of the methods developed employ local search or heuristic strategies for identifying the "best" arrangement of features according to some metric. In this article, we present rigorous clustering methods based on the optimal re-ordering of data matrices. Distinct mixed-integer linear programming (MILP) models are utilized for the clustering of (a) dense data matrices, such as gene expression data, and (b) sparse data matrices, which are commonly encountered in the field of drug discovery. Both methods can be used in an iterative framework to bicluster data and assist in the synthesis of drug compounds, respectively. We demonstrate the capability of the proposed optimal re-ordering methods on several data sets from both systems biology and molecular discovery studies and compare our results to other clustering techniques when applicable.
Fault detection and diagnostics (FDD) can be used to monitor the performance of air conditioners (ACs) and heat pumps (HPs), signal any departure from optimal performance, and provide diagnostic information indicating a possible fault if performance degrades. For packaged systems fully assembled in a factory, an FDD module can be developed for all units of a given model based on laboratory tests of a single unit. For field-assembled systems, laboratory tests of a representative AC or HP installation can yield a preliminary "backbone" FDD algorithm; however, in situ adaptation of these algorithms is required because of installation variations in the field. This paper describes a method for adapting a laboratory-based FDD module to field-assembled systems by automatically customizing the in situ fault-free performance correlations. We validated the developed data-clustering technique with a set of nearly 6000 data points, generating fault-free correlations for an HP operating in cooling mode in our laboratory. The study evaluated several fault-free feature models and indicated that using correlations of different orders during the stages of data collection produced better fits to the data.
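The fault-free correlations are polynomial fits of performance features against driving conditions; below is a one-variable sketch of such a fit and its residual, which is what an FDD module would monitor (the names and the order-2 default are assumptions, and the paper's correlations span several driving conditions rather than one).

```python
import numpy as np

def fit_fault_free_correlation(driver, feature, order=2):
    """Fit a polynomial fault-free correlation of a performance feature
    (e.g., discharge temperature) against one driving condition; the FDD
    residual is the measured value minus this prediction."""
    coeffs = np.polyfit(driver, feature, deg=order)
    predict = np.poly1d(coeffs)
    residuals = feature - predict(driver)
    return predict, residuals
```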
Big Data is a popular cutting-edge technology, and its techniques and algorithms are expanding into areas including engineering, biomedicine, and business. Due to the high volume and complexity of Big Data, data pre-processing is necessary before data mining. Pre-processing includes data cleaning, data integration, data reduction, and data transformation. Data clustering is the most important step of data reduction: with clustering, mining on the reduced data set should be more efficient while still producing quality analytical results. This paper presents the different data clustering methods and related algorithms for mining Big Data. Data clustering can increase the efficiency and accuracy of data mining.
Gait analysis is popular in many clinical and biomechanical applications, such as diagnosing diabetic neuropathy, evaluating the rehabilitation of stroke patients, and measuring performance in sports training. With the rapid development of flexible sensing technology, large-area pressure-sensitive floors can be easily installed in many locations, and complex movements beyond linear walking can be studied on them for clinical or sports gait analysis. As a basic step in such research, a computational approach is needed to track each footprint correctly during the movement. A multi-stage methodology is proposed to solve the two main subtasks of the tracking process: (1) labeling the different footprints and (2) detecting basic foot gestures during movement. The methodology consists of an initial cluster creation stage, a cluster labeling stage, and an overlapped-footprint separation stage. Tai Chi Chuan, a complex foot-movement pattern, was used as an example to evaluate the proposed approach. The method achieved an overall accuracy of 99.07% for footprint labeling and 90.39% for basic foot gesture detection.
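The initial cluster creation stage is not detailed here; one plausible sketch treats each pressure frame's active cells as an image and labels connected components (the threshold and names are assumptions).

```python
import numpy as np
from scipy import ndimage

def initial_footprint_clusters(frame, pressure_threshold=0.0):
    """Create initial footprint clusters from one pressure-floor frame by
    connected-component labeling of active sensor cells; overlapping
    footprints would still need the later separation stage."""
    active = frame > pressure_threshold
    labels, n_clusters = ndimage.label(active)
    centroids = ndimage.center_of_mass(frame, labels,
                                       range(1, n_clusters + 1))
    return labels, centroids
```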
This work predicts traffic congestion from high-speed IoT data streams using Complex Event Processing (CEP) engines. Because CEP engines are reactive in nature and use static thresholds, we propose an unsupervised Genetic Algorithm-based clustering procedure that classifies traffic into congestion and no-congestion classes. It also enables the CEP rule engine to form complex events with adaptive thresholds that change with context. An extensive analysis of traffic features is performed to identify the relationships between temporal, environmental, and social features and their impact on CEP rule formation. A high recall of 96.8% indicates better performance, with fewer false positives, over the baseline, and multiple hypothesis test results further strengthen the effectiveness of the proposed approach.
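How the clusters set the adaptive thresholds is not specified; one simple possibility, sketched under that assumption, places the threshold midway between the congested and free-flow cluster means and recomputes it when the context changes.

```python
import numpy as np

def adaptive_speed_threshold(speeds, labels):
    """Derive a CEP congestion threshold from clustered speed readings
    (NumPy arrays): the midpoint between the lowest-mean (congested) and
    highest-mean (free-flow) clusters."""
    means = sorted(np.mean(speeds[labels == k]) for k in np.unique(labels))
    return (means[0] + means[-1]) / 2.0
```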