The development of digital pathology offers a significant opportunity to evaluate and analyze whole slides of diseased tissue effectively. In particular, the segmentation of nuclei from histopathology images plays an important role in quantitatively measuring and evaluating the acquired diseased tissue. There are many automatic methods to segment cell nuclei in histopathology images. One widely used unsupervised segmentation approach applies standard k-means or fuzzy c-means (FCM) clustering to color histopathology images to segment cell nuclei. Compared with supervised learning methods, this approach can obtain segmented nuclei without annotated nuclei labels for training, which saves considerable labeling and training time. The choice of color space and k value plays a crucial role in determining the nuclei segmentation performance of this approach. However, few works have investigated various color spaces and k value selection simultaneously in unsupervised color-based nuclei segmentation with k-means or FCM algorithms. In this study, we present color-based nuclei segmentation methods with standard k-means and FCM algorithms for histopathology images. Several color spaces of Haematoxylin and Eosin (H&E) stained histopathology data and various k values for k-means and FCM are investigated to explore the suitable selection for nuclei segmentation. A comprehensive nuclei dataset with 7 different organs is used to validate our proposed method. The experimental results indicate that the L∗a∗b∗ and YCbCr color spaces with a k of 4 are more reasonable for nuclei segmentation via k-means, while the L∗a∗b∗ color space with a k of 4 is useful via FCM.
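As an illustration of the clustering step described in this abstract, the following is a minimal sketch of standard FCM applied to color triples. It is not the authors' implementation: the function names `fuzzy_c_means` and `hard_labels`, the fuzzifier `m = 2`, the random initialisation, and the toy pixel values are all assumptions made for the example, and the pixels are assumed to have already been converted to the chosen color space.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard FCM on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)
            ])

        # Membership update from squared Euclidean distances.
        max_change = 0.0
        for i in range(n):
            d2 = [
                sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim))
                for j in range(c)
            ]
            if min(d2) == 0.0:
                # Point coincides with a center: share membership among hits.
                hits = [j for j in range(c) if d2[j] == 0.0]
                new_row = [1.0 / len(hits) if j in hits else 0.0 for j in range(c)]
            else:
                inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
                inv_sum = sum(inv)
                new_row = [v / inv_sum for v in inv]
            change = max(abs(a - b) for a, b in zip(new_row, u[i]))
            max_change = max(max_change, change)
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


def hard_labels(memberships):
    """Assign each point to the cluster with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Toy data (hypothetical values): three dark, nucleus-like pixels and
# three pale, background-like pixels.
pixels = [(70, 50, 130), (75, 55, 135), (65, 45, 125),
          (230, 190, 210), (235, 185, 205), (225, 195, 215)]
centers, memberships = fuzzy_c_means(pixels, c=2)
labels = hard_labels(memberships)
```

In a real pipeline the number of clusters `c` and the color space would be the two quantities varied, which is the comparison the abstract reports.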
Classification of imbalanced datasets has attracted substantial research interest over the past decades. Imbalanced datasets are common in several domains such as health, finance and security. A wide range of solutions for handling imbalanced datasets focus mainly on the class distribution problem and aim at providing more balanced datasets by means of resampling. However, existing literature shows that class overlap has a higher negative impact on the learning process than class distribution. In this paper, we propose overlap-based undersampling methods for maximizing the visibility of the minority class instances in the overlapping region. This is achieved by using soft clustering and an elimination threshold that adapts to the degree of overlap to identify and eliminate negative instances in the overlapping region. For more accurate clustering and detection of overlapped negative instances, the presence of the minority class at the borderline areas is emphasized by means of oversampling. Extensive experiments were carried out using simulated and real-world datasets covering a wide range of imbalance and overlap scenarios, including extreme cases. Results show significant improvement in sensitivity and competitive performance with well-established and state-of-the-art methods.
In the paper "Improved Overlap-Based Undersampling for Imbalanced Dataset Classification with Application to Epilepsy and Parkinson’s Disease", the authors introduced two new methods that address the class overlap problem in imbalanced datasets. The methods involve identification and removal of potentially overlapped majority class instances. Extensive evaluations were carried out using 136 datasets and compared against several state-of-the-art methods. Results showed competitive performance with those methods, and statistical tests showed significant improvement in classification results. A discussion of the paper, concerning the behavioral analysis of class overlap and method validation, was raised by Fernández. This article responds to that discussion, providing detailed clarification and supporting evidence for all the points raised.
This study addresses the problem of locating sugar cane loading stations and allocating cane fields to those stations. The problem is different from the general location–allocation problem in so far as this framework takes into account the different maturity periods of each cane field. If the loading station is improperly located, it can result in high transportation costs from cane fields to the station and significant fluctuations in the station utilization rate. A modification of the well-known "fuzzy c-means" (FCM) method, which takes into account both the cane supply and the different cane maturity periods, is proposed to solve this problem. The objective of the model is to minimize the sum of the transportation and station utilization costs. The performance of this method is compared to that of the traditional FCM method. The results show that the proposed approach is practical for solving the problem and that it provides a better solution than the FCM method.
In this paper, a new Hopfield-model net called Compensated Fuzzy Hopfield Neural Network (CFHNN) is proposed for vector quantization in image compression. In CFHNN, the compensated fuzzy c-means algorithm, modified from penalized fuzzy c-means, is embedded into a Hopfield neural network so that a parallel implementation of codebook design is feasible. Vector quantization can be cast as an optimization problem, namely the minimization of a criterion defined as a function of the average distortion between training vectors and codevectors. The CFHNN is trained to classify the divided vectors of a real image into feasible classes to generate an available codebook when the defined energy function converges to a near-global minimum. The training vectors of a divided image are mapped to a two-dimensional Hopfield neural network. The compensated fuzzy c-means technique is also used to update the quantization performance and to eliminate the search for weighting factors. In the context of vector quantization, each training vector of the divided image is represented by a neuron that is fully connected to the other neurons. After a number of iterations, the neuron states are refined to reach a near-optimal result once the defined energy function has converged.
A knowledge-guided approach to automatic classification of Coastal Zone Color images off the West Florida Shelf is described. The approach is used to identify red tides on the West Florida Shelf, as well as areas with high concentration of dissolved organic matter such as a river plume found seasonally along the West Florida coast over the middle of the shelf. The Coastal Zone Color images are initially segmented by the unsupervised Multistage Random Sampling Fuzzy c-Means algorithm. Then, a knowledge-guided system is applied to the centroid values of the resulting clusters to label case I waters, case II waters, a dilute river plume ("green river"), and red tide. The domain knowledge base contains information on cluster distribution in feature space, as well as spatial information such as bathymetry data. Our knowledge base consists of a rule-guided system and an embedded neural network. After training, the procedure recognizes, out of 60 images, all 15 images that contained a river plume and the 45 images that did not. The system correctly classifies 74% of the pixels that belong to the river plume, which provides a substantial advantage to users looking for offshore extensions of riverine influence. Red tides are also successfully identified in a time series of images for which ground truth confirmed the presence of a harmful bloom.
In some Computer Vision applications there is the need for grouping, in one or more clusters, only a part of the whole dataset. This happens, for example, when samples of interest for the application at hand are present together with several noisy samples.
In this paper we present a graph-based algorithm for cluster detection that is particularly suited for detecting clusters of any size and shape, without the need to specify either the actual number of clusters or other parameters.
The algorithm has been tested on data coming from two different computer vision applications. A comparison with four other state-of-the-art graph-based algorithms is also provided, demonstrating the effectiveness of the proposed approach.
Unsupervised image segmentation is a fundamental but challenging problem in computer vision. In this paper, we propose a novel unsupervised segmentation algorithm, which could find diverse applications in pattern recognition, particularly in computer vision. The algorithm, named Two-stage Fuzzy c-means Hybrid Approach (TFHA), adaptively clusters image pixels according to their multichannel Gabor responses taken at multiple scales and orientations. In the first stage, the fuzzy c-means (FCM) algorithm is applied for intelligent estimation of the number of centroids and initialization of the cluster centroids, which makes the segmentation algorithm adaptive. To improve the efficiency of the algorithm, we use the Gray Level Co-occurrence Matrix (GLCM) feature extracted at the hyperpixel level instead of the pixel level to estimate the number of centroids and the hyperpixel-cluster memberships. These are used as initialization parameters of the following main clustering stage, which reduces the computational cost while keeping the segmentation accuracy close to the original one. Then, in the second stage, the FCM algorithm is applied again at the pixel level to improve the compactness of the clusters forming the final homogeneous regions. To examine the performance of the proposed algorithm, extensive experiments were conducted. The experimental results show that, compared with state-of-the-art segmentation methods recently proposed in the literature, the proposed algorithm decreases the execution time and increases the quality of the segmentation results.
In this paper, the standard hard C-means (HCM) clustering approach to image segmentation is modified by incorporating weighted membership Kullback–Leibler (KL) divergence and local data information into the HCM objective function. The membership KL divergence, used for fuzzification, measures the proximity between each cluster membership function of a pixel and the locally-smoothed value of the membership in the pixel vicinity. The fuzzification weight is a function of the pixel to cluster-center distances. The pixel to cluster-center distance used is composed of the original pixel data distance plus a fraction of the distance generated from the locally-smoothed pixel data. It is shown that the obtained membership function of a pixel is proportional to the locally-smoothed membership function of this pixel multiplied by an exponentially distributed function of the negative pixel distance relative to the minimum distance provided by the nearest cluster-center to the pixel. Therefore, because it incorporates the locally-smoothed membership and data information in addition to the relative distance, which is more tolerant to additive noise than the absolute distance, the proposed algorithm has a threefold noise-handling process. The presented algorithm, named local data and membership KL divergence based fuzzy C-means (LDMKLFCM), is tested on synthetic and real-world noisy images, and its results are compared with those of several FCM-based clustering algorithms.
A procedure for estimating the joint probability density function (pdf) of T1, T2 and proton spin density (PD) for gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) in the brain is presented. The pdf's have numerous applications, including the study of tissue parameter variability in pathology and across populations. The procedure requires a multispectral, spin echo magnetic resonance imaging (MRI) data set of the brain. It consists of five automated steps:
(i) preprocess the data to remove extracranial tissue using a sequence of image processing operators;
(ii) estimate T1, T2 and PD by fitting the preprocessed data to an imaging equation;
(iii) perform a fuzzy c-means clustering on the same preprocessed data to obtain a spatial map representing the membership value of the three tissue classes at each pixel location;
(iv) reject estimates which are not from pure tissue or have poor fits in the parameter estimation, and classify the remaining estimates as either GM, WM or CSF;
(v) compute statistics on the classified estimates to obtain a probability mass function and a Gaussian joint pdf of the tissue parameters for each tissue class.
Some preliminary results are shown comparing computed pdf's of young, elderly and Alzheimer's subjects. Two brief examples applying the joint pdf's to pulse sequence optimization and generation of computational phantoms are also provided.
DNA motif discovery is an important task since it helps to better understand the regulation of transcription in the protein synthesis process. This paper introduces a novel method for DNA motif finding that adopts a machine-learning approach based on a well-known clustering algorithm, Fuzzy C-Means. The method is explained in detail and tested on DNA sequences extracted from the genomes of Saccharomyces cerevisiae and Escherichia coli. Experimental results suggest that the algorithm is efficient in finding statistically interesting features in the DNA sequences. A comparison of the algorithm with the well-known motif finding tools MEME and MDScan, which are built on statistical and word-enumerative models, shows the advantages of the proposed method over the existing tools and the promise of the machine-learning approach.
Energy efficiency in wireless sensor networks (WSNs) is a fundamental challenge. Cluster-based routing is an energy-saving method in this type of network. This paper presents an energy-efficient clustering algorithm based on the fuzzy c-means (FCM) algorithm and a genetic fuzzy system (GFS), called ECAFG. The clusters are formed using the FCM algorithm, and then cluster heads (CHs) are selected using the GFS. The formed clusters remain static, but CHs are selected at the beginning of each round. The FCM algorithm forms balanced clusters and distributes the consumed energy among them. Using static clusters also reduces the data overhead and consequently the energy consumption. In the GFS, the node energy, the distance from each node to the base station, and the distance from each node to its corresponding cluster center are considered as determining factors in CH selection. A genetic algorithm is then used to obtain the fuzzy if–then rules of the GFS. Consequently, the system performance is improved and appropriate CHs can be selected, hence energy dissipation is reduced. The simulation results show that ECAFG, compared with the existing methods, significantly reduces the energy consumption of the sensor nodes and prolongs the network lifetime.
An improved fuzzy C-means (FCM) clustering method is proposed. It incorporates Otsu thresholding with conventional FCM to reduce FCM's susceptibility to local minima, as well as its tendency to derive a threshold that is biased towards the component with larger probability, and derive threshold values with greater accuracy. Thresholding is performed at the cluster boundary region in feature space. A comparison of the results produced by improved and conventional algorithms confirms the superior performance of the former.
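For readers unfamiliar with the Otsu component mentioned in this abstract, here is a minimal sketch of plain Otsu thresholding on integer gray levels. It covers only the classical Otsu step, not the combined FCM method the abstract proposes; the function name `otsu_threshold` and the toy gray values are assumptions for the example.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximises between-class variance.

    `values` are integers in [0, levels); class 0 is values <= t and
    class 1 is values > t.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # running count and intensity sum of class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # class 0 still empty
        if w1 == 0:
            break     # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Toy bimodal data (hypothetical): a dark mode near 10 and a bright mode near 200.
gray = [10] * 50 + [12] * 50 + [200] * 30 + [205] * 30
t = otsu_threshold(gray)
```

On bimodal data like this, the returned threshold falls in the gap between the two modes, which is the behaviour a thresholding step at the cluster boundary relies on.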
Choosing the optimal number of clusters is a key issue in cluster analysis, and it tends to be more complicated in spatial clustering. Cluster validation helps to determine the appropriate number of clusters present in a dataset. Furthermore, cluster validation evaluates and assesses the results of clustering algorithms. There are numerous methods and techniques for choosing the optimal number of clusters in crisp and fuzzy clustering. In this paper, we introduce a new index for fuzzy clustering to determine the optimal number of clusters. This index is not another metric for calculating compactness or separation among partitions. Instead, the index uses several existing indices to give a degree, or fuzziness, to the optimal number of clusters. In this way, not only do the objects in a fuzzy cluster get a membership value, but the number of clusters to be partitioned is given a value as well. The new index is used in the fuzzy c-means algorithm for the geodemographic segmentation of 285 postal codes.
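The abstract does not say which existing indices the new index combines. As a hedged illustration of what such indices look like, the sketch below computes two classical fuzzy validity measures, Bezdek's partition coefficient and partition entropy, from a membership matrix. The function names and the example matrices are assumptions, and this is not the index proposed in the paper.

```python
import math


def partition_coefficient(u):
    """Mean of squared memberships; 1.0 for a crisp partition, 1/c for a uniform one."""
    return sum(v * v for row in u for v in row) / len(u)


def partition_entropy(u):
    """Mean membership entropy; 0.0 for a crisp partition, ln(c) for a uniform one."""
    return -sum(v * math.log(v) for row in u for v in row if v > 0.0) / len(u)


# Rows are objects, columns are clusters (hypothetical memberships).
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
pc_crisp, pe_crisp = partition_coefficient(crisp), partition_entropy(crisp)
pc_fuzzy, pe_fuzzy = partition_coefficient(fuzzy), partition_entropy(fuzzy)
```

A common use is to run fuzzy c-means for several candidate cluster numbers and compare such scores across runs.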
Diabetic retinopathy (DR) is a complication of diabetes caused by changes in the blood vessels of the retina. Initially, DR causes trivial changes in the retinal capillaries. Its symptoms can blur or distort patients' vision, and the disease is a main cause of blindness. DR is characterized by the presence of exudates at the nonproliferative stage. Once the retina is damaged by DR, the effects are permanent, and hence early treatment is considered vital. Ophthalmologists detect the presence of exudates from dilated retinal images, which are captured after a chemical solution is dropped into the patient's eye, causing irritation. Therefore, there is a need for an alternative method for detecting exudates from nondilated images using image processing algorithms. In this paper, an automated method is proposed for the detection of exudates using the fuzzy C-Means (FCM) clustering technique and reconstruction through a superimposition process, without dilating the patient's eye. The segmented result of FCM is compared with the result obtained using the Fuzzy K-Means segmentation algorithm. The sensitivity and specificity values for exudate detection using the FCM algorithm are 87.38% and 96.94%, respectively. On the other hand, the sensitivity and specificity values for exudate detection using the K-Means algorithm are 75.04% and 93.73%, respectively.
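The sensitivity and specificity figures quoted above follow the usual definitions, TP / (TP + FN) and TN / (TN + FP). The sketch below shows that computation for a pair of binary masks; the function name `sensitivity_specificity` and the toy masks are assumptions, and the abstract does not state whether its figures were computed per pixel or per lesion.

```python
def sensitivity_specificity(predicted, truth):
    """Compute (sensitivity, specificity) from two equal-length binary sequences.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A value is NaN when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for p, t in zip(predicted, truth):
        if t and p:
            tp += 1
        elif t and not p:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy flattened masks (hypothetical): 1 marks an exudate pixel.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(predicted, truth)
```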
There are well-established methods for fuzzy clustering, especially for cases where the feature values are numerical, on a ratio or interval scale. Methods for ordinal or nominal feature values are less well established, and in that case there appears to be no single best method. This paper discusses a method in which unknown numeric variables are assigned to the ordinal values. Part of minimizing the objective function for the clustering is to find numeric values for these variables. Thus real numbers on an interval scale, and even a ratio scale, are assigned to the original ordinal values. The method uses the same objective function as fuzzy c-means clustering, but both the membership function and the ordinal-to-real mapping are determined by gradient descent. Since the true ordinal-to-real mapping is not known, the mapping found cannot be verified for its legitimacy. However, the mapping that is found is the best in terms of the clustering produced. Simulations show the method to be quite effective.
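For reference, the standard fuzzy c-means objective that this abstract refers to can be written as follows. The symbols are the conventional ones and are assumptions here, since the abstract does not fix a notation: $x_i$ are the $n$ data points, $v_j$ the $c$ cluster centers, $u_{ij}$ the memberships, and $m > 1$ the fuzzifier.

```latex
J_m(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{\,m} \, \lVert x_i - v_j \rVert^{2},
\qquad \text{subject to} \quad \sum_{j=1}^{c} u_{ij} = 1, \quad u_{ij} \ge 0 .
```

In the setting described above, the entries of $x_i$ that correspond to ordinal features would themselves be unknowns to be optimized, rather than fixed inputs.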
In order to analyze the distribution of individual opinions (decision rules) in a group, clustering of decision tables is proposed. An agglomerative hierarchical clustering (AHC) of decision tables has been examined previously, but the result of AHC does not always optimize a criterion. We therefore develop non-hierarchical clustering techniques for decision tables. In order to treat positive and negative evaluations of a common profile, we use a vector of rough membership values to represent an individual opinion of a profile. Using rough membership values, we develop a K-means method as well as fuzzy c-means methods for clustering decision tables. We examined the proposed methods by clustering real-world decision tables obtained from a questionnaire investigation.
In 1993, Yang first extended the classification maximum likelihood (CML) to a so-called fuzzy CML by combining fuzzy c-partitions with the CML function. Fuzzy c-partitions are generally an extension of hard c-partitions, and it was claimed that this was more robust. However, the fuzzy CML still lacks some robustness as a clustering algorithm: it is unable to detect clusters of different volumes, it depends heavily on parameter initialization, and it requires the number of clusters to be given a priori. In this paper, we construct a robust fuzzy CML clustering framework. The eigenvalue decomposition of a covariance matrix is first considered using the fuzzy CML model. The Bayesian information criterion (BIC) is then used for model selection, in order to choose the best model with the optimal number of clusters. The proposed framework is therefore robust in terms of parameter initialization, robust in terms of the cluster number, and capable of detecting clusters of different volumes. Numerical examples and real data applications with comparisons are provided, which demonstrate the effectiveness and superiority of the proposed method.
A fuzzy classifier using multiple ellipsoids to approximate decision regions for classification is designed in this paper. To learn the sizes and orientations of ellipsoids, an algorithm called evolutionary ellipsoidal classification algorithm (EECA) that integrates the genetic algorithm (GA) with the Gustafson-Kessel algorithm (GKA) is proposed. Within EECA the GA is employed to learn the size of every ellipsoid. With the size of every ellipsoid encoded and intelligently estimated in the GA chromosome, GKA is utilized to learn the corresponding ellipsoid. GKA is able to adapt the distance norm to the underlying distribution of the prototype data points for an assigned ellipsoid size. A process called directed initialization is proposed to improve EECA's learning efficiency. Because EECA learns the data point distribution in every cluster by adjusting an ellipsoid with suitable size and orientation, the information contained in the ellipsoid is further utilized to improve the cluster validity. A cluster validity measure based on the ratio of summation for each intra-cluster scatter with respect to the inter-cluster separation is defined in this paper. The proposed cluster validity measure takes advantage of EECA's learning capability and serves as an effective index for determining the adequate number of ellipsoids required for classification.
The fuzzy C-means (FCM) clustering algorithm is an important and popular clustering algorithm used in various application domains such as pattern recognition, machine learning, and data mining. Although this algorithm has shown acceptable performance in diverse problems, the current literature lacks studies on how it can improve the clustering quality of partitions with overlapping classes. The better the clustering quality of a partition, the better the interpretation of the data, which is essential for understanding real problems. This work proposes two robust FCM algorithms to prevent ambiguous membership in clusters. To this end, we compute two types of weights: one weight to avoid the problem of overlapping clusters, and another weight to enable the algorithm to identify clusters of different shapes. We perform a study with synthetic datasets, each containing classes of different shapes and different degrees of overlap. The study also considers real application datasets. Our results indicate that such weights are effective in reducing the ambiguity of membership assignments, thus yielding a better interpretation of the data.