There are several commonly accepted clustering quality measures (clustering quality as opposed to cluster quality), such as the Rand index, the adjusted Rand index and the Jaccard index. Each of these, however, is based on comparing the partition produced by the clustering process to a correct partition. They can therefore only be used to determine the quality of a clustering process when the correct partition is known. This paper proposes another clustering quality measure that does not require comparison to a correct partition.
The proposed metric is based on the assumption that the proximities between the membership vectors should correlate positively with the proximities between the objects, which may be the proximities between their feature vectors. The components of the membership vector corresponding to a pattern are the membership degrees of that pattern in the various clusters. The membership vector is thus just another type of feature vector, in which the feature values for an object are its membership values in the various clusters. Based on this premise, this paper describes some new clustering quality metrics derived from standard correlation measures and from other proposed correlation metrics.
Simulations on data with a wide range of clusterability or separability show that the approach of comparing the proximity matrix based on the membership matrix to the object proximity matrix is quite effective.
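The premise above can be illustrated with a minimal sketch. It assumes Euclidean distance as the proximity and Pearson correlation as the correlation measure; the abstract does not say which proximities or correlation measures the paper actually uses, and the function names here are hypothetical.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise(vectors):
    # Distances for all unordered pairs i < j, always in the same order.
    n = len(vectors)
    return [euclidean(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)]

def pearson(a, b):
    # Pearson correlation; undefined (division by zero) if either list is constant.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def proximity_correlation(features, memberships):
    # Assumed score: correlation between object distances and
    # membership-vector distances; no reference partition is needed.
    return pearson(pairwise(features), pairwise(memberships))

# Two well-separated groups of objects.
features = [[0, 0], [0, 1], [10, 0], [10, 1]]
matching = [[1, 0], [1, 0], [0, 1], [0, 1]]      # memberships follow the groups
mismatched = [[1, 0], [0, 1], [1, 0], [0, 1]]    # memberships cut across the groups

good = proximity_correlation(features, matching)
bad = proximity_correlation(features, mismatched)
```

In this toy case the memberships that follow the two groups score close to 1, while the memberships that cut across them score below 0. A degenerate membership matrix in which every object has the same membership vector has no variance, so the score is undefined in this sketch.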
There are well-established methods for fuzzy clustering, especially for cases where the feature values are numerical, of ratio or interval scale. Methods for ordinal or nominal feature values are less well established, and no single best method appears to exist. This paper discusses a method in which unknown numeric variables are assigned to the ordinal values. Finding numeric values for these variables is part of minimizing the objective function for the clustering. Real numbers of interval scale, and even of ratio scale, are thereby assigned to the original ordinal values. The method uses the same objective function as fuzzy c-means clustering, but both the membership function and the ordinal-to-real mapping are determined by gradient descent. Since the true ordinal-to-real mapping is not known, the legitimacy of the mapping that is found cannot be verified. However, that mapping is the best one in terms of the clustering produced. Simulations show the method to be quite effective.
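The idea of treating the numeric values of the ordinal levels as unknowns in the fuzzy c-means objective can be sketched as follows. This is not the paper's algorithm, whose update rules the abstract does not give. The sketch makes several assumptions of its own: a single ordinal feature, memberships parametrized by a softmax so that each row sums to 1, the lowest and highest levels pinned to 0 and 1 to rule out the trivial solution in which all levels collapse to one number, and a finite-difference gradient with step halving in place of analytic gradients. All names are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]

def unpack(params, n, c):
    # Split the flat parameter list into memberships, centers and level values.
    U = [softmax(params[i * c:(i + 1) * c]) for i in range(n)]
    centers = params[n * c:n * c + c]
    # Assumption: the lowest and highest levels are pinned to 0 and 1.
    values = [0.0] + params[n * c + c:] + [1.0]
    return U, centers, values

def objective(params, data, c, m=2.0):
    # Fuzzy c-means objective: sum_i sum_k u_ik^m * (x_i - center_k)^2,
    # where x_i is the learned numeric value of object i's ordinal level.
    U, centers, values = unpack(params, len(data), c)
    return sum(U[i][k] ** m * (values[lvl] - centers[k]) ** 2
               for i, lvl in enumerate(data) for k in range(c))

def numeric_gradient(params, data, c, h=1e-6):
    # Central finite differences; a stand-in for analytic gradients.
    grad = []
    for j in range(len(params)):
        up = params[:j] + [params[j] + h] + params[j + 1:]
        down = params[:j] + [params[j] - h] + params[j + 1:]
        grad.append((objective(up, data, c) - objective(down, data, c)) / (2 * h))
    return grad

def fit(data, n_levels, c=2, steps=300, lr=0.5):
    n = len(data)
    params = [0.0] * (n * c)                      # uniform initial memberships
    params += [(k + 0.5) / c for k in range(c)]   # spread-out initial centers
    params += [l / (n_levels - 1) for l in range(1, n_levels - 1)]  # equal spacing
    best = objective(params, data, c)
    for _ in range(steps):
        grad = numeric_gradient(params, data, c)
        trial = [p - lr * g for p, g in zip(params, grad)]
        value = objective(trial, data, c)
        if value < best:
            params, best = trial, value           # accept the descent step
        else:
            lr *= 0.5                             # reject it and shrink the step
    return unpack(params, n, c), best

# Eight objects described by one ordinal feature with levels 0..3.
data = [0, 0, 1, 1, 2, 3, 3, 3]
(U, centers, values), final = fit(data, n_levels=4)
```

Here `values` holds the numeric value assigned to each ordinal level and `U` the fuzzy memberships. The sketch does not force the learned values to respect the order of the levels; a monotone parametrization would be one way to add that constraint.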