This paper addresses the critical and complex task of integrating music education information resources, highlighting the scattered nature and inefficient utilization of current resources in the field and the challenge of effectively consolidating diverse resource types such as digital audio, scores, and instructional videos. We propose a novel algorithm, rooted in data mining techniques, specifically designed for the rapid integration of these resources. Our method begins by standardizing the format of the input data, including audio lengths and image dimensions, to ensure uniformity. We then employ Convolutional Neural Network technology to extract features from audio, images, and videos, harnessing the power of deep learning to handle the multi-modality of the data. The extracted features from these varied sources are integrated into a unified format for subsequent processing. Following feature extraction and integration, we use spectral clustering to categorize the music education resources; this clustering method is particularly effective in dealing with the complexities and nuances of multi-modal data. Our experimental results demonstrate the efficacy of our algorithm in accurately classifying and integrating diverse music education resources, offering a promising solution to the challenges currently faced in the field.
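A minimal sketch of this standardize-extract-fuse-cluster pipeline, under stated assumptions: the per-modality CNN embeddings are stood in for by a synthetic `cnn_features` helper (hypothetical, not from the paper), and scikit-learn's SpectralClustering serves as the clustering stage.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
n_resources, n_classes = 200, 5
true_class = rng.integers(0, n_classes, n_resources)

def cnn_features(dim):
    """Placeholder for a CNN embedding: a noisy view of the resource's class."""
    centers = rng.normal(scale=4.0, size=(n_classes, dim))
    return centers[true_class] + rng.normal(size=(n_resources, dim))

# Stand-ins for the audio, score-image and video embeddings; in practice the
# inputs would first be standardized (common audio length, image size) and
# passed through the feature-extraction CNNs.
audio_feat, image_feat, video_feat = cnn_features(128), cnn_features(256), cnn_features(512)

# Fuse the modalities into one unified vector per resource
# (L2-normalize each modality so no single one dominates).
fused = np.hstack([normalize(audio_feat), normalize(image_feat), normalize(video_feat)])

# Categorize the resources with spectral clustering on a Gaussian (RBF) affinity.
labels = SpectralClustering(n_clusters=n_classes, affinity="rbf",
                            random_state=0).fit_predict(fused)
print(np.bincount(labels))
```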
Clustering is one of the most versatile tools for data analysis. In recent years, clustering that seeks the continuity of data (as opposed to classical centroid-based approaches) has attracted increasing research interest. It is a challenging problem with remarkable practical interest. The most popular continuity clustering method is the spectral clustering (SC) algorithm, which is based on graph cuts: it first generates a similarity graph using a distance measure and then studies its graph spectrum to find the best cut. This approach is sensitive to the parameters of the metric, and a correct parameter choice is critical to the quality of the clustering. This work proposes a new algorithm, inspired by SC, that reduces the parameter dependency while maintaining the quality of the solution. The new algorithm, named genetic graph-based clustering (GGC), takes an evolutionary approach, introducing a genetic algorithm (GA) to cluster the similarity graph. The experimental validation shows that GGC increases the robustness of SC and has competitive performance in comparison with classical clustering methods, at least on the synthetic and real datasets used in the experiments.
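The parameter sensitivity that motivates GGC can be seen in a minimal sketch of the classical graph-cut pipeline it builds on: the same two-moons data, a Gaussian similarity graph, and a spectral cut, repeated for several values of the scale σ (the data set and σ values are illustrative, not from the paper).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=300, noise=0.06, random_state=0)
D = squareform(pdist(X))          # pairwise Euclidean distances

# Classical SC: Gaussian similarity graph, then a spectral cut of that graph.
for sigma in (0.05, 0.2, 1.0):
    W = np.exp(-D ** 2 / (2 * sigma ** 2))
    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(W)
    print(f"sigma={sigma:<4} ARI={adjusted_rand_score(y, labels):.2f}")
```

Depending on σ, the same cut either recovers the two moons or splits them arbitrarily, which is exactly the dependency GGC aims to reduce.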
Link prediction in social networks has attracted growing attention from researchers. In this paper, a clustering method was used to exploit the grouping tendency of nodes, and a clustering index (CI) was proposed to predict potential links, taking the characteristics of scientific cooperation networks into consideration. Results showed that CI performed better than the traditional indices for scientific coauthorship networks by compensating for their disadvantages. Compared with traditional algorithms, a method designed for a specific type of network can better reflect the features of that network and achieve more accurate predictions.
Spectral clustering is one of the most important data processing methods and has been widely applied to machine learning, computer vision, pattern recognition and image processing. However, one of its main drawbacks is that the clustering model is defined only for the original (in-sample) data, with no clear extension to out-of-sample data. To improve its efficiency, in this paper we propose a new modularity-based method for spectral clustering with out-of-sample extension. First, kernel independent component analysis is used to solve for the demixing matrix on the Stiefel manifold in order to extract high-order independent data features. Then, a new modularity similarity measure-based spectral mapping algorithm is proposed, which allows the clustering model to be directly extended to out-of-sample data. Based on the above analysis, we present a spectral clustering algorithm with out-of-sample extension. Experimental results show our method has better performance compared with other related algorithms on different datasets.
We present a model construction method based on locally fitting polynomial functions to noisy data and building the entire model as a union of regions explained by such polynomial functions. Local fitting is shown to reduce to solving a polynomial eigenvalue problem in which the matrix coefficients are the data covariance and approximated noise covariance matrices that capture distortion effects caused by noise. By defining the asymmetric distance between two points as the projection of one onto the function fitted to the neighborhood of the other, we use a best weighted cut method to find a proper partitioning of the entire data set into feasible regions. Finally, the partitions are refined using a modified version of the k-planes algorithm.
Clustering is a popular and effective method for image segmentation. However, existing clustering methods often suffer from the following problems: (1) they need a huge amount of space and computation when the input data are large, and (2) they need some parameters (e.g. the number of clusters) to be assigned in advance, which greatly affects the clustering results. To save space and computation, reduce sensitivity to the parameters, and improve the effectiveness and efficiency of clustering algorithms, we construct a new clustering algorithm for image segmentation. The new algorithm consists of two phases: coarsening clustering and exact clustering. First, we use the Affinity Propagation (AP) algorithm for coarsening. Specifically, in order to save space and computational cost, we only compute the similarity between each point and its t nearest neighbors, and get a condensed similarity matrix (with only t columns, where t << N and N is the number of data points). Second, to further improve the efficiency and effectiveness of the proposed algorithm, Self-tuning Spectral Clustering (SSC) is applied to the resulting points (the representative points obtained in the first phase) to do the exact clustering. As a result, the proposed algorithm can quickly and precisely realize the clustering for texture image segmentation. The experimental results show that the proposed algorithm is more efficient than the compared algorithms FCM, K-means and SOM.
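A rough sketch of the two-phase coarse-then-exact idea, under simplifying assumptions: scikit-learn's AffinityPropagation supplies the coarsening exemplars (the condensed t-nearest-neighbor similarity matrix is not reproduced here), and ordinary spectral clustering with a median-scaled Gaussian similarity stands in for the self-tuning variant; each original point then inherits the label of its exemplar.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import AffinityPropagation, SpectralClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.2, random_state=0)

# Phase 1 (coarsening): Affinity Propagation picks a small set of exemplars.
ap = AffinityPropagation(damping=0.9, max_iter=500, random_state=0).fit(X)
exemplars = ap.cluster_centers_          # representative points
point_to_exemplar = ap.labels_           # which exemplar each point follows

# Phase 2 (exact clustering): spectral clustering on the few exemplars only,
# using a Gaussian similarity scaled by the median exemplar distance.
D = squareform(pdist(exemplars))
W = np.exp(-D ** 2 / (2 * np.median(D) ** 2))
sc_labels = SpectralClustering(n_clusters=4, affinity="precomputed",
                               random_state=0).fit_predict(W)

# Each original point inherits the cluster of its exemplar.
labels = sc_labels[point_to_exemplar]
print(len(exemplars), "exemplars ->", len(np.unique(labels)), "final clusters")
```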
Spectral clustering partitions data into similar groups in the eigenspace of the affinity matrix. The accuracy of the spectral clustering algorithm is affected by the affine equivariance realized in translating the distance relationship into a similarity relationship. The similarity value, computed as a Gaussian of the distance between data objects, is sensitive to the scale factor σ. The value of σ, a parameter that controls the drop in affinity value, is generally a fixed constant or determined by manual tuning. In this research work, σ is determined automatically from the distance values, i.e. the similarity relationship that exists in the real data space. The affinity value of a data pair is determined as a location estimate of the spread of distance values between the data points and the other points. The scale factor σi corresponding to a data point xi is computed as the trimean of its distance vector and used to fix the scale when computing the affinity matrix. Our proposed automatic scale parameter for spectral clustering results in a robust similarity matrix that is affine equivariant with the distance distribution and also eliminates the overhead of manually tuning to find the best σ value. The performance of spectral clustering using such affinity matrices was analyzed on UCI data sets and image databases. The obtained NMI, ARI, Purity and F-score values were equivalent to those of existing works and better for most of the data sets. The proposed scale factor was used in various state-of-the-art spectral clustering algorithms and performs well irrespective of the normalization operations applied in the algorithms. A comparison of clustering error rates obtained for various data sets across the algorithms shows that the proposed automatic scale factor clusters the data sets as well as a manually tuned best σ value. Thus the automatic scale factor proposed in this research work eliminates the need for an exhaustive grid search for the scale parameter that gives the best clustering performance.
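A small illustration of the idea, assuming the per-point scale enters the affinity in the usual local-scaling form exp(-d_ij^2 / (σi σj)) (the exact form used in the paper is not specified here); the trimean of a distance vector is (Q1 + 2·Q2 + Q3) / 4.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
D = squareform(pdist(X))

# Per-point scale: trimean of the point's distances to all other points.
q1, q2, q3 = np.percentile(D, [25, 50, 75], axis=1)
sigma = (q1 + 2 * q2 + q3) / 4.0          # one sigma_i per point, no manual tuning

# Affinity with automatic, per-point scaling.
W = np.exp(-D ** 2 / np.outer(sigma, sigma))
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(np.bincount(labels))
```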
Spectral clustering has become very popular in recent years, due to the simplicity of its implementation as well as the performance of the method in comparison with other popular ones. However, many studies show that clustering results are sensitive to the selection of the similarity graph and its parameters, e.g. σ and k. To address this issue, inspired by the density-sensitive similarity measure, we propose an improved spectral graph clustering method that combines a data-density-based similarity measure with DNA genetic algorithms (ISC-DNA-GA). It increases the distance between pairs of data points that lie in different high-density areas and reduces the similarity between pairs of data points within the same density region, so as to capture the spatial distribution characteristics of complex data. After computing the Laplacian matrix, we apply DNA-GAs to obtain the clustering centroids and assign all points to the centroids, so as to achieve better clustering results. Experiments have been conducted on artificial and real-world datasets of various dimensions, using evaluation methods based on external clustering criteria. The results show that the proposed method improves spectral clustering quality and is superior to the competing approaches.
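One common form of density-sensitive distance, shown here only to illustrate the kind of measure this abstract builds on (not necessarily the paper's exact construction), stretches each direct edge to ρ^d - 1 and then takes shortest paths, so pairs connected through a dense region end up closer than pairs separated by a sparse gap.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
D = squareform(pdist(X))

rho = 2.0                                  # density factor (> 1), an assumed value
edge = rho ** D - 1.0                      # long jumps become expensive, short hops cheap
dsd = shortest_path(edge, method="D")      # density-sensitive distance = shortest path cost
W = 1.0 / (1.0 + dsd)                      # turn the distance into a similarity

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(np.bincount(labels))
```

Because ρ^(a+b) - 1 exceeds (ρ^a - 1) + (ρ^b - 1), the cheapest paths hug the data manifold, which is what lets such measures separate clusters defined by density rather than by centroids.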
In this paper, two novel approaches for unsupervised feature selection are proposed based on spectral clustering. In the first proposed method, spectral clustering is applied to the features, and the centers of the clusters, together with their nearest neighbors, are selected. These features have minimum similarity (redundancy) between themselves since they belong to different clusters. Next, the samples of the data set are clustered using spectral clustering, so that a specific pseudo-label is assigned to the samples of each cluster. Then, according to the obtained pseudo-labels, the information gain of the features is computed, which secures maximum relevancy. Finally, the intersection of the features selected in the two previous steps is determined, which simultaneously guarantees both maximum relevancy and minimum redundancy. Our second proposed approach is very similar to the first one; its only, but significant, difference is that it selects one feature from each cluster and sorts all the features in terms of their relevancy. Then, by appending the selected features to a sorted list and ignoring them in the next step, the algorithm continues with the remaining features until all features have been appended to the sorted list. Both of our proposed methods are compared with state-of-the-art methods, and the obtained results confirm the performance of our proposed approaches, especially the second one.
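A compact sketch in the spirit of the second approach, under simplifying assumptions: features are grouped by spectral clustering on an absolute-correlation affinity, samples are clustered to obtain pseudo-labels, relevancy is scored with mutual information (as a stand-in for information gain), and the most relevant feature of each feature cluster is kept. The affinity choice and data set are illustrative only.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_wine().data)
n_select = 5

# 1) Cluster the FEATURES: affinity = |correlation| between feature columns.
corr = np.abs(np.corrcoef(X, rowvar=False))
feat_clusters = SpectralClustering(n_clusters=n_select, affinity="precomputed",
                                   random_state=0).fit_predict(corr)

# 2) Cluster the SAMPLES to obtain pseudo-labels.
pseudo = SpectralClustering(n_clusters=3, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0).fit_predict(X)

# 3) Relevancy of each feature w.r.t. the pseudo-labels (information-gain proxy).
relevancy = mutual_info_classif(X, pseudo, random_state=0)

# 4) Keep the most relevant feature from each feature cluster
#    (low redundancy across clusters, high relevancy within them).
selected = [int(np.flatnonzero(feat_clusters == c)[
                np.argmax(relevancy[feat_clusters == c])])
            for c in range(n_select)]
print("selected feature indices:", sorted(selected))
```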
In recent times, graph-based spectral clustering algorithms have received immense attention in many areas such as data mining, object recognition, and image analysis and processing. The similarity measure commonly used in these clustering algorithms is the Gaussian kernel function, which relies on a sensitive scaling parameter and, when applied to the segmentation of noise-contaminated images, leads to unsatisfactory performance because it neglects spatial pixel information. The present work introduces a novel framework for spectral clustering that embodies local spatial information and a fuzzy similarity measure to tackle the above issues. In our approach, we first filter the noise components from the original image using spatial and gray-level information. The similarity matrix is then constructed using a similarity measure that takes into account the fuzzy c-partition matrix and the cluster center vectors obtained by the fuzzy c-means clustering algorithm. In the last step, spectral clustering is applied to the derived similarity matrix to obtain the desired segmentation result. Experimental results on the segmentation of synthetic and Berkeley benchmark images with noise demonstrate the effectiveness and robustness of the proposed method, giving it an edge over the clustering-based segmentation methods reported in the literature.
Because the traditional spectral community detection method uses the adjacency matrix for clustering, which can reduce accuracy, we propose a signal-diffusion-based spectral clustering method for community detection. This method solves the problem of an unfixed total signal in the signal transmission mechanism, optimizes the algorithm's time complexity, and improves the performance of spectral clustering by constructing the Laplacian based on signal diffusion. Experiments show that the method achieves better performance on real-world networks and the Lancichinetti–Fortunato–Radicchi (LFR) benchmark.
Alteration of gene expression in response to regulatory molecules or mutations can lead to different diseases. MicroRNAs (miRNAs) have been discovered to be involved in the regulation of gene expression and a wide variety of diseases. In a tripartite biological network of human miRNAs, their predicted target genes and the diseases caused by altered expression of these genes, valuable knowledge about the pathogenicity of miRNAs, the involved genes and related disease classes can be revealed by co-clustering miRNAs, target genes and diseases simultaneously. Tripartite co-clustering can lead to more informative results than traditional co-clustering with only two kinds of members, and it passes hidden relational information along the relation chain by considering multi-type members. Here we report a spectral co-clustering algorithm for k-partite graphs to find clusters with heterogeneous members. We use the method to explore the potential relationships among miRNAs, genes and diseases. The clusters obtained from the algorithm have significantly higher density than randomly selected clusters, which means that members in the same cluster are more likely to have common connections. Results also show that miRNAs in the same family, based on their hairpin sequences, tend to belong to the same cluster. We also validate the clustering results by checking the correlation of enriched gene functions and disease classes in the same cluster. Finally, the widely studied miR-17-92 and its paralogs are analyzed as a case study to reveal that the genes and diseases co-clustered with these miRNAs are in accordance with current research findings.
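The two-mode (bipartite) special case of such spectral co-clustering can be sketched with scikit-learn's SpectralCoclustering; this is only a simplified stand-in for the k-partite algorithm, and the toy miRNA-by-gene association matrix below is synthetic.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
# Toy miRNA-by-gene association matrix with two planted co-clusters.
blocks = [rng.random((20, 30)) * w for w in (0.9, 0.1, 0.1, 0.9)]
A = np.block([[blocks[0], blocks[1]],
              [blocks[2], blocks[3]]])

# Dhillon-style bipartite spectral co-clustering of rows (miRNAs) and columns (genes).
model = SpectralCoclustering(n_clusters=2, random_state=0).fit(A)
print("miRNA cluster sizes:", np.bincount(model.row_labels_))
print("gene  cluster sizes:", np.bincount(model.column_labels_))
```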
Deep learning technologies are permeating every field, from image and speech recognition to computational and systems biology. However, the application of convolutional neural networks (CNNs) to “omics” data poses some difficulties, such as the processing of complex network structures and their integration with transcriptome data. Here, we propose a CNN approach that incorporates spectral clustering information processing to classify lung cancer. The developed spectral-convolutional neural network based method successfully integrates protein interaction network data and gene expression profiles to classify lung cancer. The computational experiments suggest that, in terms of accuracy, the predictive performance of our proposed method was better than that of other machine learning methods such as SVM or Random Forest. Moreover, the computational results also indicate that the underlying protein network structure helps to enhance the predictions. Data and CNN code can be downloaded from the link: https://sites.google.com/site/nacherlab/analysis
Unsupervised spectral clustering methods can yield good performance when identifying crisp clusters of low complexity, since the learning algorithm does not rely on finding local minima of an objective function but rather uses the spectral properties of the graph. Nonetheless, the performance of such approaches is usually affected by their uncertain parameters. Using the underlying structure of a general spectral clustering method, in this paper a new soft-link spectral clustering algorithm is introduced to identify clusters based on a fuzzy k-nearest neighbor approach. We construct a soft weight matrix of a graph by identifying the upper and lower boundaries of the learning parameters of the similarity function, specifically the fuzzifier parameter (fuzziness) of the Fuzzy k-Nearest Neighbor algorithm. The algorithm allows perturbations of the graph Laplacian during the learning stage through changes in these learning parameters. With an empirical analysis using an artificial and a real textual entailment dataset, we demonstrate that our initial hypothesis of implementing soft links for spectral clustering can improve the classification performance of the final outcome.
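A minimal sketch of a soft-link weight matrix built from fuzzy k-nearest-neighbor memberships, where the fuzzifier m plays the role of the perturbed learning parameter; this is an illustrative construction under assumed formulas, not the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
k, m = 10, 2.0                              # neighborhood size and fuzzifier (assumed values)

dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
dist, idx = dist[:, 1:], idx[:, 1:]         # drop each point's self-neighbor

# Fuzzy-kNN style soft memberships: closer neighbors get larger weights,
# with the fuzzifier m controlling how sharply the weights fall off.
inv = dist ** (-2.0 / (m - 1.0))
u = inv / inv.sum(axis=1, keepdims=True)

n = len(X)
W = np.zeros((n, n))
rows = np.repeat(np.arange(n), k)
W[rows, idx.ravel()] = u.ravel()
W = np.maximum(W, W.T)                      # symmetrize the soft-link graph

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(np.bincount(labels))
```

Varying m perturbs the weights, and hence the graph Laplacian, which is the kind of perturbation the abstract exploits during learning.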
Annotating genes is a fundamental issue in the post-genomic era. A typical procedure for this issue is to first cluster genes by their features and then assign functions to unknown genes by using known genes in the same cluster. A lot of genomic information is available for this issue, but the two major types of data that can be measured for any gene are microarray expressions and sequences, both of which, however, have their own flaws. Thus a natural and promising approach for gene annotation is to integrate these two data sources, especially in terms of their costs to be optimized in clustering. We develop an efficient three-step gene annotation method that incorporates spectral clustering over the integrated cost, based on the idea of network modularity. We rigorously examined the performance of our proposed method from three different viewpoints. All experimental results indicate the performance advantage of our method over possible clustering/classification-based approaches to gene function annotation using expressions and/or sequences.
Cluster analysis is a significant data mining technique. With the rise of lifelong learning, lifelong clustering has become a research topic. Lifelong clustering builds libraries shared among multiple tasks, and these tasks achieve effective information transfer by interacting with the shared knowledge libraries. However, selecting optimal hyper-parameters in the knowledge transfer process often relies on the actual clustering division of the dataset as a reference, which is unavailable during clustering. Moreover, the hyper-parameters for each task are typically set to constant values because of computational difficulty. Therefore, this paper explores a clustering method based on Bayesian inference, where the parameter setting is the prior information and the clustering divisions obtained with those parameters are the posterior information. In our method, the hyper-parameters corresponding to the maximum a posteriori (MAP) probability are selected for each task. Then, we apply this method to Lifelong Spectral Clustering to select hyper-parameters and propose a new algorithm, called Maximum a Posteriori Lifelong Spectral Clustering (MAPLSC). Finally, experiments on several real-world datasets show the effectiveness of our method and that the average clustering performance of Lifelong Spectral Clustering is improved.
Multi-view clustering has attracted the attention of researchers in recent years and is one of the popular machine learning and unsupervised learning techniques. It is challenging for conventional multi-view clustering to handle data with missing views, a setting known as incomplete multi-view clustering. To address this problem, we propose a novel Graph Learning for Incomplete Multi-view Spectral Clustering (GIMSC) algorithm to perform incomplete multi-view clustering tasks. GIMSC simultaneously integrates individual graph learning, fusion graph learning and spectral clustering into a unified framework, which is able to learn the consensus representation shared by all views via incomplete graph construction. GIMSC learns the adaptive local structure for all views, pre-constructed by k-nearest neighbors. Then, we construct the fusion graph with auto-weighted learning to explore the consensus similarity matrix for incomplete graphs of different sizes, which reduces the negative influence of outliers. We introduce an index matrix to achieve the transformation between the incomplete and complete graphs of each view. An iterative optimization algorithm is proposed to solve the resulting optimization problem. In experiments, we extensively evaluate our method on four incomplete multi-view datasets, showing that the proposed method outperforms existing state-of-the-art methods.
This paper proposes an automatic identification method for car key tooth codes based on spectral clustering, to tackle the low efficiency and high costs of current manual and mechanical identification methods. By using a general auto key fixture, the tooth height of different keys is transformed into an easily identifiable single detection chip height, which is associated with the contour of the car key image. The car key tooth code is obtained by calculating the vertical coordinates of the extreme points of the contour line in the car key image, which are processed as follows. Firstly, the key tooth edge is obtained with the spectral clustering classification method, and the extreme points at the edge of the tooth are determined with the minimum and maximum method. Secondly, the heights of the extreme points relative to the key's horizontal edge are determined, and the corresponding key tooth code is obtained. Lastly, the effectiveness of the proposed method is verified on a car key image.
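The extreme-point step can be illustrated with a small sketch on a synthetic contour profile; the profile shape, the baseline choice and the depth step are assumptions for illustration, not the paper's image-processing chain.

```python
import numpy as np
from scipy.signal import argrelextrema

# Synthetic stand-in for the key's edge contour: the vertical coordinate of the
# tooth edge sampled along the key's length (a real profile would come from the
# spectral-clustering edge extraction described above).
x = np.linspace(0, 6 * np.pi, 600)
profile = 2.0 + np.abs(np.sin(x)) + 0.3 * np.sin(3 * x)

# Minimum/maximum method: locate the local extrema of the contour line.
minima = argrelextrema(profile, np.less, order=10)[0]
maxima = argrelextrema(profile, np.greater, order=10)[0]

# Heights of the tooth cuts relative to the key's horizontal edge (here taken
# as the lowest contour point), quantized into a tooth code.
baseline = profile.min()
depths = profile[minima] - baseline
tooth_code = np.round(depths / 0.5).astype(int)   # 0.5 = assumed depth step
print(len(maxima), "peaks,", len(minima), "valleys; tooth code:", tooth_code.tolist())
```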
With the diversity of data sources, multi-view clustering algorithms are widely used. Traditional research routinely assumes that multi-view data is complete, but in practice data may be missing, so incomplete-view clustering has become a hot research topic. In this paper, a multi-kernel spectral clustering algorithm based on incomplete views (IVMKSpec) is put forward. Firstly, incomplete datasets are constructed with loss rates from 10% to 90%; they are clustered with kernel estimation and spectral clustering, and the clustering results are evaluated by NMI and F-measure. Multi-kernel learning overcomes the defect that a single kernel cannot effectively handle heterogeneous data from multiple sources. Moreover, applying multi-kernel spectral clustering to incomplete datasets improves the performance of incomplete clustering. Finally, the experimental results demonstrate that the proposed algorithm is robust and effective on most datasets.
Gene regulatory networks (GRNs) learned from high throughput genomic data are often hard to visualize due to the large number of nodes and edges involved, rendering them difficult to appreciate. This becomes an important issue when modular structures are inherent in the inferred networks, such as in the recently proposed context-specific GRNs [12]. In this study, we investigate the application of graph clustering techniques to discern modularity in such highly complex graphs, focusing on context-specific GRNs. Identified modules are then associated with a subset of samples and the key pathways enriched in the module. Specifically, we study the use of Markov clustering and spectral clustering on cancer datasets to yield evidence on the possible association amongst different tumor types. Two sets of gene expression profiling data were analyzed to reveal context-specificity as well as modularity in genomic regulations.