In geometric group theory, one of the milestones is Gromov’s polynomial growth theorem: a finitely generated group has polynomial growth if and only if it is virtually nilpotent. Inspired by Gromov’s work, we introduce the growth types of weighted Hardy spaces. In this paper, we focus on the weighted Hardy spaces of polynomial growth, which cover the classical Hardy space, weighted Bergman spaces, weighted Dirichlet spaces and much more. Our main results are as follows. (1) We obtain the boundedness of the composition operators whose symbols are analytic automorphisms of the open unit disk acting on weighted Hardy spaces of polynomial growth, which implies that the multiplication operator $M_z$ is similar to $M_\varphi$ for any analytic automorphism $\varphi$ of the open unit disk. Moreover, we obtain the boundedness of composition operators induced by analytic functions on the closed unit disk on weighted Hardy spaces of polynomial growth. (2) For any Blaschke product $B$ of order $m$, $M_B$ is similar to $\bigoplus_{1}^{m} M_z$, which is an affirmative answer to a generalized version of a question proposed by Douglas in 2007. (3) We also give counterexamples to show that the composition operators whose symbols are analytic automorphisms of the open unit disk acting on a weighted Hardy space of intermediate growth could be unbounded, which indicates the necessity of the polynomial growth condition. Thus, the collection of weighted Hardy spaces of polynomial growth is almost the largest class for which Douglas’s question has an affirmative answer. (4) Finally, we give the Jordan representation theorem and similarity classification for the analytic functions on the closed unit disk as multiplication operators on a weighted Hardy space of polynomial growth.
The paper presents results on factorization by similarity of fuzzy concept lattices with hedges. A fuzzy concept lattice is a hierarchically ordered collection of clusters extracted from tabular data. The basic idea of factorization by similarity is to have, instead of a possibly large original fuzzy concept lattice, its factor lattice. The factor lattice contains fewer clusters than the original concept lattice but, at the same time, represents a reasonable approximation of the original concept lattice and provides us with a granular view of it. The factor lattice is obtained by factorizing the original fuzzy concept lattice by a similarity relation. The similarity relation is specified by the user by means of a single parameter, called a similarity threshold. Smaller similarity thresholds lead to smaller factor lattices, i.e. to more comprehensible but less accurate approximations of the original concept lattice. Therefore, factorization by similarity provides a trade-off between comprehensibility and precision.
We first describe the notion of factorization. Second, we present a way to compute the factor lattice directly from input data, i.e. without the need to compute the possibly large original concept lattice. Third, we provide an illustrative example to demonstrate our method.
Individuals that share common properties self-organize into communities. Therefore, community analysis is an important way to ascertain whether or not a complex system consists of sub-structures with different properties. In this paper, we give a two-level community structure analysis for the SSCI journal system by most similar node pairs. Five different strategies for the selection of node pairs are introduced. The efficiency is checked using the normalized mutual information (NMI) measure. Statistical properties and comparisons of the community results show that detection at both levels gives instructive information about the community structure in complex systems. Further comparisons of the five strategies yield additional indications. Firstly, it is always efficient to assign individuals with maximum similarity to the same community, whether the interaction information is complete or not; secondly, rational random selection plays an important role in community diversity, and it is not a good idea to keep too much information for rational random selection; finally, random selection generates small-world community structure with no internal order. These results give valuable indications for efficient and stable community detection in huge complex systems with big interaction data.
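As an illustration of the NMI check mentioned in the abstract above, the following is a minimal sketch of one common NMI variant (mutual information normalized by the arithmetic mean of the two partition entropies). The abstract does not say which normalization the authors used, so the normalization, the function name `normalized_mutual_information`, and the list-of-labels input format are assumptions made here for illustration.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI of two partitions given as equal-length label sequences.

    Assumption: normalization by the arithmetic mean of the entropies,
    NMI = 2 * I(A; B) / (H(A) + H(B)). Other normalizations exist.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information from the joint and marginal label frequencies.
    mutual_info = sum(
        (c / n) * math.log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())

    # Both partitions put every node in a single community: treat as identical.
    if entropy_a == 0 and entropy_b == 0:
        return 1.0
    return 2 * mutual_info / (entropy_a + entropy_b)
```

Under this definition, two partitions that differ only in how communities are named score 1, and two statistically independent partitions score 0.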
Uncertain system clustering is an important issue that underlies the mining of real-world data in many fields. In this paper, we investigate the clustering problem of uncertain systems and propose an improved clustering algorithm. Our improved algorithm considers not only the distance between data points but also their spatial direction in vector space. The aim is to improve the accuracy of uncertain system clustering, especially when the border regions of groups overlap in vector space. Experimental results show that, for uncertain physics systems, the improved algorithm increases the accuracy of uncertain system clustering compared with the traditional clustering method based on distance similarity. The maximum values of accuracy and F-measure increase by 21.1% and 13.3%, respectively. Moreover, the proposed algorithm is highly robust to noise.
The label propagation algorithm is widely used for community detection in networks due to its linear time complexity. It also does not need any predefined information such as the number of communities. However, the results of this algorithm are not stable because of the random strategy used in its propagation process. In this paper, a modification of the label propagation strategy is proposed in which labels are propagated based on node importance, defined by the nodes' positions and popularity among neighbors. The proposed strategy is an updating process which reduces the instability of the label propagation algorithm. Experiments on real-world and synthetic networks show that the proposed method improves accuracy in terms of modularity, NMI and ARI. Also, the method has an acceptable execution time.
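Modularity, one of the quality measures named in the abstract above, can be sketched as follows for an undirected, unweighted graph using the standard Newman-Girvan form Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. The function name `modularity` and the edge-list plus node-to-community dictionary inputs are assumptions chosen for illustration, not the authors' code.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph
           (each edge listed once, no self-loops assumed).
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")

    internal_edges = {}  # L_c: edges with both endpoints in community c
    degree_sum = {}      # d_c: sum of degrees of nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1

    return sum(
        internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )
```

For two triangles joined by a single bridge edge, splitting the graph into the two triangles gives Q = 5/14, while placing all nodes in one community gives Q = 0.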
The compression of networks is an important aspect of complex networks and spatial generalization. Previous studies show that the dual graph for street–street relationships more accurately reflects the morphological features of street networks than the traditional methods. In this study, a dual graph for street–street relationship is constructed based on complex networks theory. We introduce the concept of m-order neighbors and take into account the factors of the node’s degree, closeness centrality, betweenness centrality, and distance within the dual graph. We also consider the importance contributions of the node itself and its 1- to m-order neighbors and define the evaluation model of node importance. We then propose a street compression process based on the evaluation of node importance for dual graph by considering the structural and functional features of streets. The degree distribution and topological similarity index are introduced to evaluate the level of maintaining the global structure and topological characteristics of the road network and to validate the efficiency of the proposed method. A real urban road network is used for the experiments. Results show that the proposed approach can be used in selecting important streets that can retain the global structural properties and topological connectivity of the street network.
In this paper, we present an analysis of oracle bone characters for animals from a “cognitive” point of view. After some general remarks on oracle-bone characters presented in Sec. 1 and a short outline of the paper in Sec. 2, we collect various oracle-bone characters for animals from published resources in Sec. 3. In the next section, we begin analyzing a group of 60 ancient animal characters from www.zdic.net, a highly acclaimed internet dictionary of Chinese characters that is strictly based on historical sources, and introduce five categories of specific features regarding their (graphical) structure that will be used in Sec. 5 to associate corresponding feature vectors to these characters. In Sec. 6, these feature vectors will be used to investigate their dissimilarity in terms of a family of parameterized distance measures. And in the last section, we apply the SplitsTree method as encoded in the NeighborNet algorithms to construct a corresponding family of dissimilarity-based networks with the intention of elucidating how the ancient Chinese might have perceived the “animal world” in the late bronze age and to demonstrate that these pictographs reflect an intuitive understanding of this world and its inherent structure that predates its classification in the oldest surviving Chinese encyclopedia from approximately the third century BC, the Er Ya, as well as similar classification systems in the West by one to two millennia. We also present an English dictionary of 70 oracle bone characters for animals in Appendix A. In Appendix B, we list various variants of animal characters that were published in the Jia Gu Wen Bian (cf. 甲骨文编, A Complete Collection of Oracle Bone Characters, edited by the Institute of Archaeology of the Chinese Academy of Social Sciences, published by the Zhonghua Book Company in 1965). We recall the frequencies of the 521 most frequent oracle bone characters in Appendix C as reported in [T. Chen, Yin-Shang Jiaguwen Zixing Xitong Zai Yanjiu, (The Structural System of Oracle Inscriptions) (Shanghai Renmin Chubanshe, Shanghai, 2010); Jiaguwen Shiwen Yongzi Pinlü Biao (A Frequency List of Oracle Characters), Center for the Study and Application of Chinese Characters (East China Normal University, Shanghai, 2010), http://www.wenzi.cn/en/default.aspx]. And in Appendix D, we list the animals registered in the last five chapters of the Er Ya.
Community detection based on the label propagation algorithm (LPA) has attracted widespread attention because of its high efficiency. However, it is difficult to guarantee the accuracy of community detection because label spreading in the algorithm is random. To address this problem, an improved LPA based on random walk (RWLPA) is proposed in this paper. Firstly, a matrix measuring the similarity between nodes in the network is computed. Secondly, during label propagation, when more than one neighbor label attains the highest frequency for a node, the label of the neighbor with the highest similarity, rather than the label of a random neighbor, is chosen for the update. This prevents labels from propagating randomly among communities. Finally, we test LPA and the improved LPA on benchmark networks and real-world networks. The results show that the communities discovered by the improved algorithm are of higher quality than those found by the traditional algorithm.
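The tie-breaking rule described in the abstract above can be sketched as a single label-update step. This is a hedged illustration only: the abstract does not give the random-walk similarity formula, so the similarity values are taken as a precomputed input, and the function name `choose_label` and the nested-dictionary format `similarity[node][neighbor]` are hypothetical.

```python
from collections import Counter


def choose_label(node, neighbors, labels, similarity):
    """Pick the new label for `node` in one label-propagation update.

    neighbors: iterable of the node's neighbors.
    labels: dict mapping each node to its current label.
    similarity: dict of dicts, similarity[node][neighbor] -> float
                (assumed precomputed, e.g. from a random-walk model).
    """
    neighbors = list(neighbors)
    if not neighbors:
        return labels[node]  # isolated node keeps its own label

    counts = Counter(labels[v] for v in neighbors)
    top_frequency = max(counts.values())
    tied_labels = {lab for lab, c in counts.items() if c == top_frequency}

    # Unique most frequent label: the ordinary LPA rule applies.
    if len(tied_labels) == 1:
        return next(iter(tied_labels))

    # Tie: instead of a random choice, follow the most similar neighbor
    # among those carrying one of the tied labels.
    candidates = [v for v in neighbors if labels[v] in tied_labels]
    most_similar = max(candidates, key=lambda v: similarity[node][v])
    return labels[most_similar]
```

Because the tie is resolved deterministically from the similarity values, repeated runs on the same input produce the same update, which is the source of the stability gain the abstract describes.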
To address the limitation that current node influence ranking algorithms apply only to a single type of network and give inaccurate results, an algorithm based on similarity is proposed. When a node is similar to many nodes in the network, it is representative and can be treated as an influential node. Firstly, a probability walking model is used to simulate the initiative visits between nodes in different types of networks. Secondly, superposed probabilistic transfer similarity is defined based on the model, considering the nodes’ inbound and outbound information. Finally, a node ranking algorithm is built on the new similarity measure. Experiments show that the algorithm can evaluate different kinds of networks with high accuracy, whether the network is directed or undirected, weighted or unweighted.
Detecting the natural communities in a real-world network can uncover its underlying structure and potential function. In this paper, a novel community algorithm SUM is introduced. The fundamental idea of SUM is that a node with relatively low degree stays faithful to its community, because it only has links with nodes in one community, while a node with relatively high degree not only has links with nodes within but also outside its community, and this may cause confusion when detecting communities. Based on this idea, SUM detects communities by suspecting the links of the maximum degree nodes to their neighbors within a community, and relying mainly on the nodes with relatively low degree simultaneously. SUM elegantly defines a similarity which takes into account both the commonality and the rejective degree of two adjacent nodes. After putting similar nodes into one community, SUM generates initial communities by reassigning the maximum degree nodes. Next, SUM assigns nodes without labels to the initial communities, and adjusts the border node to its most linked community. To evaluate the effectiveness of SUM, SUM is compared with seven baselines, including four classical and three state-of-the-art methods on a wide range of complex networks. On the small size networks with ground-truth community structures, results are visually demonstrated, as well as quantitatively measured with ARI, NMI and Modularity. On the relatively large size networks without ground-truth community structures, the performances of these algorithms are evaluated according to Modularity. Experimental results indicate that SUM can effectively determine community structures on small or relatively large size networks with high quality, and also outperforms the compared state-of-the-art methods.
Identifying communities in complex networks has recently attracted considerable attention in different fields. The goal of community identification is to cluster the vertices of a network into groups, which is the same task as clustering in the machine learning and data mining domains. A recently proposed clustering method called affinity propagation shows high performance in clustering data sets into groups, and it does not require that the number of clusters be pre-specified. In this paper, based on a new method for calculating similarity between pairs of vertices and a method for transforming a given similarity from the likelihood domain to the log-domain, we apply the affinity propagation clustering method to identify communities in complex networks. Extensive simulation results demonstrate that the affinity propagation clustering algorithm is very effective for identifying community structures in both computer-generated and real-world network data.
Community detection is important for understanding the structure and function of networks. Resistance distance is a kind of distance function inherent in the network itself, which has important applications in many fields. In this paper, we propose a novel community detection algorithm based on resistance distance and similarity. First, we propose the node similarity, which is based on the common nodes and resistance distance. Then, we define the distance function between nodes by similarity. Furthermore, we calculate the distance between communities by using the distance between nodes. Finally, we detect the community structure in the network according to the nearest-neighbor nodes being in the same community. Experimental results on artificial networks and real-world networks show that the proposed algorithm can effectively detect the community structures in complex networks.
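The resistance distance used in the abstract above treats each edge as a resistor and measures the effective resistance between two nodes. The sketch below computes it by grounding one node and solving the reduced Laplacian system with plain Gaussian elimination. It covers only the resistance distance itself; the authors' node similarity, which also involves common nodes, is not specified in the abstract and is not reproduced here. The function names and the adjacency-matrix input are assumptions for illustration.

```python
def _solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; is the graph connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def resistance_distance(adj, i, j):
    """Effective resistance between nodes i and j.

    adj: symmetric matrix of edge conductances (1 for an unweighted edge,
         0 for no edge, zero diagonal); the graph is assumed connected.
    """
    if i == j:
        return 0.0
    n = len(adj)
    keep = [k for k in range(n) if k != j]  # ground node j
    # Reduced Laplacian: degree on the diagonal, minus conductance elsewhere.
    laplacian = [
        [sum(adj[r]) if r == c else -adj[r][c] for c in keep] for r in keep
    ]
    # Inject one unit of current at node i; its potential equals R_ij.
    current = [1.0 if r == i else 0.0 for r in keep]
    potential = _solve(laplacian, current)
    return potential[keep.index(i)]
```

On a three-node path with unit edges the end-to-end resistance is 2 (two resistors in series), and on a triangle any pair of nodes is 2/3 apart (one resistor in parallel with two in series).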
Decision-tree algorithms are known to be unstable: small variations in the training set can result in different trees and different predictions for the same validation examples. Both accuracy and stability can be improved by learning multiple models from bootstrap samples of training data, but the "meta-learner" approach makes the extracted knowledge hardly interpretable. In the following paper, we present the Info-Fuzzy Network (IFN), a novel information-theoretic method for building stable and comprehensible decision-tree models. The stability of the IFN algorithm is ensured by restricting the tree structure to using the same feature for all nodes of the same tree level and by the built-in statistical significance tests. The IFN method is shown empirically to produce more compact and stable models than the "meta-learner" techniques, while preserving a reasonable level of predictive accuracy.
Improving naive Bayes (NB) [15,28] for classification has received significant attention. Related work can be broadly divided into two approaches: eager learning and lazy learning [1]. Unlike eager learning, the key idea for extending naive Bayes using lazy learning is to learn an improved naive Bayes for each test instance. In recent years, several lazy extensions of naive Bayes have been proposed, for example LBR [30], SNNB [27], and LWNB [8]. All these algorithms aim to improve naive Bayes' classification performance. Indeed, they achieve significant improvement in terms of classification, measured by accuracy. In many real-world data mining applications, however, an accurate ranking is more desirable than an accurate classification. Thus a natural question is whether they also achieve significant improvement in terms of ranking, measured by AUC (the area under the ROC curve) [2,11,17]. Responding to this question, we conduct experiments on the 36 UCI data sets [18] selected by Weka [12] to investigate their ranking performance and find that they do not significantly improve the ranking performance of naive Bayes. Aiming at scaling up naive Bayes' ranking performance, we present a novel lazy method ICNB (instance cloned naive Bayes) and develop three ICNB algorithms using different instance cloning strategies. We empirically compare them with naive Bayes. The experimental results show that our algorithms achieve significant improvement in terms of AUC. Our research provides a simple but effective method for applications where an accurate ranking is desirable.
Similarity measurement is crucial for classification. Based on the manifold assumption, many graph-based algorithms have been developed. Almost all methods follow the k-rule or ε-rule to construct a graph, and then focus on the algorithms based on the graph. However, such a graph may not represent the local structure well, and it does not fully utilize the label information. The local structure can be represented by the local density and the distance between the samples and their neighbors, and a graph constructed under the guidance of label information will better approximate the relationships in the input data. In this paper, we propose an adaptive semi-supervised graph construction method. The similarity is learned while constructing the graph. The advantages of the similarity learned by our method include: (1) the similarity is measured along the manifold by constructing a graph; (2) nearby points and points in the same cluster share high similarity; (3) samples from the same class have higher similarity than samples from different classes. Experimental results show that using the proposed similarity for classification tasks yields better recognition accuracy.
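For reference, the two baseline graph-construction rules named in the abstract above (the k-rule and the ε-rule) can be sketched as follows. This shows only the conventional rules that the authors set out to improve on, not their adaptive semi-supervised method. The function names, the use of Euclidean distance, and the choice to symmetrize the k-nearest-neighbor graph by taking the union of neighbor relations are assumptions made here.

```python
import math


def k_rule_graph(points, k):
    """Connect each point to its k nearest neighbors (Euclidean distance).

    Returns a set of undirected edges (i, j) with i < j. An edge is kept
    if either endpoint lists the other among its k nearest neighbors.
    """
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def epsilon_rule_graph(points, eps):
    """Connect every pair of points whose distance is at most eps."""
    return {
        (i, j)
        for i in range(len(points))
        for j in range(i + 1, len(points))
        if math.dist(points[i], points[j]) <= eps
    }
```

The example points (0, 0), (1, 0) and (5, 0) show the difference: with k = 1 the isolated point is still linked to its nearest neighbor, while with ε = 1.5 it stays disconnected, which is one reason a fixed rule may misrepresent local structure.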
In many real-world applications, it is often the case that the class distribution of instances is imbalanced and the costs of misclassification are different. Thus, the class-imbalanced cost-sensitive learning has attracted much attention from researchers. Sampling is one of the widely used techniques in dealing with the class-imbalance problem, which alters the class distribution of instances so that the minority class is well represented in the training data. In this paper, we propose a novel Minority Cloning Technique (MCT) for class-imbalanced cost-sensitive learning. MCT alters the class distribution of training data by cloning each minority class instance according to the similarity between it and the mode of the minority class. The experimental results on a large number of UCI datasets show that MCT performs much better than Minority Oversampling with Replacement Technique (MORT) and Synthetic Minority Oversampling TEchnique (SMOTE) in terms of the total misclassification costs of the built classifiers.
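The cloning idea in the abstract above can be sketched as follows for nominal attributes. The abstract does not state the similarity function or how similarity maps to the number of clones, so this sketch assumes that the mode is the tuple of per-attribute most frequent values, that similarity is the count of attribute values matching the mode, and that each instance receives one extra copy per matching attribute. All names (`minority_mode`, `clone_minority`) are hypothetical.

```python
from collections import Counter


def minority_mode(instances):
    """Per-attribute most frequent value among the minority instances."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*instances))


def clone_minority(instances):
    """Return the minority instances with similarity-based clones added.

    Assumption: similarity = number of attributes equal to the mode, and
    an instance with similarity s appears 1 + s times in the output.
    """
    mode = minority_mode(instances)
    cloned = []
    for instance in instances:
        similarity = sum(a == m for a, m in zip(instance, mode))
        cloned.extend([instance] * (1 + similarity))
    return cloned
```

Instances close to the mode are thus emphasized more than atypical ones, which is the behavior the abstract contrasts with uniform oversampling with replacement.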
As community detection in complex networks is applied ever more widely, more and more community detection algorithms are being proposed and improved. Among these algorithms, the label propagation algorithm is simple, easy to run, and has linear time complexity, but it is strongly random, and small communities are easily swallowed during label propagation. Therefore, this paper proposes a method that improves the partition results of the label propagation algorithm by means of a pre-partition that circularly searches for core nodes and assigns labels to nodes according to node similarity. First, the degree of each node of the network is calculated. We go through the whole network to find the nodes with the maximal degree among their neighbors as the core nodes. Next, we assign the core nodes’ labels to their neighbors according to the similarity between them, which can reduce the randomness of the label propagation algorithm. Then, we take the nodes whose labels have not been changed as the new network and find the new core nodes. After that, we again update the labels of neighbor nodes according to the similarity between them, repeating until the iteration ends, to complete the pre-partition. The approach of circularly searching for core nodes increases the diversity of the network partition and prevents the smaller potential communities from being swallowed in the process of partition. Then, we run the label propagation algorithm on the whole network after the pre-partition. Finally, we adopt a modified method based on the degree of membership determined by the bidirectional attraction of nodes and their neighbor communities. This method can reduce the possibility of partition errors for a few nodes. Experiments on artificial and real networks show that the proposed algorithm can accurately divide the network and achieve higher modularity than five existing algorithms.
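The first step of the pre-partition described above, finding core nodes, can be sketched as below. The abstract says core nodes have the maximal degree among their neighbors but does not say how ties are handled, so this sketch assumes a node qualifies when its degree is greater than or equal to that of every neighbor. The function name `find_core_nodes` and the adjacency-dictionary input are assumptions for illustration.

```python
def find_core_nodes(adjacency):
    """Return nodes whose degree is at least that of every neighbor.

    adjacency: dict mapping each node to the set of its neighbors
               (undirected graph). Isolated nodes are not treated as cores.
    Tie handling (>=) is an assumption; the source does not specify it.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    return sorted(
        node
        for node, nbrs in adjacency.items()
        if nbrs and all(degree[node] >= degree[v] for v in nbrs)
    )
```

In the described procedure this search would be repeated on the subnetwork of nodes that are still unlabeled, so that smaller communities get their own core nodes.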
The manifold-based learning methods have recently drawn more and more attention in dimension reduction. In this paper, a novel manifold-based learning method named enhanced parameter-free diversity discriminant preserving projections (EPFDDPP) is presented, which effectively avoids the neighborhood parameter selection and characterizes the manifold structure well. EPFDDPP redefines the weighted matrices, the discriminating similarity matrix and the discriminating diversity matrix, respectively. The weighted matrices are computed by the cosine angle distance between two data points and take special consideration of both the local information and the class label information, which are parameterless and favorable for face recognition. After characterizing the discriminating similarity scatter matrix and the discriminating diversity scatter matrix, the novel feature extraction criterion is derived based on maximum margin criterion. Experimental results on the Wine data set, Olivetti Research Laboratory (ORL); AR (face database created by Aleix Martinez and Robert Benavente); and Pose, Illumination, and Expression (PIE) face databases show the effectiveness of the proposed method.
Heterogeneous networks are ubiquitous. People like to discover rare but meaningful objects and patterns from such networks. Objects with either high structure similarity or high content similarity can be used in data analysis. However, more attention should be paid to the vast differences between structure and content. In this paper, we propose an outlier correlation detection method, called Structure2Content, which discovers outlier correlations incrementally at the structure level and the content level. Structure2Content addresses three important challenges: (1) how can we measure the target object’s structure and content similarity? (2) how can we find the representative features of target objects? (3) how can we insert new data or delete obsolete data incrementally? To tackle these challenges, Structure2Content applies four main techniques: (1) two matrices are used to store structure and content similarity, respectively, (2) 3-tuples are used to represent the closeness degree between objects, (3) a mirror step and an iterative process are combined to obtain the top-K outlier correlations, and (4) only the 3-tuples are updated when inserting or deleting data incrementally, instead of retraining on all data from the beginning. Substantial experiments show that our proposed method is very effective for outlier correlation detection.
In the world of the Internet of Things (IoT), heterogeneous systems and devices need to be connected and exchange data with others. How data exchange can be automatically realized becomes a critical issue. An information model (IM) is frequently adopted and utilized to solve the data interoperability problem. Meanwhile, as IoT systems and devices can have different IMs with different modeling methodologies and formats such as UML, IEC 61360, etc., automated data interoperability based on various IMs is recognized as an urgent problem. In this paper, we propose an approach to automate data interoperability, i.e. data exchange among similar entities in different IMs. First, similarity scores among entities are calculated based on their syntactic and semantic features. Then, in order to precisely identify similar candidates for data exchange, a concept of class distance calculated with a Virtual Distance Graph (VDG) is proposed to narrow down the obtained similar properties. An analysis of the results of a case study shows that the class distance based on VDG can effectively improve the precision of the calculated similar properties. Furthermore, data exchange rules can be generated automatically. The results reveal that the proposed approach can efficiently contribute to resolving the data interoperability problem.