Please login to be able to save your searches and receive alerts for new content matching your search criteria.
Effective feature selection methods are important for improving the efficiency and accuracy of text categorization algorithms by removing redundant and irrelevant terms from the corpus. Extensive research has been done to improve the performance of individual feature selection methods. However, it is always a challenge to come up with an individual feature selection method which would outperform other methods in most cases. In this paper, we explore the possibility of improving the overall performance by combining multiple individual feature selection methods. In particular, we propose a method of combining multiple feature selection methods by using an information fusion paradigm, called Combinatorial Fusion Analysis (CFA). A rank-score function and its associated graph, called rank-score graph, are adopted to measure the diversity of different feature selection methods. Our experimental results demonstrated that a combination of multiple feature selection methods can outperform a single method only if each individual feature selection method has unique scoring behavior and relatively high performance. Moreover, it is shown that the rank-score function and rank-score graph are useful for the selection of a combination of feature selection methods.
Combining multiple information retrieval (IR) systems has been shown to improve performance over individual systems. However, it remains a challenging problem to determine when and how a set of individual systems should to be combined. In this paper, we investigate these issues using combinatorial fusion analysis and five data sets provide by TREC 2, 3, 4, 5, and 6. In particular, we compare the performance of combining six IR systems selected by random choice vs. by performance measurement from these five TREC data sets. Two experiments are conducted, which include: (1) combination of two systems and their performance outcome in terms of performance ratio and cognitive diversity, and (2) combinatorial fusion of t-systems, t = 2 to 6, using both score and rank combinations and exploration of the effect of diversity on the performance outcome. It is demonstrated in both experiments that combination of two or more systems improves the performance more significantly when the systems are selected by performance evaluation than those selected by random choice. Our work provides a distinctive method of system selection for the combination of multiple retrieval systems.