Support Vector Machine Based Hierarchical Classifiers for Large Class Problems
One of the prime challenges in designing a classifier for large-class problems such as Indian language OCRs is the presence of a large similar looking character set. The nature of the character set introduces problems with accuracy and efficiency of the classifier. Hierarchical classifiers such as Binary Hierarchical Decision Trees (BHDTs) using SVMs as component classifiers have been effectively used to tackle such large-class classification problems. The accuracy and efficiency of a BHDT classifier will depend on: i) the accuracy of the component classifiers, ii) the separability of the clusters at each node in a hierarchical classifier, and iii) the balance of the BHDT. We propose methods to tackle each of the above problems in the case of binary character images. We present a new distance measure, which is intuitively suitable when Support Vector Machines are used as component classifiers. We also propose a novel method for balancing the BHDT to improve its efficiency, while maintaining the accuracy. Finally we propose a method to generate overlapping partitions to improve the accuracy of BHDTs. Comparison of the method with other forms of classifier combination techniques such as 1vs1, 1vsRest and Decision Directed Acyclic Graphs shows that the proposed approach is highly efficient, while being comparable with the more expensive techniques in terms of accuracy. The experiments are focused on the problem of Indian language OCR, while the framework is usable for other problems as well.