INFORMATION DISTANCE AND ITS APPLICATIONS
Abstract
We have been developing a general theory of information distance and a paradigm for applying this theory to practical problems [3, 19, 20]. Several problems remain with this theory. On the practical side, among other problems, the strict requirement of the triangle inequality is unrealistic in some applications; on the theoretical side, the universality theorems for normalized information distances have been proved only in a weak form. In this paper, we introduce a complementary theory that resolves or avoids these problems.
This article also serves as a brief expository summary of the area. We recount how and why some of the concepts were introduced, and describe recent theoretical developments and interesting applications. These applications include whole genome phylogeny, plagiarism detection, document comparison, music classification, language classification, fetal heart rate tracing, question answering, and a wide range of other data mining tasks.
Some preliminary results and portions of the text of this paper appeared in CIAA'06 and KDD'07.