OPTIMIZATION OF TRAINING TEXTS FOR WRITER-DEPENDENT HANDWRITING RECOGNITION
We address the problem of determining the best training text for large-vocabulary, writer-dependent, unconstrained English handwriting recognition. Our goal is to maximize recognition accuracy while minimizing the duration and tedium of the user's task of writing the training text. We explore recognition accuracy as a function of three dimensions of the training text: its length, the choice of character-coverage criterion, and the relative priority given to keeping the text interesting versus optimizing for the chosen character-coverage criterion. Our results show advantages to coverage criteria that (1) balance occurrences of character unigrams and (2) incorporate the most common bigrams. We also find that preserving a theme in the training text causes relatively little harm to coverage or recognition accuracy.
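As an illustration only, a coverage criterion of the kind described above might be sketched as a greedy selection over candidate sentences. This is not the procedure used in this work: the scoring rule, the function names (`coverage_gain`, `select_training_text`), and the parameters `min_count` and `target_bigrams` are all hypothetical. The sketch assumes that "balancing" unigrams means bringing every letter up to a minimum count, and that the set of most common bigrams is supplied by the caller.

```python
from collections import Counter


def char_unigrams(text):
    """Count alphabetic characters, case-folded."""
    return Counter(c for c in text.lower() if c.isalpha())


def char_bigrams(text):
    """Count adjacent pairs of alphabetic characters, case-folded."""
    t = text.lower()
    return Counter(a + b for a, b in zip(t, t[1:]) if a.isalpha() and b.isalpha())


def coverage_gain(sentence, uni_seen, bi_seen, target_bigrams, min_count):
    """Hypothetical score: how much a sentence improves coverage.

    Each letter contributes only until it has been seen min_count times
    (an assumed stand-in for unigram balancing); each target bigram not
    yet covered contributes one point.
    """
    gain = 0
    for ch, n in char_unigrams(sentence).items():
        deficit = max(0, min_count - uni_seen[ch])
        gain += min(n, deficit)
    new_bigrams = (set(char_bigrams(sentence)) & target_bigrams) - bi_seen
    return gain + len(new_bigrams)


def select_training_text(candidates, target_bigrams, max_sentences, min_count=2):
    """Greedily pick sentences with the largest coverage gain."""
    uni_seen, bi_seen, chosen = Counter(), set(), []
    pool = list(candidates)
    while pool and len(chosen) < max_sentences:
        best = max(
            pool,
            key=lambda s: coverage_gain(s, uni_seen, bi_seen, target_bigrams, min_count),
        )
        if coverage_gain(best, uni_seen, bi_seen, target_bigrams, min_count) == 0:
            break  # no remaining sentence improves coverage
        chosen.append(best)
        pool.remove(best)
        uni_seen.update(char_unigrams(best))
        bi_seen |= set(char_bigrams(best)) & target_bigrams
    return chosen


if __name__ == "__main__":
    sentences = ["aaa", "the quick brown fox", "jumps over the lazy dog"]
    print(select_training_text(sentences, {"th", "he"}, max_sentences=2, min_count=1))
```

Under these assumptions, the length of the training text is controlled by `max_sentences`. A preference for thematically coherent text could be modeled by restricting `candidates` to sentences from a single source, at some cost in coverage.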