OFFLINE HANDWRITING RECOGNITION USING SYNTHETIC TRAINING DATA PRODUCED BY MEANS OF A GEOMETRICAL DISTORTION MODEL
Abstract
A perturbation model is presented for generating synthetic textlines from existing lines of cursive handwritten text produced by human writers. The goal of synthetic textline generation is to improve the performance of an offline cursive handwriting recognition system by providing it with additional training data. Adding synthetic training data can be expected to increase the variability of the training set, which in turn should lead to a higher recognition rate. On the other hand, synthetic training data may bias a recognizer towards unnatural handwriting styles, which could lower the recognition rate. In this paper, the proposed perturbation model is evaluated under several experimental conditions, and it is shown that a significant improvement in recognition performance is possible even when the original training set is large and its textlines come from a large number of different writers.