A transfer learning approach to improve object detection performance on document images in the presence of poor-quality datasets
Detecting and classifying multiple objects (for example, several small documents within a single larger document image) is a challenging task when the dataset is poor, that is, when it has few training samples and strong class imbalance, because the model is prone to overfitting during training. In addition, distortions that contaminate document images, such as noise and contrast variations, can further degrade detection quality, especially when too few samples are available for some classes, resulting in strong class imbalance. The dataset used in this research consists of scanned document images, each containing one or more documents (e.g. passport, driver’s license) within a single page. This paper introduces a multi-step transfer learning technique to address the multiple-document detection problem under these challenging conditions. The core idea of the technique is to construct a “bridging domain” between the source and target domains. A combination of the Faster R-CNN and ResNet50 models is used to implement four different transfer learning methods. With the developed methods, we achieve an overall object-classification accuracy of 93% and an object-detection accuracy of 98%, along with significant tolerance to unseen examples.
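The multi-step idea can be illustrated, in a deliberately simplified form, as staged fine-tuning. A model trained on a large source domain initializes training on an intermediate bridging domain, and the result in turn initializes training on the small target dataset. The sketch below is not the authors' Faster R-CNN/ResNet50 pipeline. It assumes a toy 2-D logistic-regression classifier and synthetic domains, and the names `make_domain`, `fine_tune`, `accuracy` and `multi_step_transfer` are hypothetical. They were chosen only so that the weight hand-off between stages is easy to follow.

```python
import math
import random
from typing import List, Tuple

# A sample is ((x0, x1), label); labels 0 and 1 stand in for two toy "document classes".
Sample = Tuple[Tuple[float, float], int]


def make_domain(center: float, noise: float, n: int, seed: int) -> List[Sample]:
    """Hypothetical synthetic domain: class 1 near (+center, +center), class 0 near
    (-center, -center), with uniform noise of half-width `noise` on each coordinate."""
    rng = random.Random(seed)
    data: List[Sample] = []
    for i in range(n):
        label = i % 2  # alternate labels so each toy domain is balanced
        sign = 1.0 if label == 1 else -1.0
        x0 = sign * center + rng.uniform(-noise, noise)
        x1 = sign * center + rng.uniform(-noise, noise)
        data.append(((x0, x1), label))
    return data


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fine_tune(weights: List[float], data: List[Sample], epochs: int, lr: float) -> List[float]:
    """Full-batch gradient descent on the logistic loss, starting from `weights`."""
    w = list(weights)  # copy: the previous stage's weights serve only as initialization
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x0, x1), y in data:
            err = sigmoid(w[0] * x0 + w[1] * x1 + w[2]) - y
            grad[0] += err * x0
            grad[1] += err * x1
            grad[2] += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w


def accuracy(w: List[float], data: List[Sample]) -> float:
    # Fraction of samples whose predicted side of the decision boundary matches the label.
    correct = sum(
        1 for (x0, x1), y in data if (w[0] * x0 + w[1] * x1 + w[2] >= 0) == (y == 1)
    )
    return correct / len(data)


def multi_step_transfer(stages: List[List[Sample]], epochs: int, lr: float) -> List[float]:
    """Source -> bridging -> target: each stage fine-tunes the previous stage's weights."""
    w = [0.0, 0.0, 0.0]
    for data in stages:
        w = fine_tune(w, data, epochs, lr)
    return w


if __name__ == "__main__":
    source = make_domain(center=2.0, noise=0.5, n=200, seed=0)  # large, clean
    bridge = make_domain(center=1.5, noise=0.75, n=60, seed=1)  # intermediate
    target = make_domain(center=1.0, noise=0.9, n=10, seed=2)   # small, noisy
    w = multi_step_transfer([source, bridge, target], epochs=100, lr=0.1)
    print("weights:", w)
    print("target accuracy:", accuracy(w, target))
```

Passing only `[source, target]` to `multi_step_transfer` gives an ordinary single-step fine-tuning baseline for comparison. In the setting described above, the stages would instead be document-image detection datasets and the model the Faster R-CNN/ResNet50 detector. The sketch makes no claim about how the authors build the bridging domain or schedule each stage.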