ANALYSIS OF PRINTED FORMS
Automatic analysis of images of printed forms is a problem of both practical and theoretical interest, owing to its importance in office automation and to the conceptual challenges it poses for document image analysis. The automatic reading of optically scanned forms consists of two major components. The first is the extraction of the data image from the form; the second is the interpretation of that image as coded alphanumerics, commonly referred to as optical character recognition (OCR). The individual steps involved in forms analysis include image pre-processing, forms identification, field extraction, data interpretation, and contextual post-processing. In this chapter we outline the issues and the current state of the art in the analysis of printed forms. Forms analysis poses several challenges, given the enormous variety of form layouts and contents in current use, and we explore some of the resulting research issues. In particular, we describe two current forms analysis systems developed at CEDAR.