This chapter presents a basic model for document processing. In this model, document processing is divided into two phases: document analysis and document understanding. A document has two structures: a geometric (layout) structure and a logical structure. Document analysis is the extraction of the geometric structure from a document; document understanding is the mapping of that geometric structure into the logical structure. Both types of document structure and both areas of document processing are discussed in this chapter.
Top-down and bottom-up approaches have been used in document analysis. Tree transform, formatting knowledge, and description language approaches have been used in document understanding. All of these approaches are presented.
A particular case, form document processing, is also discussed. Form description and form registration approaches are presented, and a form processing system is introduced.
Finally, the techniques used in these approaches, such as the Hough transform, projection, crossing counts, and form definition language, are also discussed.
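As a rough illustration of two of the techniques named above, the following sketch computes projection profiles and crossing counts on a small binary image. It is not taken from the chapter: the image representation (a list of rows, 1 for a black pixel and 0 for white), the function names, and the definition of a crossing as a white-to-black transition along a horizontal scan line are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed representation: 1 = black (ink) pixel, 0 = white background.
Image = List[List[int]]


def horizontal_projection(img: Image) -> List[int]:
    """Count the black pixels in each row."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Count the black pixels in each column."""
    return [sum(col) for col in zip(*img)]


def crossing_counts(img: Image) -> List[int]:
    """For each row, count white-to-black transitions along the scan line."""
    counts = []
    for row in img:
        prev = 0  # treat the area left of the image as white
        n = 0
        for px in row:
            if px == 1 and prev == 0:
                n += 1
            prev = px
        counts.append(n)
    return counts


def split_on_gaps(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) ranges, end exclusive, of non-zero runs in a profile."""
    runs = []
    start = None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


# Hypothetical 4x5 "page": two ink rows, a blank row, then one more ink row.
page = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
blocks = split_on_gaps(horizontal_projection(page))  # row ranges separated by blank rows
```

Splitting a profile at its empty runs is one simple way a top-down analysis might separate blocks of a page; a practical system would need to tolerate noise and skew, which this sketch does not.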