Chapter 2: Using a Polytope Model for Unsupervised Document Summarization
Extractive text summarization for a collection of documents is the problem of selecting a small subset of sentences so that the content and meaning of the original document set are preserved as well as possible. Summarization can therefore be expressed as an optimization problem in which the summary must maximize its coverage of the information in the original document set. This formulation is unsupervised, and the resulting problem can be solved by linear programming (LP). The open question is how to express information coverage mathematically as an objective function. In this chapter we present a summarization technique that produces extractive summaries with the best objective value obtained by LP. We describe the polytope model, which assigns real-valued weights to term occurrences, representing their importance for a summary.
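To make the idea of "summarization as an LP" concrete, the following is a minimal sketch of one generic coverage-style linear program. It is an illustration only and is not the polytope model's own objective, which this chapter develops separately. All symbols are assumptions introduced here: $n$ sentences and $m$ terms, $a_{ij} = 1$ if term $j$ occurs in sentence $i$ (and $0$ otherwise), $x_i$ the degree to which sentence $i$ is selected, $y_j$ the degree to which term $j$ is covered, $w_j \ge 0$ a hypothetical importance weight for term $j$, $\ell_i$ the length of sentence $i$, and $L$ the summary length budget.

```latex
\begin{align*}
\text{maximize}   \quad & \sum_{j=1}^{m} w_j \, y_j
    && \text{(weighted term coverage)} \\
\text{subject to} \quad & y_j \le \sum_{i=1}^{n} a_{ij} \, x_i,
    && j = 1, \dots, m
    && \text{(a term is covered only via selected sentences)} \\
& \sum_{i=1}^{n} \ell_i \, x_i \le L
    && && \text{(summary length budget)} \\
& 0 \le x_i \le 1, \quad 0 \le y_j \le 1,
    && i = 1, \dots, n, \; j = 1, \dots, m
\end{align*}
```

Because the variables are continuous, a solution of this sketch may select sentences fractionally, so some rounding or ranking step would still be needed to obtain an actual extract. How the weights $w_j$ should be chosen is exactly the open question about the objective function raised above.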