A PROBABILISTIC APPROACH TO MULTI-DOCUMENT SUMMARIZATION FOR GENERATING A TILED SUMMARY
Abstract
Data availability is not a major issue at present times in view of the widespread use of Internet; however, information and knowledge availability are the issues. Due to data overload and time-critical nature of information need, automatic summarization of documents plays a significant role in information retrieval and text data mining. This paper discusses the design of a multi-document summarizer that uses Katz's K-mixture model for term distribution. The model helps in ranking the sentences by a modified term weight assignment. Highly ranked sentences are selected for the final summary. The sentences that are repetitive in nature are eliminated, and a tiled summary is produced. Our method avoids redundancy and produces a readable (even browsable) summary, which we refer to as an event-specific tiled summary. The system has been evaluated against the frequently occurring sentences in the summaries generated by a set of human subjects. Our system outperforms other auto-summarizers at different extraction levels of summarization with respect to the ideal summary, and is close to the ideal summary at 40% extraction level.
Remember to check out the Most Cited Articles! |
---|
Check out these titles in artificial intelligence! |