Topic-based Multi-Document Summarization with Probabilistic Latent Semantic Analysis


We consider the problem of query-focused multi-document summarization, where a summary containing the information most relevant to a user's information need is produced from a set of topic-related documents. We propose a new method based on probabilistic latent semantic analysis, which allows us to represent sentences and queries as probability distributions over latent topics. Our approach combines query-focused and thematic features computed in the latent topic space to estimate the summary-relevance of sentences. In addition, we evaluate several similarity measures for computing sentence-level feature scores. Experimental results show that our approach outperforms the best reported results on DUC 2006 data and compares well on DUC 2007 data.
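
To make the idea of scoring sentences in the latent topic space concrete, the following is a minimal sketch, not the paper's implementation. It assumes that a trained PLSA model has already produced a topic distribution for the query and for each sentence. The toy distributions, the function names, and the choice of cosine and Jensen-Shannon similarity are illustrative assumptions; the abstract does not state which similarity measures or features the paper uses.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_similarity(p, q):
    """1 minus the Jensen-Shannon divergence (base 2), so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 1.0 - (0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m))


def rank_sentences(sentences, query, similarity):
    """Return sentence ids sorted by decreasing similarity to the query."""
    return sorted(
        sentences,
        key=lambda sid: similarity(sentences[sid], query),
        reverse=True,
    )


# Hypothetical topic distributions over three latent topics (made-up numbers,
# standing in for the output of a trained PLSA model).
query = [0.7, 0.2, 0.1]
sentences = {
    "s1": [0.6, 0.3, 0.1],
    "s2": [0.1, 0.1, 0.8],
    "s3": [0.3, 0.4, 0.3],
}

ranking_js = rank_sentences(sentences, query, js_similarity)
ranking_cos = rank_sentences(sentences, query, cosine)
print(ranking_js, ranking_cos)
```

In this sketch a single query-similarity score ranks the sentences. The approach described above combines several query-focused and thematic features, so a fuller version would compute one score per feature and combine them before ranking.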

  author    = {Hennig, Leonhard},
  title     = {Topic-based Multi-Document Summarization with Probabilistic Latent Semantic Analysis},
  booktitle = {Proceedings of the International Conference RANLP-2009},
  month     = {September},
  year      = {2009},
  address   = {Borovets, Bulgaria},
  publisher = {Association for Computational Linguistics},
  pages     = {144--149},
  url       = {}
Leonhard Hennig
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2009)