Learning Summary Content Units with Topic Modeling


In the field of multi-document summarization, the Pyramid method has become an important approach for evaluating machine-generated summaries. The method is based on the manual annotation of text spans with the same meaning in a set of human model summaries. In this paper, we present an unsupervised, probabilistic topic modeling approach for automatically identifying such semantically similar text spans. Our approach reveals some of the structure of model summaries and identifies topics that are good approximations of the Summary Content Units (SCUs) used in the Pyramid method. Our results show that the topic model identifies topic-sentence associations that correspond to the contributors of SCUs, suggesting that the topic modeling approach can generate a viable set of candidate SCUs for facilitating the creation of Pyramids.

  author    = {Hennig, Leonhard  and  De Luca, Ernesto William  and  Albayrak, Sahin},
  title     = {Learning Summary Content Units with Topic Modeling},
  booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010)},
  month     = {August},
  year      = {2010},
  address   = {Beijing, China},
  publisher = {Coling 2010 Organizing Committee},
  pages     = {391--399},
  url       = {http://www.aclweb.org/anthology/C10-2045}
Leonhard Hennig, Ernesto William De Luca, Sahin Albayrak
Conference Paper
23rd International Conference on Computational Linguistics (COLING 2010)