Summarization aims to make the information content of a document efficiently accessible. In extractive summarization, the sentences of a document are ranked by importance, and the top-scoring sentences are selected subject to a given summary length constraint. Multidocument summarization takes a similar approach; in this case, however, the same information is likely to appear in several different surface forms. In focused summarization, the process is biased toward the information need expressed in a query.
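As an illustration (not taken from the paper under review), the basic extractive scheme described above can be sketched as a greedy selection over pre-scored sentences; the scoring function itself is assumed to be given:

```python
def extractive_summary(sentences, scores, max_words):
    """Greedily pick top-scoring sentences under a word-count budget.

    sentences: list of sentence strings
    scores:    importance score per sentence (higher is better)
    max_words: summary length constraint in words
    """
    ranked = sorted(zip(scores, sentences), key=lambda p: p[0], reverse=True)
    summary, used = [], 0
    for _, sent in ranked:
        n = len(sent.split())
        if used + n <= max_words:  # skip sentences that would exceed the budget
            summary.append(sent)
            used += n
    return summary
```

Note that this greedy scheme ignores redundancy among selected sentences, which is exactly the weakness the multidocument setting exposes.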
In this paper, the authors first introduce deep dependency substructures (DDSSs), which aim to accurately reflect semantic relationships among words, and then employ DDSSs for focused multidocument summarization. For DDSS ranking, they apply an approach based on integer linear programming (ILP), using not only frequent DDSSs but also word bigrams, and they pay special attention to the redundancy problem.
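To give a flavor of the concept-coverage objective behind such ILP formulations, the following sketch uses a greedy approximation rather than an exact ILP solver (the authors solve the problem exactly; the concept sets and weights here are hypothetical stand-ins for DDSSs or bigrams). Because each concept is credited only once, redundant sentences add no gain:

```python
def greedy_concept_summary(sentences, concepts, weights, max_words):
    """Greedy approximation of weighted concept coverage under a word budget.

    concepts[i]: set of concept ids (e.g., DDSSs or word bigrams) in sentence i
    weights[c]:  importance weight of concept c
    Each concept counts only once toward the objective, so selecting a
    sentence whose concepts are already covered yields zero gain.
    """
    covered, chosen, used = set(), [], 0
    while True:
        best, best_gain = None, 0.0
        for i, sent in enumerate(sentences):
            if i in chosen:
                continue
            n = len(sent.split())
            if used + n > max_words:
                continue
            gain = sum(weights[c] for c in concepts[i] - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds new weighted coverage
            break
        chosen.append(best)
        covered |= concepts[best]
        used += len(sentences[best].split())
    return [sentences[i] for i in chosen]
```

For example, a sentence repeating already-covered concepts is passed over in favor of one introducing new ones, which is how such objectives address redundancy.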
In the experiments, several baselines and Document Understanding Conference (DUC) test collections are used. The experiments are comprehensive, and the results are comparable to those of other state-of-the-art methods. The study provides a good new baseline and will be especially useful to researchers working on similar topics.