Abstract
Given a collection of document groups, a natural question is how the groups differ. Although traditional document summarization techniques can summarize the content of each document group one by one, there remains a clear need for a summary of the differences among the groups. In this article, we study a novel problem: summarizing the differences between document groups. We propose a discriminative sentence selection method that extracts the most discriminative sentences, those that represent the specific characteristics of each document group. Experiments and case studies on real-world data sets demonstrate the effectiveness of the proposed method.