ABSTRACT
As topic modeling has grown in popularity, tools for visualizing the process have become increasingly common. Though these tools support a variety of tasks, they generally include a view or module that conveys the contents of an individual topic. These views support the important task of gist-forming: helping the user build a cohesive overall sense of the topic's semantic content that generalizes beyond the specific subset of words shown. Several factors affect these views, including the visual encoding used, the number of topic words included, and the quality of the topics themselves. To our knowledge, no formal evaluation has compared how these factors might change users' interpretations. In a series of crowdsourced experiments, we compared features of visual topic representations in terms of their suitability for gist-forming. We found that gist-forming ability is remarkably resistant to changes in visual representation, though it deteriorates with lower-quality topics.
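Two of the factors named above, the number of topic words shown and the visual encoding, can be made concrete with a minimal sketch. It assumes a topic is given as a mapping from words to probabilities (the form an LDA model produces), keeps only the top n words, and encodes each word's weight as a font size in the way a word cloud might; a plain ranked list would show the same words in order with no size encoding. The function names `top_words` and `font_sizes`, the linear size mapping, and the 12 to 48 pixel range are illustrative assumptions, not the encodings used in the experiments.

```python
from typing import Dict, List, Tuple


def top_words(topic: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n highest-probability words of a topic, most probable first.

    Hypothetical helper: `topic` maps each word to its probability in the topic.
    """
    return sorted(topic.items(), key=lambda kv: kv[1], reverse=True)[:n]


def font_sizes(
    words: List[Tuple[str, float]],
    min_px: float = 12.0,
    max_px: float = 48.0,
) -> Dict[str, float]:
    """Linearly map word probabilities to font sizes, as a word cloud might.

    Assumed encoding: the least probable shown word gets min_px and the most
    probable gets max_px; this is one choice among many, not a standard.
    """
    if not words:
        return {}
    probs = [p for _, p in words]
    lo, hi = min(probs), max(probs)
    span = hi - lo
    sizes: Dict[str, float] = {}
    for word, p in words:
        # When every shown word has the same weight, use the midpoint size.
        t = (p - lo) / span if span > 0 else 0.5
        sizes[word] = min_px + t * (max_px - min_px)
    return sizes


if __name__ == "__main__":
    # Hypothetical topic; the words and probabilities are invented for illustration.
    topic = {"game": 0.08, "team": 0.06, "season": 0.05,
             "player": 0.04, "coach": 0.02, "win": 0.01}
    shown = top_words(topic, 3)
    print("ranked list:", [w for w, _ in shown])
    print("cloud sizes:", font_sizes(shown))
```

Changing `n` alters how much of the topic the viewer sees, while swapping the ranked list for the size mapping alters only the encoding; the abstract's finding is that gist-forming is largely robust to the latter.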