ABSTRACT
Answering questions with data is a difficult and time-consuming process. Visual dashboards and templates make it easy to get started, but asking more sophisticated questions often requires learning a tool designed for expert analysts. Natural language interaction lets users pose questions directly to complex programs without first learning an interface. However, natural language is often ambiguous. In this work, we propose a mixed-initiative approach to managing ambiguity in natural language interfaces for data visualization. We model ambiguity throughout the process of turning a natural language query into a visualization, and we combine algorithmic disambiguation with interactive ambiguity widgets. These widgets surface system decisions at the point where the ambiguity matters, so the user can resolve it there. Corrections are stored as constraints and influence subsequent queries. We have implemented these ideas in a system, DataTone. In a comparative study, we find that DataTone is easy to learn and lets users ask questions without worrying about syntax and proper question form.
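As a rough illustration of the idea that corrections are stored as constraints and influence subsequent queries, the following minimal sketch may help. It is not DataTone's implementation: the class names (`Ambiguity`, `Session`), the scores, and the column names are all hypothetical, and the ranking is a plain maximum over assumed heuristic scores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ambiguity:
    """One ambiguous phrase with its candidate interpretations (hypothetical model)."""
    phrase: str
    candidates: dict[str, float]  # interpretation -> assumed heuristic score


@dataclass
class Session:
    """Stores user corrections as constraints mapping phrase -> interpretation."""
    constraints: dict[str, str] = field(default_factory=dict)

    def resolve(self, amb: Ambiguity) -> str:
        # A stored correction takes priority over the algorithmic ranking.
        pinned = self.constraints.get(amb.phrase)
        if pinned in amb.candidates:
            return pinned
        # Otherwise fall back to the highest-scoring candidate.
        return max(amb.candidates, key=amb.candidates.get)

    def correct(self, amb: Ambiguity, choice: str) -> None:
        # Stands in for the user picking another option in an ambiguity widget.
        if choice not in amb.candidates:
            raise ValueError(f"{choice!r} is not a candidate for {amb.phrase!r}")
        self.constraints[amb.phrase] = choice


# Hypothetical example: the word "medals" could map to two data columns.
session = Session()
amb = Ambiguity("medals", {"total_medals": 0.6, "gold_medals": 0.4})
first = session.resolve(amb)         # system's default choice
session.correct(amb, "gold_medals")  # user overrides it
second = session.resolve(amb)        # a later query reuses the correction
```

Here the first resolution follows the scores, while the second follows the stored constraint; a real system would presumably also need to scope constraints and let them expire or be revised.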
Index Terms
- DataTone: Managing Ambiguity in Natural Language Interfaces for Data Visualization