ABSTRACT
We present a new interactive data exploration approach, called Semantic Windows (SW), in which users query for multidimensional "windows" of interest via standard DBMS-style queries enhanced with exploration constructs. Users can specify SWs using (i) shape-based properties, e.g., "identify all 3-by-3 windows", as well as (ii) content-based properties, e.g., "identify all windows in which the average brightness of stars exceeds 0.8". This SW approach enables the interactive processing of a host of useful exploratory queries that are difficult to express and optimize using standard DBMS techniques. SW uses a sampling-guided, data-driven search strategy to explore the underlying data set and quickly identify windows of interest. To facilitate human-in-the-loop style interactive processing, SW is optimized to produce online results during query execution. To control the tension between online performance and query completion time, it uses a tunable, adaptive prefetching technique. To enable exploration of big data, the framework supports distributed computation.
We describe the semantics and implementation of SW as a distributed layer on top of PostgreSQL. Experimental results on real astronomical data and on artificial data show that SW can deliver online results quickly and continuously, with little or no degradation in query completion time.
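To make the two kinds of window properties concrete, the sketch below combines the shape-based example ("all 3-by-3 windows") with the content-based example ("average brightness exceeds 0.8") on a small in-memory grid. It is only an illustration of the query semantics under stated assumptions: the function name `find_windows`, the grid layout, and the brightness values are hypothetical, and the exhaustive scan over a summed-area table stands in for, and is not, the sampling-guided search, prefetching, or distributed execution that SW uses.

```python
def find_windows(grid, height, width, min_avg):
    """Exhaustively list windows of a fixed shape whose mean exceeds min_avg.

    Hypothetical helper for illustration only; this is a brute-force scan,
    not the sampling-guided strategy described in the abstract.
    Returns a list of (top_row, left_col, average) tuples.
    """
    rows, cols = len(grid), len(grid[0])

    # Summed-area table: sat[r][c] holds the sum of grid[0:r][0:c].
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (
                grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
            )

    results = []
    # Shape-based property: only windows of exactly height x width cells.
    for r in range(rows - height + 1):
        for c in range(cols - width + 1):
            total = (
                sat[r + height][c + width]
                - sat[r][c + width]
                - sat[r + height][c]
                + sat[r][c]
            )
            avg = total / (height * width)
            # Content-based property: average brightness above the threshold.
            if avg > min_avg:
                results.append((r, c, avg))
    return results


# Made-up 5x5 brightness grid: a bright 3x3 block surrounded by dim cells.
brightness = [[0.5] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        brightness[r][c] = 0.9

hits = find_windows(brightness, 3, 3, 0.8)
for top, left, avg in hits:
    print(f"window at row {top}, col {left}: average brightness {avg:.2f}")
```

An exhaustive scan like this visits every candidate window before it can finish, which is the cost that an online, data-driven search aims to avoid on large data sets.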