skip to main content
10.1145/1055558.1055595acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Adaptive sampling for geometric problems over data streams

Published: 14 June 2004 Publication History

Abstract

Geometric coordinates are an integral part of many data streams. Examples include sensor locations in environmental monitoring, vehicle locations in traffic monitoring or battlefield simulations, scientific measurements of earth or atmospheric phenomena, etc. How can one summarize such data streams using limited storage so that many natural geometric queries can be answered faithfully? Some examples of such queries are: report the smallest convex region in which a chemical leak has been sensed, or track the diameter of the dataset. One can also pose queries over multiple streams: track the minimum distance between the convex hulls of two data streams; or report when datasets A and B are no longer linearly separable.In this paper, we propose an adaptive sampling scheme that gives provably optimal error bounds for extremal problems of this nature. All our results follow from a single technique for computing the approximate convex hull of a point stream in a single pass. Our main result is this: given a stream of two-dimensional points and an integer r, we can maintain an adaptive sample of at most 2r + 1 points such that the distance between the true convex hull and the convex hull of the sample points is O(D/r2), where D is the diameter of the sample set. With our sample convex hull, all the queries mentioned above can be answered in either O(log r) or O(r) time.

References

[1]
P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan. Approximating extent measures of points. J. ACM. To appear.
[2]
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137--147, 1999.
[3]
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proc. ACM PODS, 2002.
[4]
M. d. Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, Berlin, Germany, 2nd edition, 2000.
[5]
T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990,
[6]
G. Cormode and S. Muthukrishnan. Radial histogram for spatial streams. Technical Report 2003--11, DIMACS, 2003.
[7]
R. M. Dudley. Metric entropy of some classes of sets with differentiable boundaries. J. Approx. Theory, 10:227--236, 1974.
[8]
http://www.earthscope.org.
[9]
J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate L1-difference algorithm for massive data streams. In IEEE Symp. Found. Comput. Sci., pages 501--511, 1999.
[10]
J. Feigenbaum, S. Kannan, and J. Zhang. Computing diameter in the streaming and sliding-window models, 2002. Manuscript.
[11]
J. Gray, A. Szalay, A. Thakar, P. Z. Zunszt, T. Malik, J. Raddick, C. Stoughton, and J. vandenBerg. The SDSS SkyServer: Public access to the Sloan digital sky server data. Technical Report MSR-TR-2001-104, Microsoft Research, 2001.
[12]
M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proc. ACM SIGMOD, 2001.
[13]
P. M. Gruber. Approximation of convex bodies. In P. M. Gruber and J. M. Wills, editors, Convexity and its Applications, pages 131--162. Birkhäuser, Basel, Switzerland, 1983.
[14]
P. Indyk. Stream-based geometric algorithms. In Proc. ACM/DIMACS Workshop on Management and Processing of Data Streams, 2003.
[15]
N. M. Josuttis. The C++ Standard Library: A Tutorial and Reference. Addison-Wesley, Reading, MA, 1999.
[16]
P. S. Kenderov. Polygonal approximation of plane convex compacts. J. Approx. Theory, 38:221--239, 1983.
[17]
G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. In Proc. ACM SIGMOD, 1999.
[18]
D. E. McClure and R. A. Vitale. Polygonal approximation of plane convex bodies. J. Math. Anal. Appl., 51:326--358, 1975.
[19]
S. Muthukrishnan. Data streams: Algorithms and applications. Technical report, Computer Science Department, Rutgers University, 2003.
[20]
F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, New York, NY, 1985.
[21]
W. Pugh. Skip lists: a probabilistic alternative to balanced trees. Commun. ACM, 33(6):668--676, 1990.
[22]
T. J. Richardson. Approximation of planar convex sets from hyperplane probes. Discrete Comput. Geom., 18(2):151--177, 1997.

Cited By

View all
  • (2021)Communication Costs in a Geometric Communication NetworkProceedings of the 22nd International Conference on Distributed Computing and Networking10.1145/3427796.3427800(36-45)Online publication date: 5-Jan-2021
  • (2019)Comparing synopsis techniques for approximate spatial data analysisProceedings of the VLDB Endowment10.14778/3342263.334263512:11(1583-1596)Online publication date: 1-Jul-2019
  • (2017)Clustering high dimensional dynamic data streamsProceedings of the 34th International Conference on Machine Learning - Volume 7010.5555/3305381.3305441(576-585)Online publication date: 6-Aug-2017
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2004
350 pages
ISBN:158113858X
DOI:10.1145/1055558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2004

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS04

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 30 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Communication Costs in a Geometric Communication NetworkProceedings of the 22nd International Conference on Distributed Computing and Networking10.1145/3427796.3427800(36-45)Online publication date: 5-Jan-2021
  • (2019)Comparing synopsis techniques for approximate spatial data analysisProceedings of the VLDB Endowment10.14778/3342263.334263512:11(1583-1596)Online publication date: 1-Jul-2019
  • (2017)Clustering high dimensional dynamic data streamsProceedings of the 34th International Conference on Machine Learning - Volume 7010.5555/3305381.3305441(576-585)Online publication date: 6-Aug-2017
  • (2013)From approximate balls to approximate ellipsesJournal of Global Optimization10.1007/s10898-012-9932-156:1(27-42)Online publication date: 1-May-2013
  • (2010)Approximate ellipsoid in the streaming modelProceedings of the 4th international conference on Combinatorial optimization and applications - Volume Part II10.5555/1940424.1940456(401-413)Online publication date: 18-Dec-2010
  • (2010)Approximate Ellipsoid in the Streaming ModelCombinatorial Optimization and Applications10.1007/978-3-642-17461-2_32(401-413)Online publication date: 2010
  • (2009)Summarizing spatial data streams using ClusterHullsACM Journal of Experimental Algorithmics10.1145/1412228.141223813(2.4-2.28)Online publication date: 23-Feb-2009
  • (2009)Identifying High Cardinality Internet HostsIEEE INFOCOM 200910.1109/INFCOM.2009.5061990(810-818)Online publication date: Apr-2009
  • (2009)Answering linear optimization queries with an approximate stream indexKnowledge and Information Systems10.1007/s10115-008-0157-z20:1(95-121)Online publication date: 24-Jun-2009
  • (2009)Approximate voronoi cell computation on spatial data streamsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-007-0081-y18:1(57-75)Online publication date: 1-Jan-2009
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media