Abstract
One-pass algorithms for sampling n records without replacement from a population of unknown size n are known as reservoir-sampling algorithms. In this article, Vitter's reservoir-sampling algorithm, algorithm Z, is modified to give a more efficient algorithm, algorithm K. Additionally, two new algorithms, algorithm L and algorithm M, are proposed. If the time for scanning the population is ignored, all the four algorithms have expected CPU time O(n(1 + log(N/n))), which is optimum up to a constant factor. Expressions of the expected CPU time for the algorithms are presented. Among the four, algorithm L is the simplest, and algorithm M is the most efficient when n and N/n are large and N is O(n2).
- BRATLEY, P., FOX, B. L., AND SCHRA~E, L.E. 1987. A Guide to Simulation. 2nd ed. Springer- Verlag, New York. Google Scholar
- CHENG, R. C. $. 1978. Generating Beta variates with nonintegral shape parameters. Commun. ACM 21, 4 (Apr.), 317-322. Google Scholar
- DEVROYE, L. 1986. Non-uniform Random Variate Generation. Springer-Verlag, New York.Google Scholar
- FAN, C. T., MULLER, M. E., ANn REZUCHA, I. 1962. Development of sampling plans by using sequential (item by item) selection techniques and digital computers. Am. Stat. Assoc. J. 57, 298 (June), 387-402.Google Scholar
- IMSL. 1987. IMSL Stat/Library, User's Manual. IMSL, Inc., Houston, Tex.Google Scholar
- KNUTH, n. E. 1981. The Art of Computer Programming. Vol. 2. Seminumerical Algorithms. 2nd ed. Addison-Wesley, Reading, Mass. Google Scholar
- KNUT~, D. E. 1969. The Art of Computer Programming. Vol. 2. Seminumerical Algorithms. Addison-Wesley, Reading, Mass. Google Scholar
- LI, K.-H. 1992a. Reservoir sampling algorithms. I. Analysis and modification of Vitter's algorithm Z. Tech. Rep. No. 92.2, Dept. of Statistics, The Chinese Univ. of Hong Kong, Shatin, N.T., Hong Kong.Google Scholar
- LL K.-H. 1992b. Reservoir sampling algorithms. II. New algorithms. Tech. Rep. No. 92.3, Dept. of Statistics, The Chinese Univ. of Hong Kong, Shatin, N.T., Hong Kong.Google Scholar
- McLEOD, A. I. AND BELLHOUSE, D. R. 1983. A convenient algorithm for drawing a simple random sample. Appl. Stat. 32, 2, 182-184.Google Scholar
- PIN~C4AM, R.S. 1987. An efficient algorithm for drawing a simple random sample. Appl. Stat. 36, 3, 370-372.Google Scholar
- VITTER, J.S. 1985. Random sampling with a reservoir. ACM Trans. Math. Softw. 11, 1 (Mar.), 37-57. Google Scholar
Index Terms
- Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n)))
Recommendations
Randomized minimum spanning tree algorithms using exponentially fewer random bits
For many fundamental problems there exist randomized algorithms that are asymptotically optimal and are superior to the best-known deterministic algorithm. Among these are the minimum spanning tree (MST) problem, the MST sensitivity analysis problem, ...
Optimal sampling from sliding windows
PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsA sliding windows model is an important case of the streaming model, where only the most "recent" elements remain active and the rest are discarded in a stream. The sliding windows model is important for many applications (see, e.g., Babcock, Babu, ...
Improved Algorithms for Uniform Partitions of Points
We consider the following one- and two-dimensional bucketing problems: Given a set S of n points in \reals1 or \reals2 and a positive integer b , distribute the points of S into b equal-size buckets so that the maximum number of points in a bucket ...
Comments