ABSTRACT
Existing content-based publish/subscribe systems are designed on the assumption that all matching publications are equally relevant to a subscription. Because the distribution of publication content cannot be known in advance, two unwanted situations are highly likely: a subscriber receives either too many or too few publications. In this paper we present a new publish/subscribe model based on the sliding window computation model. Our model assumes that publications differ in their relevance to a subscription. In the model, a subscriber receives the k most relevant publications published within a time window w, where k and w are parameters defined for each subscription. As a consequence, the arrival rate of relevant publications per subscription is predefined and does not depend on the publication rate. Since all relevant objects (i.e., publications in our case) cannot be kept in main memory, existing solutions immediately discard less relevant objects and store only a small representative set for subsequent delivery. In this paper we develop a probabilistic criterion to decide, upon the arrival of a new object, whether it may become a top-k object at some future point in time and should thus be stored in a special publications queue. We show that, by accepting a typically very small probability of error, the queue length remains reasonably small and does not depend significantly on the publication rate. Furthermore, we experimentally evaluate our approach to demonstrate its applicability in practice.
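To make the delivery model concrete, the following is a minimal Python sketch of the exact baseline that the abstract contrasts against: it keeps every publication that is still inside the window and selects the k most relevant on demand. It is not the probabilistic criterion developed in the paper. The class name `SlidingWindowTopK`, the method names, the numeric relevance score, and the tuple layout are all assumptions made for illustration.

```python
import heapq
from collections import deque


class SlidingWindowTopK:
    """Exact baseline (hypothetical): store every publication in the window."""

    def __init__(self, k, w):
        self.k = k              # number of publications delivered per window
        self.w = w              # window length, in the same units as timestamps
        self.window = deque()   # (timestamp, relevance, payload), oldest first

    def publish(self, timestamp, relevance, payload):
        # Assumes timestamps arrive in non-decreasing order.
        self.window.append((timestamp, relevance, payload))
        self._expire(timestamp)

    def _expire(self, now):
        # Keep only publications with timestamp in (now - w, now].
        while self.window and self.window[0][0] <= now - self.w:
            self.window.popleft()

    def top_k(self, now):
        # Return the k most relevant publications currently in the window.
        self._expire(now)
        return heapq.nlargest(self.k, self.window, key=lambda p: p[1])
```

Memory use in this baseline grows with the number of publications that arrive within w, which is the cost that a bounded publications queue is meant to avoid.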