ACM Home Page
Please provide us with feedback. Feedback
SWAM: a family of access methods for similarity-search in peer-to-peer data networks
Full text PdfPdf (324 KB)
Source Conference on Information and Knowledge Management archive
Proceedings of the thirteenth ACM international conference on Information and knowledge management table of contents
Washington, D.C., USA
SESSION: DB-4 (databases): similarity search table of contents
Pages: 304 - 313  
Year of Publication: 2004
ISBN:1-58113-874-1
Authors
Farnoush Banaei-Kashani  University of Southern California, Los Angeles, CA
Cyrus Shahabi  University of Southern California, Los Angeles, CA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 50,   Citation Count: 13
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1031171.1031236
What is a DOI?

ABSTRACT

Peer-to-peer Data Networks (PDNs) are large-scale, self-organizing, distributed query processing systems. Familiar examples of PDN are peer-to-peer file-sharing networks, which support exact-match search queries to locate user-requested files. In this paper, we formalize the more general problem of <i>similarity-search</i> in PDNs, and propose a <i>family</i> of distributed access methods, termed <i>Small-World Access Methods (SWAM)</i>, for efficient execution of various similarity-search queries, namely exact-match, range, and k-nearest-neighbor queries. Unlike its predecessors, i.e., LH* and DHTs, SWAM does not control the assignment of data objects to PDN nodes; each node autonomously stores its own data. Besides, SWAM supports all similarity-search queries on multiple attributes. SWAM guarantees that the query object will be found (if it exists in the network) in average time logarithmically proportional to the network size. Moreover, once the query object is found, all the similar objects would be in its proximate network neighborhood and hence enabling efficient range and k-nearest-neighbor queries.

As a specific instance of SWAM, we propose <i>SWAM-V</i>, a Voronoi-based SWAM that indexes PDNs with multi-attribute data objects. For a PDN with <i>N</i> nodes SWAM-V has query time, communication cost, and computation cost of <i>O</i>(log <i>N</i>) for exact-match queries, and <i>O</i>(log <i>N</i> + <b>s</b><i>N</i>) and <i>O</i>(log <i>N</i> + <b>k</b>) for range queries (with selectivity <b>s</b>) and <b>k</b>NN queries, respectively. Our experiments show that SWAM-V consistently outperforms a similarity-search enabled version of CAN in query time and communication cost by a factor of 2 to 3.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
 
3
F. Banaei-Kashani and C. Shahabi. Searchable querical data networks. In Proceedings of the International Workshop on Databases, Information Systems and Peer-to-Peer Computing in conjunction with VLDB'03, September 2003.
4
 
5
B. Bollobás. Random Graphs. Academic Press, New York, 1985.
 
6
7
8
 
9
Freenet. Freenet project, 2004. http://freenet.sourceforge.net/.
10
 
11
A. Gupta, D. Agrawal, and A. El Abbadi. Approximate range selection queries in peer-to-peer systems. In Proceedings of the First Biennial Conference on Innovative Data Systems Research, January 2003.
 
12
R. Huebsch, N. Lanham, B. Loo, J. Hellerstein, S. Shenker, and I. Stoica. Querying the inernet withPIER. In Proceedings of 29th International Conference on Very Large Data Bases (VLDB'03), September 2003.
 
13
KaZaA. Sharman networks, 2004. http://www.kazaa.com/.
14
15
 
16
 
17
18
 
19
R. Seidel. Exact upper bounds for the number of faces in d-dimensional Voronoi diagrams, DIMACS Series, volume~4. American Mathematical Society, 1991.
 
20
SETI@home, 2004. http://setiathome.ssl.berkeley.edu//.
21
 
22
D. Watts and S. Strogatz. Collective dynamics of small world networks. Nature, (393):440--442, 1998.
 
23
B. Yang and H. Garcia-Molina. Designing a super-peer network. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03), March 2003.

CITED BY  13
 
 
 
 
 
 
 
 
 


REVIEW

"Maytham Hassan Safar, PhD : Reviewer"

Banaei-Kashani and Shahabi describe querical data networks (QDNs), which are large-scale, self-organizing, distributed query processing systems. As an example, they present a peer-to-peer network. The data is distributed among the nodes, and each   more...

Collaborative Colleagues:
Farnoush Banaei-Kashani: colleagues
Cyrus Shahabi: colleagues