|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ABSTRACT
Peer-to-peer Data Networks (PDNs) are large-scale, self-organizing, distributed query processing systems. Familiar examples of PDN are peer-to-peer file-sharing networks, which support exact-match search queries to locate user-requested files. In this paper, we formalize the more general problem of <i>similarity-search</i> in PDNs, and propose a <i>family</i> of distributed access methods, termed <i>Small-World Access Methods (SWAM)</i>, for efficient execution of various similarity-search queries, namely exact-match, range, and k-nearest-neighbor queries. Unlike its predecessors, i.e., LH* and DHTs, SWAM does not control the assignment of data objects to PDN nodes; each node autonomously stores its own data. Besides, SWAM supports all similarity-search queries on multiple attributes. SWAM guarantees that the query object will be found (if it exists in the network) in average time logarithmically proportional to the network size. Moreover, once the query object is found, all the similar objects would be in its proximate network neighborhood and hence enabling efficient range and k-nearest-neighbor queries. As a specific instance of SWAM, we propose <i>SWAM-V</i>, a Voronoi-based SWAM that indexes PDNs with multi-attribute data objects. For a PDN with <i>N</i> nodes SWAM-V has query time, communication cost, and computation cost of <i>O</i>(log <i>N</i>) for exact-match queries, and <i>O</i>(log <i>N</i> + <b>s</b><i>N</i>) and <i>O</i>(log <i>N</i> + <b>k</b>) for range queries (with selectivity <b>s</b>) and <b>k</b>NN queries, respectively. Our experiments show that SWAM-V consistently outperforms a similarity-search enabled version of CAN in query time and communication cost by a factor of 2 to 3. REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
INDEX TERMS
Primary Classification:
Additional Classification:
Keywords:
REVIEW
"Maytham Hassan Safar, PhD : Reviewer"
Banaei-Kashani and Shahabi describe querical data networks (QDNs), which are large-scale, self-organizing, distributed query processing systems. As an example, they present a peer-to-peer network. The data is distributed among the nodes, and each
more...
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||