ABSTRACT
Record labels would like to identify promising artists as early as possible in their careers, before other companies approach them with competing contracts. The vast number of candidates makes identifying those with high success potential time-consuming and laborious. This paper demonstrates how data mining of P2P query strings can be used to mechanize most of this detection process. Using a unique intercepting system over the Gnutella network, we were able to capture an unprecedented number of geographically identified (geo-aware) queries, allowing us to investigate the diffusion of music-related queries in time and space. Our solution is based on the observation that emerging artists, especially rappers, have a discernible stronghold of fans in their hometown area, where they are able to perform and market their music. In a file-sharing network, this is reflected as a delta-function-like spatial distribution of content queries. Using this observation, we devised a detection algorithm for emerging artists that looks for performers with a sharp increase in popularity in a small geographic region while still unnoticeable nationwide. The algorithm suggests a short list of artists with breakthrough potential, of which we show that about 30% go on to translate that potential into national success.
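The detection idea described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no thresholds or data format, so the `(week, region, artist)` record layout, the function name `emerging_artists`, and every parameter value below are assumptions chosen only for illustration. The sketch flags an artist when queries for them grow sharply in one region, are concentrated in that region, and still make up a negligible share of all queries nationwide.

```python
from collections import Counter


def emerging_artists(queries, week, min_growth=2.0, min_local=5,
                     min_concentration=0.8, max_national_share=0.01):
    """Return (artist, region, growth) tuples, strongest growth first.

    queries: iterable of (week, region, artist) tuples (hypothetical format).
    All thresholds are illustrative assumptions, not values from the paper.
    """
    local_now = Counter()      # (region, artist) -> queries in `week`
    local_prev = Counter()     # (region, artist) -> queries in `week - 1`
    national_now = Counter()   # artist -> queries in `week`, all regions
    total_now = 0              # all queries in `week`

    for w, region, artist in queries:
        if w == week:
            local_now[(region, artist)] += 1
            national_now[artist] += 1
            total_now += 1
        elif w == week - 1:
            local_prev[(region, artist)] += 1

    flagged = []
    for (region, artist), count in local_now.items():
        if count < min_local:
            continue  # too few local queries to be meaningful
        # Week-over-week growth in this region (a missing week counts as 1).
        growth = count / max(local_prev[(region, artist)], 1)
        # Fraction of the artist's queries that come from this one region;
        # a value near 1 approximates the delta-like spatial distribution.
        concentration = count / national_now[artist]
        # Share of all queries this week that mention the artist anywhere.
        national_share = national_now[artist] / total_now
        if (growth >= min_growth
                and concentration >= min_concentration
                and national_share <= max_national_share):
            flagged.append((artist, region, growth))

    flagged.sort(key=lambda item: item[2], reverse=True)
    return flagged
```

Calling `emerging_artists(records, week=2)` on a list of such records returns the short list for week 2. A real system would also need query-string normalization and IP-to-location mapping, both of which are outside the scope of this sketch.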