ABSTRACT
An important class of searches on the World Wide Web aims to find the entry page (homepage) of an organisation. Entry page search is quite different from ad hoc search, and a plain ad hoc system performs disappointingly on it. We explored three non-content features of web pages: page length, number of incoming links, and URL form. URL form in particular proved to be a good predictor. Using URL form priors, we found over 70% of all entry pages at rank 1 and up to 89% in the top 10. Non-content features can easily be embedded in a language model framework as a prior probability.
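To illustrate how a non-content feature can enter a language model as a prior, the sketch below ranks a page by log P(D) + sum over query terms of log P(t | D), where P(D) depends only on the URL form. This is a minimal sketch and not the authors' implementation: the four URL categories (`root`, `subroot`, `path`, `file`), the classification rules in `url_form`, the prior values in `URL_PRIOR`, and the linear-interpolation smoothing weight `lam` are all illustrative assumptions, not taken from the abstract.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical prior probabilities per URL form; the values are made up for
# illustration and would normally be estimated from training data.
URL_PRIOR = {"root": 0.6, "subroot": 0.25, "path": 0.1, "file": 0.05}

INDEX_NAMES = {"index.html", "index.htm"}


def url_form(url):
    """Classify a URL into a coarse form category (assumed scheme)."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if segments and segments[-1].lower() in INDEX_NAMES:
        # Treat a trailing index file as the directory that contains it.
        segments = segments[:-1]
        is_dir = True
    else:
        is_dir = path.endswith("/") or not segments
    if not segments:
        return "root"
    if not is_dir:
        return "file"
    return "subroot" if len(segments) == 1 else "path"


def score(query_terms, doc_terms, collection_tf, collection_len, prior, lam=0.5):
    """Return log P(D) + sum_t log(lam * P(t|D) + (1 - lam) * P(t|C))."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    total = math.log(prior)
    for term in query_terms:
        p_doc = tf[term] / doc_len if doc_len else 0.0
        p_col = collection_tf.get(term, 0) / collection_len
        p = lam * p_doc + (1 - lam) * p_col
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        total += math.log(p)
    return total


def rank(query_terms, docs):
    """Rank a {url: list_of_terms} mapping by prior-weighted query likelihood."""
    collection_tf = Counter(t for terms in docs.values() for t in terms)
    collection_len = sum(collection_tf.values())
    scored = [
        (score(query_terms, terms, collection_tf, collection_len,
               URL_PRIOR[url_form(url)]), url)
        for url, terms in docs.items()
    ]
    return [url for _, url in sorted(scored, reverse=True)]
```

With two pages of identical content, the content term of the score is the same for both, so the ordering is decided entirely by the URL form prior. This is the sense in which the prior can separate an entry page from a deeper page that matches the query equally well.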