research-article

Unsupervised, Efficient and Semantic Expertise Retrieval

Authors:

Christophe Van Gysel,

Maarten de Rijke,

Marcel Worring

WWW '16: Proceedings of the 25th International Conference on World Wide Web

Pages 1069 - 1079

https://doi.org/10.1145/2872427.2882974

Published: 11 April 2016

Abstract

We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a statistically significant improved ranking over vector space and generative models in most cases, matching the performance of supervised methods on various benchmarks. That is, by using solely text we can do as well as methods that work with external evidence and/or relevance feedback. A contrastive analysis of rankings produced by discriminative and generative approaches shows that they have complementary strengths due to the ability of the unsupervised discriminative model to perform semantic matching.
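
The abstract does not spell out the model, so the following is only a minimal sketch of how a log-linear ranker over candidate experts could score a query from word embeddings. All names, dimensions, and parameter values are hypothetical, the weights are hand-set rather than learned, and summing per-term log-probabilities (a term-independence assumption) is an illustrative choice that is not taken from the text above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_distribution(word_vec, cand_weights, cand_bias):
    """Log-linear model: P(candidate | word) = softmax(W . e_word + b)."""
    scores = [
        sum(w * x for w, x in zip(row, word_vec)) + b
        for row, b in zip(cand_weights, cand_bias)
    ]
    return softmax(scores)


def rank_candidates(query_terms, embeddings, cand_weights, cand_bias, names):
    """Rank candidates by summed log P(candidate | term) over the query terms."""
    log_scores = [0.0] * len(names)
    for term in query_terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are skipped in this sketch
        probs = candidate_distribution(embeddings[term], cand_weights, cand_bias)
        for i, p in enumerate(probs):
            log_scores[i] += math.log(p)
    return sorted(zip(names, log_scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy parameters: two words, two candidates, 2-d embeddings.
EMBEDDINGS = {"retrieval": [1.0, 0.0], "vision": [0.0, 1.0]}
CAND_WEIGHTS = [[2.0, -1.0], [-1.0, 2.0]]  # one row per candidate
CAND_BIAS = [0.0, 0.0]
NAMES = ["alice", "bob"]

ranking = rank_candidates(["retrieval"], EMBEDDINGS, CAND_WEIGHTS, CAND_BIAS, NAMES)
```

In this sketch, scoring a query costs one matrix-vector product per query term and does not touch individual documents at query time, which is one way to read the abstract's point about the low inference cost of profile-centric approaches.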

Index Terms

Unsupervised, Efficient and Semantic Expertise Retrieval
1. Information systems
  1. Information retrieval

Published In

WWW '16: Proceedings of the 25th International Conference on World Wide Web

April 2016

1482 pages

ISBN:9781450341431

General Chairs:
Jacqueline Bourdeau, Tele-university (TELUQ), Montreal, QC, Canada
Jim A. Hendler, Rensselaer Polytechnic Institute, Troy, NY, USA
Roger Nkambou, Université du Québec à Montréal, Montreal, QC, Canada

Program Chairs:
Ian Horrocks, University of Oxford, UK
Ben Y. Zhao, University of California at Santa Barbara, CA, USA

Copyright © 2016. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Conference

WWW '16

Sponsor:

IW3C2

WWW '16: 25th International World Wide Web Conference

April 11 - 15, 2016

Montréal, Québec, Canada

Acceptance Rates

WWW '16 Paper Acceptance Rate 115 of 727 submissions, 16%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

Total Citations: 32
Total Downloads: 401
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 0

Reflects downloads up to 19 Feb 2025

Cited By

Ji Y, Zhang S, Han F, Cui R, Jiang T (2024). The Sustainable Innovation of AI: Text Mining the Core Capabilities of Researchers in the Digital Age of Industry 4.0. Sustainability, 16:17 (7767). Online publication date: 6-Sep-2024.
https://doi.org/10.3390/su16177767
Moon S, Kerr G, Silavong F, Moran S (2024). API-Miner: an API-to-API Specification Recommendation Engine. Proceedings of the 1st IEEE/ACM Workshop on Software Engineering Challenges in Financial Firms (9-16). Online publication date: 16-Apr-2024.
https://dl.acm.org/doi/10.1145/3643665.3648049
Moon S, Kerr G, Silavong F, Moran S (2024). A probabilistic model for API contract specification retrieval focusing on the openAPI standard. Data Mining and Knowledge Discovery, 39:1. Online publication date: 13-Nov-2024.
https://dl.acm.org/doi/10.1007/s10618-024-01073-4
Wang Y, Liu J, Xu X, Ke X, Wu T, Gou X (2023). Efficient and Effective Academic Expert Finding on Heterogeneous Graphs through (k, 𝒫)-Core based Embedding. ACM Transactions on Knowledge Discovery from Data, 17:6 (1-35). Online publication date: 22-Mar-2023.
https://dl.acm.org/doi/10.1145/3578365
Rostami P, Shakery A (2023). A deep learning-based expert finding method to retrieve agile software teams from CQAs. Information Processing & Management, 60:2 (103144). Online publication date: Mar-2023.
https://doi.org/10.1016/j.ipm.2022.103144
Kang Y, Du H, Forkan A, Jayaraman P, Aryani A, Sellis T (2023). ExpFinder: A hybrid model for expert finding from text-based expertise data. Expert Systems with Applications, 211 (118691). Online publication date: Jan-2023.
https://doi.org/10.1016/j.eswa.2022.118691
Wu Y, Lu B, Tian L, Liang S (2022). Learning to Co-Embed Queries and Documents. Electronics, 11:22 (3694). Online publication date: 11-Nov-2022.
https://doi.org/10.3390/electronics11223694
Xu X, Liu J, Wang Y, Ke X (2022). Academic Expert Finding via $(k, \mathcal{P})$-Core based Embedding over Heterogeneous Graphs. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (338-351). Online publication date: May-2022.
https://doi.org/10.1109/ICDE53745.2022.00030
Liu J, Chu X, Wang Y, Wang M (2022). Deep Text Retrieval Models based on DNN, CNN, RNN and Transformer: A review. 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS) (391-400). Online publication date: 26-Nov-2022.
https://doi.org/10.1109/CCIS57298.2022.10016379
Fallahnejad Z, Beigy H (2022). Attention-based skill translation models for expert finding. Expert Systems with Applications (116433). Online publication date: Jan-2022.
https://doi.org/10.1016/j.eswa.2021.116433
