DOI: 10.1145/2872427.2882974
research-article

Unsupervised, Efficient and Semantic Expertise Retrieval

Published: 11 April 2016

Abstract

We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a statistically significant improvement in ranking over vector space and generative models in most cases, matching the performance of supervised methods on various benchmarks. That is, using text alone, we can do as well as methods that work with external evidence and/or relevance feedback. A contrastive analysis of rankings produced by discriminative and generative approaches shows that they have complementary strengths due to the ability of the unsupervised discriminative model to perform semantic matching.
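To make the idea of a log-linear expert ranker concrete, here is a minimal sketch in Python. The abstract does not give the parameterization, so everything below is an assumption for illustration: the function names (`log_softmax`, `rank_experts`), the softmax over candidates per query term, the summing of per-term log-probabilities, and the hand-picked toy weights. In the approach described above, the word representations and candidate weights would be learned from text rather than set by hand.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def rank_experts(query_terms, word_vecs, cand_weights, cand_bias):
    """Return (candidate index, log-score) pairs, best first.

    Assumed form: for each known query term w, a log-linear layer gives
    log P(candidate | w) = log_softmax(W . e_w + b). Per-term
    log-probabilities are summed, i.e. terms are treated as independent.
    """
    total = [0.0] * len(cand_weights)
    for term in query_terms:
        vec = word_vecs.get(term)
        if vec is None:  # out-of-vocabulary terms are skipped
            continue
        logits = [sum(w * x for w, x in zip(row, vec)) + b
                  for row, b in zip(cand_weights, cand_bias)]
        for j, lp in enumerate(log_softmax(logits)):
            total[j] += lp
    return sorted(enumerate(total), key=lambda p: p[1], reverse=True)

# Toy, hand-picked parameters (hypothetical; a real model would learn them).
word_vecs = {"neural": [1.0, 0.0], "ranking": [0.0, 1.0]}
cand_weights = [[2.0, 0.0], [0.0, 2.0], [1.5, 1.5]]  # one row per candidate
cand_bias = [0.0, 0.0, 0.0]

ranking = rank_experts(["neural", "ranking"], word_vecs, cand_weights, cand_bias)
print(ranking[0][0])  # index of the top-ranked candidate
```

In this toy setup the third candidate is associated with both query terms and so ranks first, while the other two each match only one term. Scoring a query takes one small matrix-vector product per term, independent of how many documents were used for training, which is the kind of low inference cost the abstract attributes to profile-centric approaches.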



Published In

WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN:9781450341431

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. expertise retrieval
  2. language models
  3. semantic matching

Conference

WWW '16
Sponsor:
  • IW3C2
WWW '16: 25th International World Wide Web Conference
April 11 - 15, 2016
Montréal, Québec, Canada

Acceptance Rates

WWW '16 Paper Acceptance Rate: 115 of 727 submissions, 16%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
