
Adding semantics to microblog posts

Published: 08 February 2012

Abstract

Microblogs have become an important source of information for the purpose of marketing, intelligence, and reputation management. Streams of microblogs are of great value because of their direct and real-time nature. Determining what an individual microblog post is about, however, can be non-trivial because of creative language usage, the highly contextualized and informal nature of microblog posts, and the limited length of this form of communication. We propose a solution to the problem of determining what a microblog post is about through semantic linking: we add semantics to a post by automatically identifying concepts that are semantically related to it and generating links to the corresponding Wikipedia articles. The identified concepts can subsequently be used for, e.g., social media mining, thereby reducing the need for manual inspection and selection. Using a purpose-built test collection of tweets, we show that recently proposed approaches for semantic linking do not perform well, mainly due to the idiosyncratic nature of microblog posts. We propose a novel method based on machine learning with a set of innovative features and show that it is able to achieve significant improvements over all other methods, especially in terms of precision.
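
To make the idea of semantic linking concrete, the following is a minimal sketch of a simple lexical-matching baseline, not the learned method the abstract describes. It assumes a hypothetical anchor dictionary (`ANCHORS`) that maps n-grams to Wikipedia article titles with invented link counts, and it ranks candidate concepts by "commonness", i.e. the fraction of links with a given anchor text that point to a given article. The function names, the threshold, and all counts are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical anchor dictionary: n-gram -> {Wikipedia article title: link count}.
# All counts are invented for illustration; a real system would derive them
# from the anchor texts of links inside Wikipedia.
ANCHORS = {
    "obama": {"Barack Obama": 95, "Michelle Obama": 5},
    "white house": {"White House": 90, "White House (film)": 10},
    "house": {"House": 60, "House (TV series)": 40},
}


def ngrams(tokens, max_n=3):
    """Yield every contiguous n-gram of up to max_n tokens as a string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def link_post(post, anchors=ANCHORS, threshold=0.5):
    """Return (concept, score) pairs for one post, best score first.

    The score is the highest commonness observed for the concept over
    all n-grams of the post that match an anchor text.
    """
    # Crude normalisation: lowercase and strip common punctuation and
    # the '#' / '@' markers that are typical of microblog posts.
    tokens = [t.strip(".,!?#@").lower() for t in post.split()]
    scores = defaultdict(float)
    for gram in ngrams(tokens):
        targets = anchors.get(gram)
        if not targets:
            continue
        total = sum(targets.values())
        for concept, count in targets.items():
            commonness = count / total
            scores[concept] = max(scores[concept], commonness)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(c, s) for c, s in ranked if s >= threshold]


if __name__ == "__main__":
    for concept, score in link_post("Obama speaks at the White House!"):
        print(f"{concept}\t{score:.2f}")
```

Note that this baseline also returns the generic concept "House" for the example post, since the unigram "house" matches an anchor on its own. Spurious candidates of this kind are one reason why purely lexical matching can fall short on short, informal posts, and why a learned ranking or filtering step, as the abstract proposes, is attractive.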


    Published In

    WSDM '12: Proceedings of the fifth ACM international conference on Web search and data mining
    February 2012
    792 pages
    ISBN:9781450307475
    DOI:10.1145/2124295

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. microblogs
    2. semantic linking
    3. wikipedia

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%
