DOI: 10.1145/2983323.2983887
short-paper

Near Real-time Geolocation Prediction in Twitter Streams via Matrix Factorization Based Regression

Published: 24 October 2016

ABSTRACT

Previous research on content-based geolocation has generally built prediction methods by pre-partitioning and then applying classification methods. The input to these methods is the concatenation of individual tweets over a period of time. These methods have two drawbacks: they discard the natural real-valued nature of latitude and longitude, and they fail to capture geolocation in near real time. In this work, we develop a novel generative content-based regression model based on a matrix factorization technique to tackle the near real-time geolocation prediction problem. With this model, we aim to address two unanswered questions. First, we show that near real-time geolocation prediction can be accomplished if the concatenation is left out. Second, we account for the real-valued nature of physical coordinates within a regression solution. We apply our model to Twitter datasets as an example to demonstrate its effectiveness and generality. Our experimental results show that the proposed model, in the best scenario, outperforms a set of state-of-the-art regression models, including Support Vector Machines and Factorization Machines, reducing the median localization error by up to 79%.
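The evaluation measure named in the abstract, median localization error, can be illustrated with a minimal sketch. The abstract does not say how the distance between a predicted and a true coordinate is computed, so the great-circle (haversine) distance, the Earth radius constant, and the function names below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from statistics import median

# Mean Earth radius in kilometres (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def median_localization_error_km(predicted, actual):
    """Median distance between predicted and true (lat, lon) pairs.

    Hypothetical helper: assumes both sequences are aligned and non-empty.
    """
    errors = [haversine_km(p[0], p[1], a[0], a[1]) for p, a in zip(predicted, actual)]
    return median(errors)


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error improves on baseline_error."""
    return (baseline_error - model_error) / baseline_error
```

Under this reading, a reduction "up to 79%" means `relative_reduction(baseline, model)` reaches 0.79 in the best scenario, for example a baseline median error of 100 km against a model median error of 21 km (illustrative numbers, not results from the paper).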


Published in

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CIKM '16 Paper Acceptance Rate: 160 of 701 submissions, 23%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
