DOI: 10.1145/2786451.2786482
short-paper

Big Data? Big Issues: Degradation in Longitudinal Data and Implications for Social Sciences

Published: 28 June 2015

ABSTRACT

This article analyzes the degradation of data accuracy in large-scale longitudinal data sets. Recent research points to a number of issues with large-scale data, including problems of reliability, accuracy, and quality over time. At the same time, large-scale data is increasingly being used in the social sciences. As scholars work to produce theoretically grounded research using "small-scale" methods, it is important for researchers to better understand the critical issues associated with the analysis of large-scale data. To illustrate these issues, a case study of archival Internet data is presented, focusing on the degradation of data accuracy over time. Suggestions for future studies are given.


      • Published in

        WebSci '15: Proceedings of the ACM Web Science Conference
        June 2015
        366 pages
        ISBN: 9781450336727
        DOI: 10.1145/2786451

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • short-paper
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 218 of 875 submissions, 25%
