ABSTRACT
This article analyzes the degradation of data accuracy in large-scale longitudinal data sets. Recent research points to several problems with large-scale data, including its reliability, accuracy, and quality over time. At the same time, large-scale data is increasingly used in the social sciences. As scholars work to produce theoretically grounded research using "small-scale" methods, it is important that they better understand the critical issues associated with analyzing large-scale data. To illustrate these issues, the article presents a case study of archival Internet data that focuses on the degradation of data accuracy over time. Suggestions for future studies are offered.
Big Data?: Big Issues Degradation in Longitudinal Data and Implications for Social Sciences