research-article

Data science and prediction

Published: 01 December 2013

Abstract

Big data promises automated actionable knowledge creation and predictive models for use by both humans and computers.

                    Reviews

                    Charles Kenneth Davis

                    This is an enlightening treatise on data science. There is no hype here, just a thought-provoking piece that articulates fundamental concepts and implications. The natural audience is the IT or business professional (or manager) who is interested in acquiring a clearer understanding of modern data science. Focusing primarily on examples from the healthcare industry, this article explains succinctly why “big data” really is different because of its impact on well-established approaches to creating knowledge.

                    The author begins by defining “data science” as the “generalizable extraction of knowledge from data,” focusing on the notions that much of today's data is unstructured and that traditional database models are mostly unsuitable for such data. After this introduction, he develops the core thesis of the article with a discussion of prediction and machine learning. The conventional approach to creating knowledge is to build a theory in the human mind based upon previously established theories and then to verify the new theory by collecting and analyzing appropriate data. The author points out that big data turns this on its head by making it possible for machine learning algorithms to build good models for predicting outcomes with little understanding of key underlying relationships and with no theoretical framework to support those models. Furthermore, since these models are based on the data and are essentially computer-generated, they can be made to evolve in conjunction with the processes that create their data. There is no need to rebuild theory as the situation changes in order to build new models. All of this, of course, portends fully automated decision making on a grand scale.

                    This is an important article for those who wish to understand the rationale and potential for data science. The focus is not so much on analytics, per se, as it is on machine-based prediction and machine-based decision making. This informative article lays the conceptual groundwork for these insights, and explains how and why machine learning is the true driving force behind the future of the data science phenomenon.

                    Online Computing Reviews Service

                    Ahmed S Nagy

                    Dhar presents a theory of data science that addresses challenges and caveats for dealing with big data. The study is well documented and easy to read for a wide audience. It is a useful guide to understanding timely challenges in the area of big data. The review fits well with recent developments in knowledge modeling and the semantic web. The article presents a new perspective on big data and demonstrates this with real-world situations. Dhar argues that we are moving toward a big data era in which computers will be better decision makers than people in many situations. Though that is a bold statement, it is true to a great extent. Dhar emphasizes the limitations of knowledge discovery techniques by noting that all models are wrong, yet some are useful. The article explains the usefulness of machine learning as an approach for discovering interesting data patterns. Dhar argues that big data helps reduce errors resulting from model misspecification and small samples by enabling validation. He concludes that big data makes it feasible to uncover the causal models generating the data by using machine learning to model big data.

                    Online Computing Reviews Service

                    • Published in

                      Communications of the ACM  Volume 56, Issue 12
                      December 2013
                      102 pages
                      ISSN:0001-0782
                      EISSN:1557-7317
                      DOI:10.1145/2534706
                      Copyright © 2013 ACM

                      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

                      Publisher

                      Association for Computing Machinery

                      New York, NY, United States

                      Qualifiers

                      • research-article
                      • Popular
                      • Refereed
