DOI: 10.1145/3097983.3098021
research-article
Open Access

TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

Published: 13 August 2017

ABSTRACT

Creating and maintaining a platform for reliably producing and deploying machine learning models requires careful orchestration of many components: a learner for generating models based on training data, modules for analyzing and validating both data and models, and finally infrastructure for serving models in production. This becomes particularly challenging when data changes over time and fresh models need to be produced continuously. Unfortunately, such orchestration is often done ad hoc using glue code and custom scripts developed by individual teams for specific use cases, leading to duplicated effort and fragile systems with high technical debt.
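The orchestration described above can be illustrated with a toy pipeline in which data validation and model validation gate every push to serving. This is a minimal sketch under stated assumptions, not TFX's API or the paper's implementation: `validate_data`, `train`, `run_pipeline`, the tuple-based schema format, the one-feature least-squares learner, and the dict standing in for serving infrastructure are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical schema format: feature name -> (expected type, min allowed, max allowed).
Schema = Dict[str, Tuple[type, float, float]]
Row = Dict[str, float]


def validate_data(rows: List[Row], schema: Schema) -> List[str]:
    """Check each row against the schema; an empty result means no anomalies."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, (ftype, lo, hi) in schema.items():
            if name not in row:
                anomalies.append(f"row {i}: missing feature '{name}'")
                continue
            value = row[name]
            if not isinstance(value, ftype):
                anomalies.append(f"row {i}: '{name}' has type {type(value).__name__}")
            elif not lo <= value <= hi:
                anomalies.append(f"row {i}: '{name}'={value} outside [{lo}, {hi}]")
    return anomalies


@dataclass
class LinearModel:
    """Toy stand-in for a learned model: y = w * x + b."""
    w: float
    b: float

    def predict(self, x: float) -> float:
        return self.w * x + self.b


def train(rows: List[Row], feature: str, label: str) -> LinearModel:
    """Toy learner: closed-form simple least-squares regression on one feature."""
    xs = [r[feature] for r in rows]
    ys = [r[label] for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = cov / var if var > 0 else 0.0
    return LinearModel(w, my - w * mx)


def mse(model: LinearModel, rows: List[Row], feature: str, label: str) -> float:
    """Mean squared error of the model on the given rows."""
    return sum((model.predict(r[feature]) - r[label]) ** 2 for r in rows) / len(rows)


def run_pipeline(train_rows: List[Row], eval_rows: List[Row], schema: Schema,
                 serving: Dict[str, LinearModel], feature: str, label: str,
                 tolerance: float = 0.0) -> str:
    """Validate data, train, validate the model, and push only if both checks pass."""
    # 1. Data validation: refuse to train on data that violates the schema.
    anomalies = validate_data(train_rows, schema)
    if anomalies:
        return "rejected: data anomalies: " + "; ".join(anomalies)
    # 2. Training.
    candidate = train(train_rows, feature, label)
    new_loss = mse(candidate, eval_rows, feature, label)
    # 3. Model validation: the candidate must not be worse than the serving model.
    current: Optional[LinearModel] = serving.get("model")
    if current is not None:
        old_loss = mse(current, eval_rows, feature, label)
        if new_loss > old_loss + tolerance:
            return f"rejected: loss {new_loss:.3f} worse than serving {old_loss:.3f}"
    # 4. Push to the (hypothetical) serving slot.
    serving["model"] = candidate
    return f"pushed: eval loss {new_loss:.3f}"


if __name__ == "__main__":
    schema = {"x": (float, 0.0, 10.0), "y": (float, -100.0, 100.0)}
    serving: Dict[str, LinearModel] = {}
    good = [{"x": float(i), "y": 2.0 * i + 1.0} for i in range(10)]
    print(run_pipeline(good, good, schema, serving, "x", "y"))
    # -> pushed: eval loss 0.000
    bad = good + [{"x": 50.0, "y": 3.0}]
    print(run_pipeline(bad, good, schema, serving, "x", "y"))
    # -> rejected: data anomalies: row 10: 'x'=50.0 outside [0.0, 10.0]
```

Gating the push on both checks means a malformed data batch or a regressed model never replaces the model already in serving, which is the property that matters when models are refreshed continuously. This toy checks only a fixed schema and a single aggregate loss.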

We present TensorFlow Extended (TFX), a TensorFlow-based general-purpose machine learning platform implemented at Google. By integrating the aforementioned components into one platform, we were able to standardize the components, simplify the platform configuration, and reduce the time to production from the order of months to weeks, while providing platform stability that minimizes disruptions.

We present a case study of one deployment of TFX in the Google Play app store, where the machine learning models are refreshed continuously as new data arrive. Deploying TFX led to reduced custom code, faster experiment cycles, and a 2% increase in app installs resulting from improved data and model analysis.


Supplemental Material

cheng_machine_learning.mp4 (MP4 video, 376.6 MB)


Published in

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983

Copyright © 2017 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

KDD '17 paper acceptance rate: 64 of 748 submissions (9%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
