DOI: 10.1145/3097983.3098171 · KDD Conference Proceedings
Research article · Open access

TensorFlow Estimators: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks

Published: 13 August 2017

Abstract

We present a framework for specifying, training, evaluating, and deploying machine learning models. Our focus is on simplifying cutting-edge machine learning for practitioners in order to bring such technologies into production. Recognizing the fast evolution of the field of deep learning, we make no attempt to capture the design space of all possible model architectures in a domain-specific language (DSL) or similar configuration language. We allow users to write code to define their models, but provide abstractions that guide developers to write models in ways conducive to productionization. We also provide a unifying Estimator interface, making it possible to write downstream infrastructure (e.g., distributed training, hyperparameter tuning) independently of the model implementation.
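As a sketch of what that unifying interface enables, the plain-Python mock below captures the contract that downstream infrastructure relies on: train, evaluate, and predict driven by a user-supplied model_fn and input_fn. This class is an illustrative mimic of the design described here, not TensorFlow's actual implementation.

```python
# Illustrative mimic of the Estimator contract (not TensorFlow's code):
# downstream tooling sees only train/evaluate/predict, while the model
# itself lives entirely inside a user-supplied model_fn.

class Estimator:
    def __init__(self, model_fn, params=None):
        self.model_fn = model_fn      # user code that defines the model
        self.params = params or {}

    def train(self, input_fn, steps):
        for _ in range(steps):
            features, labels = input_fn()
            spec = self.model_fn(features, labels, "train", self.params)
            spec["train_op"]()        # run one optimization step

    def evaluate(self, input_fn):
        features, labels = input_fn()
        spec = self.model_fn(features, labels, "eval", self.params)
        return spec["metrics"]

    def predict(self, input_fn):
        features, _ = input_fn()
        spec = self.model_fn(features, None, "predict", self.params)
        return spec["predictions"]
```

Because tools such as a hyperparameter tuner touch only this surface, they can drive any model_fn without knowing its internals.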
We balance the competing demands for flexibility and simplicity by offering APIs at different levels of abstraction, making common model architectures available out of the box while providing a library of utilities designed to speed up experimentation with model architectures. To make out-of-the-box models flexible and usable across a wide range of problems, these canned Estimators are parameterized not only over traditional hyperparameters, but also using feature columns, a declarative specification describing how to interpret input data.
We discuss our experience in using this framework in research and production environments, and show the impact on code health, maintainability, and development speed.
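To make the feature-column idea concrete, here is a minimal sketch of declarative input specifications driving a shared input layer. The names numeric_column and bucketized_column echo TensorFlow's feature-column vocabulary, but the implementations below are simplified illustrations, not the library's own.

```python
import bisect

# A feature column is a declarative spec: it names a raw input field and
# says how to turn it into model-ready values. A canned model can then be
# parameterized by a list of such specs instead of custom input code.

def numeric_column(key):
    # pass the raw value through as a single float
    return lambda row: [float(row[key])]

def bucketized_column(key, boundaries):
    # one-hot encode which bucket the raw value falls into
    def transform(row):
        idx = bisect.bisect_right(boundaries, float(row[key]))
        onehot = [0.0] * (len(boundaries) + 1)
        onehot[idx] = 1.0
        return onehot
    return transform

def input_layer(row, columns):
    # concatenate all transformed features into one dense vector
    vec = []
    for col in columns:
        vec.extend(col(row))
    return vec

columns = [numeric_column("age"),
           bucketized_column("income", boundaries=[30000, 70000])]
print(input_layer({"age": 42, "income": 55000}, columns))
# → [42.0, 0.0, 1.0, 0.0]
```

The same column list can parameterize a linear model, a deep network, or both, which is how a single declarative description serves many canned Estimators.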


Cited By

  • (2023) Cougar: A General Framework for Jobs Optimization In Cloud. 2023 IEEE 39th International Conference on Data Engineering (ICDE), 3417-3429. DOI: 10.1109/ICDE55515.2023.00262
  • (2023) Scavenger: A Cloud Service For Optimizing Cost and Performance of ML Training. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 403-413. DOI: 10.1109/CCGrid57682.2023.00045
  • (2022) PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems. 2022 IEEE 38th International Conference on Data Engineering (ICDE), 3453-3466. DOI: 10.1109/ICDE53745.2022.00324
  • (2022) Survey: Tensorflow in Machine Learning. Journal of Physics: Conference Series, 2273(1), 012008. DOI: 10.1088/1742-6596/2273/1/012008
  • (2022) Fast training of a transformer for global multi-horizon time series forecasting on tensor processing units. The Journal of Supercomputing, 79(8), 8475-8498. DOI: 10.1007/s11227-022-05009-x
  • (2021) Data-based algorithms for decision support using artificial neural networks ("Datenbasierte Algorithmen zur Unterstützung von Entscheidungen mittels künstlicher neuronaler Netze"). Data Science, 209-224. DOI: 10.1007/978-3-658-33403-1_13
  • (2021) Multi-modem Implementation Method Based on Deep Autoencoder Network. Wireless and Satellite Systems, 487-501. DOI: 10.1007/978-3-030-69072-4_40
  • (2021) Adaptive modem and interference suppression based on deep learning. Transactions on Emerging Telecommunications Technologies. DOI: 10.1002/ett.4220
  • (2020) Construction of Deep Convolutional Neural Networks For Medical Image Classification. International Journal of Computer Vision and Image Processing, 9(2), 1-15. DOI: 10.4018/IJCVIP.2019040101
  • (2020) aeSpTV: An Adaptive and Efficient Framework for Sparse Tensor-Vector Product Kernel on a High-Performance Computing Platform. IEEE Transactions on Parallel and Distributed Systems, 31(10), 2329-2345. DOI: 10.1109/TPDS.2020.2990429

    Published In

    KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2017, 2240 pages
    ISBN: 9781450348874
    DOI: 10.1145/3097983

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. deep learning
    2. high level api
    3. machine learning framework


    Conference

    KDD '17

    Acceptance Rates

    KDD '17 paper acceptance rate: 64 of 748 submissions (9%)
    Overall acceptance rate: 1,133 of 8,635 submissions (13%)

