ABSTRACT
In a rank-polymorphic programming language, all functions automatically lift to operate on arbitrarily high-dimensional aggregate data. By adding records to such a language, we can support computation on data frames, a tabular data structure containing heterogeneous data but in which individual columns are homogeneous. In such a setting, a data frame is a vector of records, subject to both ordinary array operations (e.g., filtering, reducing, sorting) and lifted record operations: projecting a field lifts to projecting a column. Data frames have become a popular tool for exploratory data analysis, but the fluidity of interacting with data frames via lifted record operations depends on how the language's records are designed.
We investigate three languages with different notions of record data: Racket, Standard ML, and Python. For each, we examine several common tasks for working with data frames and how the language’s records make these tasks easy or hard. Based on their advantages and disadvantages, we synthesize their ideas to produce a design for record types which is flexible for both scalar and lifted computation.
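The idea of a data frame as a vector of records can be illustrated with a minimal Python sketch. It is not the design proposed in the paper: the helpers `lift` and `project`, the sample `frame`, and its field names are all hypothetical, and the lifting is emulated at run time with nested lists rather than provided by a rank-polymorphic language.

```python
# Illustrative sketch only: emulates lifting with nested Python lists.
# `lift`, `project`, and the sample data are hypothetical names, not from the paper.

def lift(f):
    """Lift a scalar function so it maps over arbitrarily nested lists."""
    def lifted(x):
        if isinstance(x, list):
            return [lifted(item) for item in x]
        return f(x)
    return lifted

def project(field):
    """Scalar record operation: read one field from a single record."""
    return lambda record: record[field]

# A data frame as a vector of records (made-up sample data).
frame = [
    {"name": "alice", "age": 36},
    {"name": "bob", "age": 45},
    {"name": "carol", "age": 41},
]

# Lifted record operation: projecting a field becomes projecting a column.
ages = lift(project("age"))(frame)

# Ordinary array operations on the same vector of records.
over_40 = [r for r in frame if r["age"] > 40]      # filtering
total_age = sum(ages)                              # reducing
by_age = sorted(frame, key=project("age"))         # sorting

# The same lifted projection also applies to a higher-rank aggregate.
grouped = [[frame[0]], [frame[1], frame[2]]]
grouped_ages = lift(project("age"))(grouped)
```

In this sketch the record operation is written once for a single record and reused unchanged on a vector and on a nested vector, which is the interaction between records and lifting that the abstract describes.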
Index Terms
- Records with rank polymorphism