skip to main content
research-article

Stream Processing Languages in the Big Data Era

Published: 11 December 2018 Publication History

Abstract

This paper is a survey of recent stream processing languages, which are programming languages for writing applications that analyze data streams. Data streams, or continuous data flows, have been around for decades. But with the advent of the big-data era, the size of data streams has increased dramatically. Analyzing big data streams yields immense advantages across all sectors of our society. To analyze streams, one needs to write a stream processing application. This paper showcases several languages designed for this purpose, articulates underlying principles, and outlines open challenges.

References

[1]
D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The design of the Borealis stream processing engine. In Conference on Innovative Data Systems Research (CIDR), pages 277--289, 2005.
[2]
D. J. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: a new model and architecture for data stream management. Journal on Very Large Data Bases (VLDB J.), 12(2):120--139, Aug. 2003.
[3]
T. Akidau, A. Balikov, K. Bekiroglu, S. Chernyak, J. Haberman, R. Lax, S. McVeety, D. Mills, P. Nordstrom, and S. Whittle. MillWheel: Fault-tolerant stream processing at internet scale. In Conference on Very Large Data Bases (VLDB) Industrial Track, pages 734--746, Aug. 2013.
[4]
T. Akidau, R. Bradshaw, C. Chambers, S. Chernyak, R. J. Fernandez-Moctezuma, R. Lax, S. McVeety, D. Mills, F. Perry, E. Schmidt, and S. Whittle. The dataflow model: A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. In Conference on Very Large Data Bases (VLDB), pages 1792--1803, Aug. 2015.
[5]
M. H. Ali, C. Gerea, B. Raman, B. Sezgin, T. Tarnavski, T. Verona, P. Wang, P. Zabback, A. Kirilov, A. Ananthanarayan, M. Lu, A. Raizman, R. Krishnan, R. Schindlauer, T. Grabs, S. Bjeletich, B. Chandramouli, J. Goldstein, S. Bhat, Y. Li, V. Di Nicola, X. Wang, D. Maier, I. Santos, O. Nano, and S. Grell. Microsoft CEP server and online behavioral targeting. In Demo at Conference on Very Large Data Bases (VLDB-Demo), pages 1558--1561, 2009.
[6]
H. C. Andrade, B. Gedik, and D. S. Turaga. Fundamentals of stream processing: application design, systems, and analytics. Cambridge University Press, 2014.
[7]
D. Anicic, S. Rudolph, P. Fodor, and N. Stojanovic. Stream reasoning and complex event processing in ETALIS. Semantic Web, 3(4):397--407, 2012.
[8]
G. Antoniou, P. T. Groth, F. van Harmelen, and R. Hoekstra. A Semantic Web Primer, 3rd Edition. MIT Press, 2012.
[9]
A. Arasu, S. Babu, and J. Widom. The CQL continuous query language: semantic foundations and query execution. Journal on Very Large Data Bases (VLDB J.), 15(2):121--142, 2006.
[10]
A. Arasu, M. Cherniack, E. Galvez, D. Maier, A. S. Maskey, E. Ryvkina, M. Stonebraker, and R. Tibbetts. Linear road: A stream data management benchmark. In Conference on Very Large Data Bases (VLDB), pages 480--491, 2004.
[11]
A. Arasu and J. Widom. A denotational semantics for continuous queries over streams and relations. SIGMOD Record, 33(3):6--11, 2004.
[12]
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Symposium on Principles of Database Systems (PODS), pages 1--16, 2002.
[13]
M. Balduini, E. D. Valle, and R. Tommasini. SLD revolution: A cheaper, faster yet more accurate streaming linked data framework. In European Semantic Web Conference (ESWC) Satellite Events, pages 263--279, 2017.
[14]
D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, and M. Grossniklaus. Incremental reasoning on streams and rich background knowledge. In Extended Semantic Web Conference (ESWC), pages 1--15, 2010.
[15]
D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, and H. Wermser. Deductive and inductive stream reasoning for semantic social media analytics. IEEE Intelligent Systems, 25(6):32--41, 2010.
[16]
H. R. Bazoobandi, H. Beck, and J. Urbani. Expressive stream reasoning with Laser. In International Semantic Web Conference (ISWC), pages 87--103, 2017.
[17]
H. Beck, M. Dao-Tran, T. Eiter, and M. Fink. LARS: A logic-based framework for analyzing reasoning over streams. In Conference on Artificial Intelligence (AAAI), pages 1431--1438, 2015.
[18]
H. Beck, T. Eiter, and C. Folie. Ticker: A system for incremental ASP-based stream reasoning. Theory and Practice of Logic Programming (TPLP), 17(5--6):744--763, 2017.
[19]
I. Botan, R. Derakhshan, N. Dindar, L. Haas, R. J. Miller, and N. Tatbul. SECRET: A model for analysis of the execution semantics of stream processing systems. In Conference on Very Large Data Bases (VLDB), pages 232--243, 2010.
[20]
T. Bourke and M. Pouzet. Z´elus: A synchronous language with ODEs. In Conference on Hybrid Systems: Computation and Control (HSCC), pages 113--118, 2013.
[21]
D. Bricklin and B. Frankston. VisiCalc computer software program, 1979. Reference Manual, Personal Software Inc.
[22]
J. Calbimonte, J. Mora, and ´ O. Corcho. Query rewriting in RDF stream processing. In European Semantic Web Conference (ESWC), pages 486--502, 2016.
[23]
J.-P. Calbimonte, O. Corcho, and A. J. G. Gray. Enabling ontology-based access to streaming data sources. In International Semantic Web Conference (ISWC), pages 96--111, 2010.
[24]
P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas. Apache Flink: Stream and batch processing in a single engine. IEEE Database Engineering Bulletin, 36(4):28--38, Dec. 2015.
[25]
P. Caspi, D. Pilaud, N. Halbwachs, and J. A. Plaice. LUSTRE: a declarative language for real-time programming. In Symposium on Principles of Programming Languages (POPL), pages 178--188, 1987.
[26]
B. Chandramouli, J. Goldstein, M. Barnett, R. DeLine, D. Fisher, J. C. Platt, J. F. Terwilliger, and J. Wernsing. Trill: A high-performance incremental query processor for diverse analytics. In Conference on Very Large Data Bases (VLDB), pages 401--412, Aug. 2014.
[27]
B. Chandramouli, J. Goldstein, and S. Duan. Temporal analytics on big data for web advertising. In International Conference on Data Engineering (ICDE), pages 90--101, 2012.
[28]
B. Chandramouli, J. Goldstein, and D. Maier. High-performance dynamic pattern matching over disordered streams. In Conference on Very Large Data Bases (VLDB), pages 220--231, 2010.
[29]
S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. A. Shah. TelegraphCQ: Continuous dataflow processing for an uncertain world. In Conference on Innovative Data Systems Research (CIDR), pages 668--668, 2003.
[30]
J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. In International Conference on Management of Data (SIGMOD), pages 379--390, 2000.
[31]
J. Chen, F. L´ecu´e, J. Z. Pan, and H. Chen. Learning from ontology streams with semantic concept drift. In International Joint Conference on Artificial Intelligence (IJCAI), pages 957--963, 2017.
[32]
J. Clark and S. DeRose. XML path language (XPath) version 1.0. W3C recommendation, W3C, Nov. 1999. http://www.w3.org/TR/1999/REC-xpath-19991116.
[33]
N. H. Cohen and K. T. Kalleberg. EventScript: An event-processing language based on regular expressions with actions. In Conference on Languages, Compiler, and Tool Support for Embedded Systems (LCTES), pages 111--120, 2008.
[34]
J.-L. Colaco, B. Pagano, and M. Pouzet. Scade 6: A formal language for embedded critical software development. In Symposium on Theoretical Aspect of Software Engineering (TASE), 2017.
[35]
C. Cranor, T. Johnson, O. Spataschek, and V. Shkapenyuk. Gigascope: A stream database for network applications. In International Conference on Management of Data (SIGMOD) Industrial Track, pages 647--651, 2003.
[36]
D. de Leng and F. Heintz. Qualitative spatio-temporal stream reasoning with unobservable intertemporal spatial relations using landmarks. In Conference on Artificial Intelligence (AAAI), pages 957--963, 2016.
[37]
E. Della Valle, S. Ceri, D. F. Barbieri, D. Braga, and A. Campi. A first step towards stream reasoning. In Future Internet Symposium (FIS), pages 72--81, 2008.
[38]
E. Della Valle, S. Ceri, F. van Harmelen, and D. Fensel. It's a streaming world! Reasoning upon rapidly changing information. IEEE Intelligent Systems, 24(6):83--89, 2009.
[39]
D. Dell'Aglio, J. Calbimonte, E. Della Valle, and ´ O. Corcho. Towards a unified language for RDF stream query processing. In European Semantic Web Conference (ESWC) Satellite Events, pages 353--363, 2015.
[40]
D. Dell'Aglio, E. Della Valle, J. Calbimonte, and ´ O. Corcho. RSP-QL semantics: A unifying query model to explain heterogeneity of RDF stream processing systems. International Journal on Semantic Web and Information Systems (IJSWIS), 10(4):17--44, 2014.
[41]
D. DellAglio, E. Della Valle, F. van Harmelen, and A. Bernstein. Stream reasoning: A survey and outlook. Data Science, 1(1--2):59--83, 2017.
[42]
A. Demers, J. Gehrke, B. Panda, M. Riedewald, V. Sharma, and W. White. Cayuga: A general purpose event monitoring system. In Conference on Innovative Data Systems Research (CIDR), pages 412--422, 2007.
[43]
A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. A query language for XML. Computer Networks, 31(11):1155--1169, 1999.
[44]
Y. Diao, P. M. Fischer, M. J. Franklin, and R. To. YFilter: Efficient and scalable filtering of XML documents. In Demo at International Conference on Data Engineering (ICDE-Demo), pages 341--342, 2002.
[45]
Esper. Esper open source software for streaming analytics, 2018. http://www.espertech.com/esper/ (Retrieved June 2018).
[46]
O. Etzion, F. Fournier, I. Skarbovsky, and B. von Halle. A model driven approach for event processing applications. In Conference on Distributed Event-Based Systems (DEBS), pages 81--92, 2016.
[47]
B. Gedik, H. Andrade, K.-L. Wu, P. S. Yu, and M. Doo. SPADE: The System S declarative stream processing engine. In International Conference on Management of Data (SIGMOD), pages 1123--1134, 2008.
[48]
B. Gedik, S. Schneider, M. Hirzel, and K.-L. Wu. Elastic scaling for data stream processing. Transactions on Parallel and Distributed Systems (TPDS), 25(6):1447--1463, June 2014.
[49]
M. Hirzel. Partition and compose: Parallel complex event processing. In Conference on Distributed Event-Based Systems (DEBS), pages 191--200, 2012.
[50]
M. Hirzel, H. Andrade, B. Gedik, V. Kumar, G. Losa, M. Mendell, H. Nasgaard, R. Soul´e, and K.-L. Wu. SPL Streams Processing Language Specification. Technical Report RC24897, IBM Research, 2009.
[51]
M. Hirzel, S. Schneider, and B. Gedik. SPL: An extensible language for distributed stream processing. Transactions on Programming Languages and Systems (TOPLAS), 39(1):5:1--5:39, March 2017.
[52]
P. Hudak. Modular domain specific languages and tools. In International Conference on Software Reuse (ICSR), pages 134--142, 1998.
[53]
H. V. Jagadish, J. Gehrke, A. Labrinidis, Y. Papakonstantinou, J. M. Patel, R. Ramakrishnan, and C. Shahabi. Big data and its technical challenges. Communications of the ACM (CACM), 57(7):86--94, July 2014.
[54]
N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. Cetintemel, M. Cherniack, R. Tibbets, and S. Zdonik. Towards a streaming SQL standard. In Conference on Very Large Data Bases (VLDB), pages 1379--1390, 2008.
[55]
C. S. Jensen and R. Snodgrass. Temporal specialization and generalization. Transactions on Knowledge and Data Engineering (TKDE), 6(6):954--974, 1994.
[56]
W. M. Johnston, J. R. P. Hanna, and R. J. Millar. Advances in dataflow programming languages. ACM Computing Surveys (CSUR), 36(1):1--34, 2004.
[57]
S. Komazec, D. Cerri, and D. Fensel. Sparkwave: continuous schema-enhanced pattern matching over RDF data streams. In Conference on Distributed Event-Based Systems (DEBS), pages 58--68, 2012.
[58]
J. Kreps. Questioning the lambda architecture, 2014. http://radar.oreilly.com/2014/07/ questioning-the-lambda-architecture.html (Retrieved June 2018).
[59]
S. Kulkarni, N. Bhagat, M. Fu, V. Kedigehalli, C. Kellogg, S. Mittal, J. M. Patel, K. Ramasamy, and S. Taneja. Twitter Heron: Stream processing at scale. In International Conference on Management of Data (SIGMOD), pages 239--250, May 2015.
[60]
P. Le Guernic, T. Gautier, M. Le Borgne, and C. Le Maire. Programming real-time applications with Signal. Proceedings of the IEEE, 79(9):1321--1336, 1991.
[61]
D. Le-Phuoc, M. Dao-Tran, M.-D. Pham, P. Boncz, T. Eiter, and M. Fink. Linked stream data processing engines: Facts and figures. In International Semantic Web Conference (ISWC), pages 300--312, 2012.
[62]
F. L´ecu´e and J. Z. Pan. Predicting knowledge in an ontology stream. In International Joint Conference on Artificial Intelligence (IJCAI), pages 2662--2669, 2013.
[63]
M. Lenzerini. Data integration: A theoretical perspective. In Symposium on Principles of Database Systems (PODS), pages 233--246, 2002.
[64]
J. Li, K. Tufte, V. Shkapenyuk, V. Papadimos, T. Johnson, and D. Maier. Out-of-order processing: A new architecture for high-performance stream systems. In Conference on Very Large Data Bases (VLDB), pages 274--288, 2008.
[65]
M. H. Linehan, S. Dehors, E. Rabinovich, and F. Fournier. Controlled English language for production and event processing rules. In Conference on Distributed Event-Based Systems (DEBS), pages 149--158, 2011.
[66]
E. Meijer, B. Beckman, and G. M. Bierman. LINQ: Reconciling objects, relations, and XML in the .NET framework. In International Conference on Management of Data (SIGMOD), pages 706--706, 2006.
[67]
M. Mendell, H. Nasgaard, E. Bouillet, M. Hirzel, and B. Gedik. Extending a general-purpose streaming system for XML. In Conference on Extending Database Technology (EDBT), pages 534--539, 2012.
[68]
B. Motik, Y. Nenov, R. E. F. Piro, and I. Horrocks. Incremental update of datalog materialisation: the backward/forward algorithm. In Conference on Artificial Intelligence (AAAI), pages 1560--1568, 2015.
[69]
D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: A timely dataflow system. In Symposium on Operating Systems Principles (SOSP), pages 439--455, Nov. 2013.
[70]
S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2):117--236, 2005.
[71]
F. Peng and S. S. Chawathe. XPath queries on streaming data. In International Conference on Management of Data (SIGMOD), pages 431--442, 2003.
[72]
A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati. Linking data to ontologies. Journal of Data Semantics, 10:133--173, 2008.
[73]
M. Pouzet. Lucid synchrone, version 3, Tutorial and reference manual, 2006.
[74]
Y. Ren and J. Z. Pan. Optimising ontology stream reasoning with truth maintenance system. In Conference on Information and Knowledge Management (CIKM), pages 831--836, 2011.
[75]
A. V. Riabov, E. Bouillet, M. D. Feblowitz, Z. Liu, and A. Ranganathan. Wishful search: Interactive composition of data mashups. In International Conference on World Wide Web (WWW), pages 775--784, 2008.
[76]
N. Seyfer, R. Tibbetts, and N. Mishkin. Capture fields: Modularity in a stream-relational event processing language. In Conference on Distributed Event-Based Systems (DEBS), pages 15--22, 2011.
[77]
R. Soul´e, M. Hirzel, B. Gedik, and R. Grimm. River: An intermediate language for stream processing. Software -- Practice and Experience, 46(7):891--929, July 2016.
[78]
R. Soul´e, M. Hirzel, R. Grimm, B. Gedik, H. Andrade, V. Kumar, and K.-L. Wu. A universal calculus for stream processing languages. In European Symposium on Programming (ESOP), pages 507--528, 2010.
[79]
R. Stephens. A survey of stream processing. Acta Informatica, 34(7):491--541, 1997.
[80]
W. Thies, M. Karczmarek, M. Gordon, D. Maze, J. Wong, H. Hoffmann, M. Brown, and S. Amarasinghe. StreamIt: A compiler for streaming applications. Technical Report LCS-TM-622, MIT, 2002.
[81]
A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal, and D. Ryaboy. Storm @Twitter. In International Conference on Management of Data (SIGMOD), pages 147--156, June 2014.
[82]
E. D. Valle, D. Dell'Aglio, and A. Margara. Taming velocity and variety simultaneously in big data with stream reasoning. In Conference on Distributed Event-Based Systems (DEBS) Tutorial, pages 394--401, 2016.
[83]
A. van Wijngaarden, B. Mailloux, J. Peck, C. Koster, M. Sintzoff, C. Lindsey, L. Meertens, and R. Fisker. Revised Report on the Algorithmic Language ALGOL 68. 1975.
[84]
M. Vaziri, O. Tardieu, R. Rabbah, P. Suter, and M. Hirzel. Stream processing with a spreadsheet. In European Conference on Object-Oriented Programming (ECOOP), pages 360--384, 2014.
[85]
S. Wasserkrug, A. Gal, O. Etzion, and Y. Turchin. Complex event processing over uncertain data. In Conference on Distributed Event-Based Systems (DEBS), pages 253--264, 2008.
[86]
E. Wu, Y. Diao, and S. Rizvi. High-performance complex event processing over streams. In International Conference on Management of Data (SIGMOD), pages 407--418, 2006.
[87]
M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In Symposium on Operating Systems Principles (SOSP), pages 423--438, Nov. 2013.
[88]
F. Zemke, A. Witkowski, M. Cherniak, and L. Colby. Pattern matching in sequences of rows. Technical report, ANSI Standard Proposal, 2007.
[89]
Q. Zou, H. Wang, R. Soul´e, M. Hirzel, H. Andrade, B. Gedik, and K.-L. Wu. From a stream of relational queries to distributed stream processing. In Conference on Very Large Data Bases (VLDB) Industrial Track, pages 1394--1405, 2010.

Cited By

View all
  • (2024)An Overview of Continuous Querying in (Modern) Data SystemsCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654679(605-612)Online publication date: 9-Jun-2024
  • (2024)Data-Aware Adaptive Compression for Stream ProcessingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.337771036:9(4531-4549)Online publication date: 1-Sep-2024
  • (2023)Keyed Watermarks: A Fine-grained Tracking of Event-time in Apache Flink2023 5th Novel Intelligent and Leading Emerging Sciences Conference (NILES)10.1109/NILES59815.2023.10296717(23-28)Online publication date: 21-Oct-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 47, Issue 2
June 2018
68 pages
ISSN:0163-5808
DOI:10.1145/3299887
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 December 2018
Published in SIGMOD Volume 47, Issue 2

Check for updates

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)47
  • Downloads (Last 6 weeks)4
Reflects downloads up to 15 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)An Overview of Continuous Querying in (Modern) Data SystemsCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654679(605-612)Online publication date: 9-Jun-2024
  • (2024)Data-Aware Adaptive Compression for Stream ProcessingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.337771036:9(4531-4549)Online publication date: 1-Sep-2024
  • (2023)Keyed Watermarks: A Fine-grained Tracking of Event-time in Apache Flink2023 5th Novel Intelligent and Leading Emerging Sciences Conference (NILES)10.1109/NILES59815.2023.10296717(23-28)Online publication date: 21-Oct-2023
  • (2023)CompressStreamDB: Fine-Grained Adaptive Stream Processing without Decompression2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00038(408-422)Online publication date: Apr-2023
  • (2023)Big Data Analytics from the Rich Cloud to the Frugal Edge2023 IEEE International Conference on Edge Computing and Communications (EDGE)10.1109/EDGE60047.2023.00054(319-329)Online publication date: Jul-2023
  • (2023)Formally specifying and coinductive approach to verifying synthesis of stream calculus-based computing big data in livestreamInternet of Things10.1016/j.iot.2023.10087823(100878)Online publication date: Oct-2023
  • (2023)A survey on the evolution of stream processing systemsThe VLDB Journal10.1007/s00778-023-00819-833:2(507-541)Online publication date: 22-Nov-2023
  • (2023)Formalizing Stream Reasoning for a Decentralized Semantic WebThe Semantic Web: ESWC 2023 Satellite Events10.1007/978-3-031-43458-7_46(277-287)Online publication date: 28-May-2023
  • (2022)Examining the Nexus between the Vs of Big Data and the Sustainable Challenges in the Textile IndustrySustainability10.3390/su1408463814:8(4638)Online publication date: 13-Apr-2022
  • (2022)Online correlation for unlabeled process eventsInformation Systems10.1016/j.is.2022.102031108:COnline publication date: 1-Sep-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media