skip to main content
10.1145/2611286.2611298acmconferencesArticle/Chapter ViewAbstractPublication PagesdebsConference Proceedingsconference-collections
research-article

JetStream: enabling high performance event streaming across cloud data-centers

Published: 26 May 2014 Publication History

Abstract

The easily-accessible computation power offered by cloud infrastructures coupled with the revolution of Big Data are expanding the scale and speed at which data analysis is performed. In their quest for finding the Value in the 3 Vs of Big Data, applications process larger data sets, within and across clouds. Enabling fast data transfers across geographically distributed sites becomes particularly important for applications which manage continuous streams of events in real time. Scientific applications (e.g. the Ocean Observatory Initiative or the ATLAS experiment) as well as commercial ones (e.g. Microsoft's Bing and Office 365 large-scale services) operate on tens of data-centers around the globe and follow similar patterns: they aggregate monitoring data, assess the QoS or run global data mining queries based on inter site event stream processing. In this paper, we propose a set of strategies for efficient transfers of events between cloud data-centers and we introduce JetStream: a prototype implementing these strategies as a high performance batch-based streaming middleware. JetStream is able to self-adapt to the streaming conditions by modeling and monitoring a set of context parameters. It further aggregates the available bandwidth by enabling multi-route streaming across cloud sites. The prototype was validated on tens of nodes from US and Europe data-centers of the Windows Azure cloud using synthetic benchmarks and with application code from the context of the Alice experiment at CERN. The results show an increase in transfer rate of 250 times over individual event streaming. Besides, introducing an adaptive transfer strategy brings an additional 25% gain. Finally, the transfer rate can further be tripled thanks to the use of multi-route streaming.

References

[1]
CERN Alice. http://alimonitor.cern.ch/map.jsp.
[2]
Cloud Computing and High-Energy Particle Physics: How ATLAS Experiment at CERN Uses Google Compute Engine in the Search for New Physics at LHC. https://developers.google.com/events/io/sessions/333315382.
[3]
GlobusOnline. https://www.globus.org.
[4]
Microsoft StreamInsight. http://technet.microsoft.com/en-us/library/ee362541.aspx.
[5]
Ocean Observatory Initiative. http://oceanobservatories.org/.
[6]
Windows Azure. http://windows.azure.com.
[7]
D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: A new model and architecture for data stream management. The VLDB Journal, 12(2):120--139, Aug. 2003.
[8]
L. Amini, H. Andrade, R. Bhagwan, F. Eskesen, R. King, P. Selo, Y. Park, and C. Venkatramani. Spc: A distributed, scalable platform for data mining. In Proceedings of the 4th International Workshop on Data Mining Standards, Services and Platforms, DMSSP '06, pages 27--37, New York, NY, USA, 2006. ACM.
[9]
M. Arrott, A. Clemesha, C. Farcas, E. Farcas, M. Meisinger, D. Raymer, D. LaBissoniere, and K. Keahey. Cloud provisioning environment: Prototype architecture and technologies. Ocean Observatories Initiative Kick Off Meeting, September 2009.
[10]
C. Balkesen, N. Dindar, M. Wetter, and N. Tatbul. Rip: Run-based intra-query parallelism for scalable complex event processing. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems, DEBS '13, pages 3--14, New York, NY, USA, 2013.
[11]
C. Balkesen, N. Tatbul, and M. T. Özsu. Adaptive input admission and management for parallel stream processing. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems, DEBS '13, pages 15--26, New York, NY, USA, 2013. ACM.
[12]
A. Baptista, B. Howe, J. Freire, D. Maier, and C. T. Silva. Scientific exploration in the era of ocean observatories. Computing in Science and Engg., 10(3):53--58, May 2008.
[13]
I. Botan, G. Alonso, P. M. Fischer, D. Kossmann, and N. Tatbul. Flexible and scalable storage management for data-intensive stream processing. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT '09, pages 934--945, New York, NY, USA, 2009. ACM.
[14]
B. Chandramouli, J. Goldstein, R. Barga, M. Riedewald, and I. Santos. Accurate latency estimation in a distributed event processing system. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering, ICDE '11, pages 255--266, Washington, DC, USA, 2011. IEEE Computer Society.
[15]
F. Farag, M. Hammad, and R. Alhajj. Adaptive query processing in data stream management systems under limited memory resources. In Proceedings of the 3rd Workshop on Ph.D. Students in Information and Knowledge Management, PIKM '10, pages 9--16, 2010.
[16]
I. Foster, A. Chervenak, D. Gunter, K. Keahey, R. Madduri, and R. Kettimuthu. Enabling PETASCALE Data Movement and Analysis. Scidac Review, Winter 2009.
[17]
Z. Galić, E. Mešković, K. Križanović, and M. Baranović. Oceanus: A spatio-temporal data stream system prototype. In Proceedings of the Third ACM SIGSPATIAL International Workshop on GeoStreaming, IWGS '12, pages 109--115, New York, NY, USA, 2012. ACM.
[18]
B. Gedik, H. Andrade, K.-L. Wu, P. S. Yu, and M. Doo. Spade: The system s declarative stream processing engine. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD '08, pages 1123--1134, New York, NY, USA, 2008. ACM.
[19]
B. Gedik and L. Liu. Quality-aware dstributed data delivery for continuous query services. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, SIGMOD '06, pages 419--430, New York, NY, USA, 2006. ACM.
[20]
L. Golab and M. T. Özsu. Issues in data stream management. SIGMOD Rec., 32(2):5--14, June 2003.
[21]
Y. Gu and R. L. Grossman. Sector and sphere: The design and implementation of a high performance data cloud. Philosophical Transactions A Special Issue associated with the UK e-Science All Hands Meeting, 12.
[22]
A. Gupta, O. D. Sahin, D. Agrawal, and A. E. Abbadi. Meghdoot: Content-based publish/subscribe over p2p networks. In Proceedings of the 5th ACM/IFIP/USENIX International Conference on Middleware, Middleware '04, pages 254--273, New York, NY, USA, 2004. Springer-Verlag New York, Inc.
[23]
B. He, M. Yang, Z. Guo, R. Chen, B. Su, W. Lin, and L. Zhou. Comet: Batched stream processing for data intensive distributed computing. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10, pages 63--74, New York, NY, USA, 2010. ACM.
[24]
Z. Hill, J. Li, M. Mao, A. Ruiz-Alvarez, and M. Humphrey. Early observations on the performance of windows azure. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC '10, pages 367--376, New York, NY, USA, 2010. ACM.
[25]
R. Immich, E. Cerqueira, and M. Curado. Adaptive video-aware fec-based mechanism with unequal error protection scheme. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC '13, pages 981--988, New York, NY, USA, 2013. ACM.
[26]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In Proceedings of the 2Nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, EuroSys '07, pages 59--72, New York, NY, USA, 2007. ACM.
[27]
A. Ishii and T. Suzumura. Elastic stream computing with clouds. In Cloud Computing (CLOUD), 2011 IEEE International Conference on, pages 195--202, 2011.
[28]
P. Key, L. Massoulié, and D. Towsley. Path selection and multipath congestion control. Commun. ACM, 54(1):109--116, Jan. 2011.
[29]
R. Kienzler, R. Bruggmann, A. Ranganathan, and N. Tatbul. Stream as you go: The case for incremental data access and processing in the cloud. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering Workshops, ICDEW '12, pages 159--166, Washington, DC, USA, 2012. IEEE Computer Society.
[30]
T. Kosar, E. Arslan, B. Ross, and B. Zhang. Storkcloud: Data transfer scheduling and optimization as a service. In Proceedings of the 4th ACM Workshop on Scientific Cloud Computing, Science Cloud '13, pages 29--36, 2013.
[31]
C. Lal, V. Laxmi, and M. S. Gaur. A rate adaptive and multipath routing protocol to support video streaming in manets. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics, ICACCI '12, pages 262--268, New York, NY, USA, 2012.
[32]
N. Laoutaris, M. Sirivianos, X. Yang, and P. Rodriguez. Inter-datacenter bulk transfers with netstitcher. SIGCOMM Comput. Commun. Rev., 41(4):74--85, Aug. 2011.
[33]
I. Legrand, H. Newman, R. Voicu, C. Cirstoiu, C. Grigoras, C. Dobre, A. Muraru, A. Costan, M. Dediu, and C. Stratan. Monalisa: An agent based, dynamic service system to monitor, control and optimize distributed systems. Computer Physics Communications, 180(12):2472--2498, 2009.
[34]
M. Lindeberg, V. Goebel, and T. Plagemann. Adaptive sized windows to improve real-time health monitoring: A case study on heart attack prediction. In Proceedings of the International Conference on Multimedia Information Retrieval, MIR '10, pages 459--468, 2010.
[35]
S. Loesing, M. Hentschel, T. Kraska, and D. Kossmann. Stormy: An elastic and highly available streaming service in the cloud. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, EDBT-ICDT '12, pages 55--60, New York, NY, USA, 2012. ACM.
[36]
K. G. S. Madsen, L. Su, and Y. Zhou. Grand challenge: Mapreduce-style processing of fast sensor data. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems, DEBS '13, pages 313--318, New York, NY, USA, 2013. ACM.
[37]
S. Mao, X. Cheng, Y. Hou, and H. Sherali. Multiple description video multicast in wireless ad hoc networks. Mobile Networks and Applications, 11(1):63--73, 2006.
[38]
P. N. Martinaitis, C. J. Patten, and A. L. Wendelborn. Component-based stream processing "in the cloud". In Proceedings of the 2009 Workshop on Component-Based High Performance Computing, CBHPC '09, pages 16:1--16:12, New York, NY, USA, 2009. ACM.
[39]
C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley. Improving datacenter performance and robustness with multipath tcp. In Proceedings of the ACM SIGCOMM 2011 Conference, SIGCOMM '11, pages 266--277, New York, NY, USA, 2011. ACM.
[40]
M. A. Sharaf, P. K. Chrysanthis, and A. Labrinidis. Tuning qod in stream processing engines. In Proceedings of the Twenty-First Australasian Conference on Database Technologies - Volume 104, ADC '10, pages 103--112, Darlinghurst, Australia, Australia, 2010. Australian Computer Society, Inc.
[41]
I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw., 11(1):17--32, Feb. 2003.
[42]
M. A. Tariq, B. Koldehofe, G. G. Koch, I. Khan, and K. Rothermel. Meeting subscriber-defined qos constraints in publish/subscribe systems. Concurr. Comput.: Pract. Exper., 23(17):2140--2153, Dec. 2011.
[43]
M. A. Tariq, B. Koldehofe, G. G. Koch, and K. Rothermel. Distributed spectral cluster management: A method for building dynamic publish/subscribe systems. In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems, DEBS '12, pages 213--224, 2012.
[44]
M. A. Tariq, B. Koldehofe, and K. Rothermel. Efficient content-based routing with network topology inference. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems, DEBS '13, pages 51--62, New York, NY, USA, 2013. ACM.
[45]
R. Tudoran, A. Costan, R. Wang, L. Bougé, and G. Antoniu. Bridging data in the clouds: An environment-aware system for geographically distributed data transfers. In Proceedings of the 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (Ccgrid 2014), CCGRID '14. IEEE Computer Society, 2014.
[46]
B. Wang, W. Wei, Z. Guo, and D. Towsley. Multipath live streaming via tcp: Scheme, performance and benefits. ACM Trans. Multimedia Comput. Commun. Appl., 5(3):25:1--25:23, Aug. 2009.
[47]
E. Yildirim and T. Kosar. Network-aware end-to-end data throughput optimization. In Proceedings of the First International Workshop on Network-aware Data Management, NDM '11, pages 21--30, 2011.
[48]
M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 423--438, New York, NY, USA, 2013. ACM.

Cited By

View all
  • (2021)An Intelligent Parallel Distributed Streaming Framework for near Real-time Science Sensors and High-Resolution Medical Images50th International Conference on Parallel Processing Workshop10.1145/3458744.3474039(1-9)Online publication date: 9-Aug-2021
  • (2021)Self‐adaptation on parallel stream processing: A systematic reviewConcurrency and Computation: Practice and Experience10.1002/cpe.675934:6Online publication date: 7-Dec-2021
  • (2020)SLA Management for Big Data Analytical Applications in CloudsACM Computing Surveys10.1145/338346453:3(1-40)Online publication date: 12-Jun-2020
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
DEBS '14: Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems
May 2014
371 pages
ISBN:9781450327374
DOI:10.1145/2611286
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 May 2014

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cloud computing
  2. event streaming
  3. high performance data management
  4. multi data-centers

Qualifiers

  • Research-article

Funding Sources

  • INRIA - Microsoft Research Center

Conference

DEBS '14

Acceptance Rates

DEBS '14 Paper Acceptance Rate 16 of 174 submissions, 9%;
Overall Acceptance Rate 145 of 583 submissions, 25%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)2
Reflects downloads up to 05 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2021)An Intelligent Parallel Distributed Streaming Framework for near Real-time Science Sensors and High-Resolution Medical Images50th International Conference on Parallel Processing Workshop10.1145/3458744.3474039(1-9)Online publication date: 9-Aug-2021
  • (2021)Self‐adaptation on parallel stream processing: A systematic reviewConcurrency and Computation: Practice and Experience10.1002/cpe.675934:6Online publication date: 7-Dec-2021
  • (2020)SLA Management for Big Data Analytical Applications in CloudsACM Computing Surveys10.1145/338346453:3(1-40)Online publication date: 12-Jun-2020
  • (2020)An edge-aware autonomic runtime for data streaming and in-transit processingFuture Generation Computer Systems10.1016/j.future.2020.03.037Online publication date: Apr-2020
  • (2019)Fog-Cloud Collaboration for Real-Time Streaming ApplicationsHandbook of Research on the IoT, Cloud Computing, and Wireless Network Optimization10.4018/978-1-5225-7335-7.ch007(128-147)Online publication date: 2019
  • (2019)Towards a computing continuum: Enabling edge-to-cloud integration for data-driven workflowsThe International Journal of High Performance Computing Applications10.1177/109434201987738333:6(1159-1174)Online publication date: 19-Sep-2019
  • (2019)Submarine: A subscription‐based data streaming framework for integrating large facilities and advanced cyberinfrastructureConcurrency and Computation: Practice and Experience10.1002/cpe.525632:16Online publication date: 11-Apr-2019
  • (2018)Building Science Gateway Infrastructure in the Middle of the Pacific and BeyondProceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity10.1145/3219104.3219151(1-8)Online publication date: 22-Jul-2018
  • (2018)Recent Advancements in Event ProcessingACM Computing Surveys10.1145/317043251:2(1-36)Online publication date: 13-Feb-2018
  • (2017)Reflections on Almost Two Decades of Research into Stream ProcessingProceedings of the 11th ACM International Conference on Distributed and Event-based Systems10.1145/3093742.3095110(21-23)Online publication date: 8-Jun-2017
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media