DOI: 10.1145/2538542.2538566
research-article

Active data: a data-centric approach to data life-cycle management

Published: 17 November 2013

Abstract

Data-intensive science offers new opportunities for innovation and discovery, provided that large datasets can be handled efficiently. Data management for data-intensive science applications is challenging: it requires support for complex data life cycles, coordination across multiple sites, fault tolerance, and scalability to tens of sites and petabytes of data. In this paper, we argue that data management for these applications requires a fundamentally different approach from the current ad hoc, task-centric one. We propose Active Data, a novel paradigm for data life-cycle management. Active Data follows two principles: it is data-centric and event-driven. We report on the Active Data programming model and its preliminary implementation, and discuss the benefits and limitations of the approach on recognized, challenging data-intensive science use cases.

Published In

PDSW '13: Proceedings of the 8th Parallel Data Storage Workshop
November 2013
55 pages
ISBN: 9781450325059
DOI: 10.1145/2538542

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data management
  2. distributed storage system

Qualifiers

  • Research-article

Conference

SC13

Acceptance Rates

PDSW '13 paper acceptance rate: 8 of 16 submissions (50%)
Overall acceptance rate: 17 of 41 submissions (41%)

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 05 Mar 2025

Cited By

  • (2023) The Improvement of Retargeting by Big Data: A Decision Support that Threatens the Brand Image? European Journal of Marketing and Economics. 10.26417/511ybh24h4:1(31-44). Online publication date: 7-Mar-2023
  • (2019) An Analysis Workflow-Aware Storage System for Multi-Core Active Flash Arrays. IEEE Transactions on Parallel and Distributed Systems, 30:2 (271-285). DOI: 10.1109/TPDS.2018.2865471. Online publication date: 1-Feb-2019
  • (2015) Using Active Data to Provide Smart Data Surveillance to E-Science Users. Proceedings of the 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (269-273). DOI: 10.1109/PDP.2015.76. Online publication date: 4-Mar-2015
  • (2015) Active Data. Future Generation Computer Systems, 53:C (25-42). DOI: 10.1016/j.future.2015.05.015. Online publication date: 1-Dec-2015
