Scalable Fast Rank-1 Dictionary Learning for fMRI Big Data Analysis

Authors:
Xiang Li (University of Georgia, Athens, GA, USA)
Milad Makkie (University of Georgia, Athens, GA, USA)
Binbin Lin (University of Michigan, Ann Arbor, MI, USA)
Mojtaba Sedigh Fazli (University of Georgia, Athens, GA, USA)
Ian Davidson (University of California, Davis, Davis, CA, USA)
Jieping Ye (University of Michigan, Ann Arbor, MI, USA)
Tianming Liu (University of Georgia, Athens, GA, USA)
Shannon Quinn (University of Georgia, Athens, GA, USA)

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016, pages 511–519
https://doi.org/10.1145/2939672.2939730

Published: 13 August 2016

ABSTRACT

Various functional neuroimaging studies have shown that sparsity-regularized dictionary learning can achieve superior performance in decomposing comprehensive and neuroscientifically meaningful functional networks from massive fMRI signals. However, solving the dictionary learning problem is known to be computationally demanding, especially for large-scale data sets. In this work, we therefore propose a novel distributed rank-1 dictionary learning (D-r1DL) model and apply it to fMRI big data analysis. At each learning step, the model estimates one rank-1 basis vector from the input data, with a sparsity constraint on its loading coefficients, through alternating least squares updates. By iteratively learning a rank-1 basis and deflating the input data at each step, the model can decompose the whole set of functional networks. We implement and parallelize the rank-1 dictionary learning algorithm using the Spark engine and use its resilient distributed dataset (RDD) abstraction for data distribution and operations. Experimental results from applying the model to Human Connectome Project (HCP) data show that the proposed D-r1DL model is efficient and scalable for fMRI big data analytics, thus enabling data-driven neuroscientific discovery from massive fMRI data in the future.
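The alternating least squares and deflation steps described in the abstract can be illustrated with a small single-machine NumPy sketch. It is not the authors' D-r1DL code. It assumes the input is arranged as a time-by-voxel matrix S, that each rank-1 model is S ≈ u v^T with a unit-norm basis u and a loading vector v, and that sparsity is imposed as a hard limit of k nonzeros on v (an L0 constraint); the paper's exact sparsity operator may differ. The function names `hard_threshold`, `rank1_dl`, and `deflated_r1dl` and the parameters `k`, `n_iter`, and `tol` are hypothetical.

```python
import numpy as np


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest (assumed L0 sparsity)."""
    out = x.copy()
    if k < x.size:
        out[np.argsort(np.abs(x))[: x.size - k]] = 0.0
    return out


def rank1_dl(S, k, n_iter=100, tol=1e-8, seed=0):
    """Fit S ~ u v^T with ||u||_2 = 1 and at most k nonzeros in v by alternating least squares."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(S.shape[0])
    u /= np.linalg.norm(u)
    # For a unit-norm u, the least-squares loading is S^T u; then sparsify it.
    v = hard_threshold(S.T @ u, k)
    for _ in range(n_iter):
        su = S @ v
        norm = np.linalg.norm(su)
        if norm == 0.0:  # nothing left for this loading to explain
            break
        u_new = su / norm  # least-squares basis for fixed v, rescaled to unit norm
        v = hard_threshold(S.T @ u_new, k)  # re-fit the sparse loading for the new basis
        done = np.linalg.norm(u_new - u) < tol
        u = u_new
        if done:
            break
    return u, v


def deflated_r1dl(S, n_atoms, k, **kwargs):
    """Learn n_atoms rank-1 pairs one at a time, deflating the residual after each."""
    R = np.asarray(S, dtype=float).copy()
    U = np.zeros((R.shape[0], n_atoms))  # columns: basis vectors (e.g. time series)
    V = np.zeros((n_atoms, R.shape[1]))  # rows: sparse loadings (e.g. spatial maps)
    for j in range(n_atoms):
        u, v = rank1_dl(R, k, **kwargs)
        U[:, j], V[j] = u, v
        R -= np.outer(u, v)  # deflation: remove the explained rank-1 component
    return U, V, R
```

Because u has unit norm and v is the thresholded S^T u, each step lowers the squared Frobenius norm of the residual by exactly ||v||², so deflation never increases the residual. In a distributed setting such as the Spark implementation the abstract describes, the matrix-vector products S v and S^T u would be the natural operations to spread across data partitions; this sketch keeps everything in memory on one machine.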

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information systems applications
    1. Data mining

Published in
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch)
Program Chairs: Alex Smola (Amazon), Charu Aggarwal (IBM), Dou Shen (Baidu), Rajeev Rastogi (Amazon)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
algorithm parallelization
distributed computation
fMRI
sparse coding
Qualifiers
- research-article
Acceptance Rates
KDD '16 paper acceptance rate: 66 of 1,115 submissions (6%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)