ABSTRACT
Protein interaction networks are one of the most promising types of biological data for the discovery of functional modules and the prediction of individual protein functions. However, these networks are known to be both incomplete and inaccurate, i.e., they contain spurious edges and lack biologically valid ones. One way to handle this problem is to transform the original interaction graph into a new graph by removing spurious edges, adding biologically valid ones, and assigning reliability scores to the edges that constitute the final network. We investigate existing methods for this task and propose a robust method based on association analysis. This method builds on the concept of h-confidence, a measure that can be used to extract groups of objects that are highly similar to each other. Experimental evaluation on several protein interaction data sets shows that hyperclique-based transformations significantly enhance the performance of standard function prediction algorithms, and thus have merit.
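To make the h-confidence idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an unweighted interaction graph and uses the standard pairwise form of h-confidence: the number of neighbors two proteins share, divided by the larger of their two neighbor counts. The function names `h_confidence` and `transform_graph`, the toy edge list, and the threshold of 0.6 are all hypothetical choices made for illustration.

```python
from itertools import combinations


def h_confidence(neighbors_a, neighbors_b):
    """Pairwise h-confidence for an unweighted graph (illustrative form):
    shared neighbors divided by the larger of the two neighbor counts."""
    larger = max(len(neighbors_a), len(neighbors_b))
    if larger == 0:
        return 0.0  # two isolated proteins share no evidence
    return len(neighbors_a & neighbors_b) / larger


def transform_graph(edges, threshold):
    """Build a new scored edge set from an undirected edge list.

    Every protein pair is scored, so pairs that were not linked in the
    input can gain an edge, and input edges with a low score are dropped.
    The threshold is a hypothetical tuning parameter.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    scored = {}
    for u, v in combinations(sorted(adjacency), 2):
        score = h_confidence(adjacency[u], adjacency[v])
        if score >= threshold:
            scored[(u, v)] = score  # the score doubles as a reliability weight
    return scored


if __name__ == "__main__":
    # Toy network: A and B share both of B's partners; C and D share all of theirs.
    toy_edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "E")]
    for pair, score in transform_graph(toy_edges, threshold=0.6).items():
        print(pair, round(score, 3))
```

In this toy network the pairs that survive are those whose interaction partners largely coincide, which is the sense in which a transformed graph can both drop weakly supported edges and introduce new ones. Scoring all pairs is quadratic in the number of proteins, so a real data set would call for a pattern-mining approach rather than this brute-force loop.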