skip to main content
article

Optimal Algorithms for the Interval Location Problem with Range Constraints on Length and Average

Published: 01 April 2008 Publication History

Abstract

Let A be a sequence of n real numbers, L1 and L2 be two integers such thatL1 ≤ L2, and R1 and R2 be two real numbers such that R1 ≤ R2. An interval of A is feasible if its length is between L1 and L2 and its average is between R1 and R2. In this paper, we study the following problems: finding all feasible intervals of A, counting all feasible intervals of A, finding a maximum cardinality set of non-overlapping feasible intervals of A, locating a longest feasible interval of A, and locating a shortest feasible interval of A. The problems are motivated from the problem of locating CpG islands in biomolecular sequences. In this paper, we firstly show that all the problems have an Ω(n log n)-time lower bound in the comparison model. Then, we use geometric approaches to design optimal algorithms for the problems. All the presented algorithms run in an on-line manner and use O(n) space.

References

[1]
A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
[2]
L. Allison, "Longest Biased Interval and Longest Non-Negative Sum Interval," Bioinformatics, vol. 19, no. 10, pp. 1294-1295, 2003.
[3]
F. Antequera, "Structure, Function and Evolution of CpG Island Promoters," Cellular and Molecular Life Sciences, vol. 60, no. 8, pp. 1647-1658, 2003.
[4]
F. Antequera and A. Bird, "Number of CpG Islands and Genes in Human and Mouse," Proc. Nat'l Academy of Sciences, vol. 90, no. 24, pp. 11995-11999, 1993.
[5]
R. Bayer, "Symmetric Binary B-Trees: Data Structure and Maintenance Algorithms," Acta Informatica, vol. 1, pp. 290-306, 1972.
[6]
R. Bayer and E.M. McCreight, "Organization and Maintenance of Large Ordered Indexes," Acta Informatica, vol. 1, pp. 173-189, 1972.
[7]
B. Chazelle, "A Functional Approach to Data Structures and Its Use in Multidimensional Searching," SIAM J. Computing, vol. 17, no. 3, pp. 427-462, 1988.
[8]
K.-Y. Chen and K.-M. Chao, "Optimal Algorithms for Locating the Longest and Shortest Segments Satisfying a Sum or an Average Constraint," Information Processing Letters, vol. 96, no. 6, pp. 197- 201, 2005.
[9]
T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms, second ed. McGraw-Hill, 2001.
[10]
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge Univ. Press, 1998.
[11]
M. Esteller, "CpG Island Hypermethylation and Tumor Suppressor Genes: A Booming Present, A Brighter Future," Oncogene, vol. 21, no. 35, pp. 5427-5440, 2002.
[12]
G.N. Frederickson and S. Rodger, "A New Approach to the Dynamic Maintenance of Maximal Points in a Plane," Discrete and Computational Geometry, vol. 5, no. 4, pp. 365-374, 1990.
[13]
T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama, "Interval Finding and Its Application to Data Mining," Proc. Seventh Ann. Int'l Symp. Algorithms and Computation, pp. 55-64, 1996.
[14]
M.H. Goldwasser, M.-Y. Kao, and H.-I. Lu, "Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications," J. Computer and System Sciences, vol. 70, no. 2, pp. 128-144, 2005.
[15]
L.J. Guibas and R. Sedgewick, "A Dichromatic Framework for Balanced Trees," Proc. 19th Ann. Symp. Foundations of Computer Science, pp. 8-21, 1978.
[16]
S.-Y. Hsieh and T.-Y. Chou, "Finding a Weight-Constrained Maximum-Density Subtree in a Tree," Proc. 16th Ann. Int'l Symp. Algorithms and Computation, pp. 944-953, 2005.
[17]
X. Huang, "An Algorithm for Identifying Regions of a DNA Sequence that Satisfy a Content Requirement," Computer Applications in the Biosciences, vol. 10, no. 3, pp. 219-225, 1994.
[18]
I.P. Ioshikhes and M.Q. Zhang, "Large-Scale Human Promoter Mapping Using CpG Islands," Nature Genetics, vol. 26, pp. 61-63, 2000.
[19]
R. Janardan, "On the Dynamic Maintenance of Maximal Points in the Plane," Information Processing Letters, vol. 40, no. 2, pp. 59-64, 1991.
[20]
S. Kapoor, "Dynamic Maintenance of Maxima of 2-D Point Sets," SIAM J. Computing, vol. 29, no. 6, pp. 1858-1877, 2000.
[21]
S.K. Kim, "Linear-Time Algorithm for Finding a Maximum-Density Segment of a Sequence," Information Processing Letters, vol. 86, no. 6, pp. 339-342, 2003.
[22]
Y.-L. Lin, X. Huang, T. Jiang, and K.-M. Chao, "MAVG: Locating Non-Overlapping Maximum Average Segments in a Given Sequence," Bioinformatics, vol. 19, no. 1, pp. 151-152, 2003.
[23]
Y.-L. Lin, T. Jiang, and K.-M. Chao, "Efficient Algorithms for Locating the Length-Constrained Heaviest Segments with Applications to Biomolecular Sequences Analysis," J. Computer and System Sciences, vol. 65, no. 3, pp. 570-586, 2002.
[24]
R.-R. Lin, W.-H. Kuo, and K.-M. Chao, "Finding a Length-Constrained Maximum-Density Path in a Tree," J. Combinatorial Optimization, vol. 9, no. 2, pp. 147-156, 2005.
[25]
E.M. McCreight, "Priority Search Trees," SIAM J. Computing, vol. 14, no. 2, pp. 257-276, 1985.
[26]
A. Nekrutenko and W.-H. Li, "Assessment of Compositional Heterogeneity within and between Eukaryotic Genomes," Genome Research, vol. 10, no. 12, pp. 1986-1995, 2000.
[27]
M.H. Overmars and J. van Leeuwen, "Maintenance of Configurations in the Plane," J. Computer and System Sciences, vol. 23, no. 2, pp. 166-204, 1981.
[28]
F.P. Preparata and M.I. Shamos, Computational Geometry. Springer, 1985.
[29]
L. Scotto and R.K. Assoian, "A GC-Rich Domain with Bifunctional Effects on mRNA and Protein Levels: Implications for Control of Transforming Growth Factor Beta 1 Expression," Molecular and Cellular Biology, vol. 13, no. 6, pp. 3588-3597, 1993.
[30]
L. Wang and Y. Xu, "SEGID: Identifying Interesting Segments in (Multiple) Sequence Alignments," Bioinformatics, vol. 19, no. 2, pp. 297-298, 2003.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image IEEE/ACM Transactions on Computational Biology and Bioinformatics
IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 5, Issue 2
April 2008
158 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 01 April 2008
Published in TCBB Volume 5, Issue 2

Author Tags

  1. algorithms
  2. analysis of algorithms
  3. data structures
  4. geometrical problems and computations

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 170
    Total Downloads
  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 01 Mar 2025

Other Metrics

Citations

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media