DOI: 10.1145/2390821.2390831
research-article

Effective browsing of long audio recordings

Published: 02 November 2012

ABSTRACT

Timeliner is a browser for long audio recordings and features that it derives from such recordings. Features can be either signal-based, like spectrograms, or model-based, like categorical classifiers. Unlike conventional audio editors, Timeliner pans and zooms smoothly across many orders of magnitude, from days-long overviews to millisecond-scale details, with zero latency, zero flicker, and low CPU load. Also, to suggest which details are worth zooming in to examine, Timeliner's agglomerative hierarchical caches propagate feature-specific details up to wider zoom levels. Because these details are not averaged away, "big data" can be browsed rapidly and effectively. Several studies demonstrate this.
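The idea of a cache that carries details up to wider zoom levels instead of averaging them away can be illustrated with a small sketch. This is not Timeliner's implementation: it assumes a scalar feature, a simple pairwise (min, max) reduction as the "feature-specific" rule, and hypothetical function names `build_pyramid` and `coarsest_level_for`. It only shows how a brief spike stays visible at every zoom level.

```python
from typing import List, Tuple

# One cache level: a (min, max) pair for each bin of the timeline.
Level = List[Tuple[float, float]]


def build_pyramid(samples: List[float]) -> List[Level]:
    """Build cache levels bottom-up; each level has half as many bins.

    Assumption for illustration: bins are merged pairwise by taking the
    min of the mins and the max of the maxes, so extremes are never lost.
    """
    if not samples:
        return []
    level: Level = [(s, s) for s in samples]
    pyramid = [level]
    while len(level) > 1:
        merged: Level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]  # the last bin may stand alone
            merged.append((min(lo for lo, _ in pair),
                           max(hi for _, hi in pair)))
        level = merged
        pyramid.append(level)
    return pyramid


def coarsest_level_for(pyramid: List[Level], pixels: int) -> Level:
    """Pick the coarsest level that still has at least `pixels` bins."""
    for level in reversed(pyramid):
        if len(level) >= pixels:
            return level
    return pyramid[0]  # fewer samples than pixels: use full resolution


if __name__ == "__main__":
    signal = [0, 0, 0, 9, 0, 0, 0, 0]  # one short spike in a quiet recording
    for level in build_pyramid(signal):
        print(level)
```

With an averaging reduction, the spike in `signal` would shrink by half at each level; with the (min, max) reduction it remains at full height in the widest overview, which is what makes it a cue for where to zoom in.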

          Published in

            IMMPD '12: Proceedings of the 2nd ACM international workshop on Interactive multimedia on mobile and portable devices
            November 2012
            50 pages
            ISBN:9781450315951
            DOI:10.1145/2390821

            Copyright © 2012 ACM

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            Overall Acceptance Rate: 7 of 14 submissions, 50%
