DOI: 10.1145/1027933.1027976

MacVisSTA: a system for multimodal analysis

Published: 13 October 2004


The study of embodied communication requires access to multiple data sources such as multistream video and audio, and various derived and meta-data such as gesture, head, posture, facial expression and gaze information. The common element that runs through these data is the co-temporality of the multiple modes of behavior. In this paper, we present the multimedia Visualization for Situated Temporal Analysis (MacVisSTA) system for the analysis of multimodal human communication through video, audio, speech transcriptions, and gesture and head orientation data. The system uses a multiple linked representation strategy in which different representations are linked by the current time focus. In this framework, the multiple display components associated with the disparate data types are kept in synchrony, each component serving as both a controller of the system and a display. Hence the user is able to analyze and manipulate the data from different analytical viewpoints (e.g., through the time-synchronized speech transcription or through motion segments of interest). MacVisSTA supports analysis of the synchronized data at varying timescales. It provides an annotation interface that permits users to code the data into 'music-score' objects, and to make and organize multimedia observations about the data. Hence MacVisSTA integrates flexible visualization with annotation within a single framework. An XML database manager has been created for storage and search of annotation data. We compare the system with other existing annotation tools with respect to functionality and interface design. The software runs on Macintosh OS X computer systems.
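The multiple linked representation strategy described in the abstract can be illustrated with a small sketch. This is not MacVisSTA's implementation: the class names (`TimeFocus`, `View`, `TranscriptView`, `Utterance`) and the sample transcript are hypothetical, chosen only to show how components might share one current time focus while each acts as both a controller and a display.

```python
from dataclasses import dataclass
from typing import List, Optional


class TimeFocus:
    """Hypothetical shared 'current time focus' linking all registered views."""

    def __init__(self) -> None:
        self._time = 0.0
        self._views: List["View"] = []

    @property
    def time(self) -> float:
        return self._time

    def register(self, view: "View") -> None:
        # A newly added view is immediately brought into synchrony.
        self._views.append(view)
        view.show(self._time)

    def seek(self, time: float, source: Optional["View"] = None) -> None:
        # Move the focus and refresh every view except the one that moved it.
        self._time = time
        for view in self._views:
            if view is not source:
                view.show(time)


class View:
    """A display component that can also drive the shared focus."""

    def __init__(self, name: str, focus: TimeFocus) -> None:
        self.name = name
        self.focus = focus
        self.displayed_time: Optional[float] = None
        focus.register(self)

    def show(self, time: float) -> None:
        # Display role: render whatever this view shows at `time`.
        self.displayed_time = time

    def user_seek(self, time: float) -> None:
        # Controller role: a user action here moves every linked view.
        self.show(time)
        self.focus.seek(time, source=self)


@dataclass
class Utterance:
    start: float  # seconds
    end: float    # seconds
    text: str


class TranscriptView(View):
    """Time-aligned transcript: highlights the utterance under the focus."""

    def __init__(self, name: str, focus: TimeFocus,
                 utterances: List[Utterance]) -> None:
        self.utterances = utterances
        self.current: Optional[Utterance] = None
        super().__init__(name, focus)

    def show(self, time: float) -> None:
        super().show(time)
        self.current = next(
            (u for u in self.utterances if u.start <= time < u.end), None
        )

    def select(self, index: int) -> None:
        # Selecting an utterance seeks all views to its start time.
        self.user_seek(self.utterances[index].start)


# Illustrative usage with made-up data.
focus = TimeFocus()
video = View("video", focus)
transcript = TranscriptView(
    "transcript",
    focus,
    [Utterance(0.0, 1.5, "so the house"), Utterance(1.5, 3.0, "is over there")],
)
transcript.select(1)   # the transcript acts as controller
video.user_seek(0.4)   # the video acts as controller
```

In this sketch, selecting the second utterance moves the video view to 1.5 s, and scrubbing the video back to 0.4 s moves the transcript highlight to the first utterance. Passing `source` to `seek` keeps the originating view from being refreshed twice.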





Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces
October 2004
368 pages



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. embodied communication
  2. flexible visualization and annotation
  3. gesture
  4. multimodal interaction
  5. multiple linked representation





Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%

