DOI: 10.1145/1891903.1891967

Modelling and analyzing multimodal dyadic interactions using social networks

Published: 08 November 2010

Abstract

Social network analysis has become a common technique for modelling and quantifying the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. First, speech detection is performed through an audio/visual fusion scheme based on stacked sequential learning. In the audio domain, speech is detected through clustering of audio features; each cluster is modelled by a one-state Hidden Markov Model containing a diagonal-covariance Gaussian Mixture Model. In the visual domain, speech detection is performed through differential-based feature extraction from the segmented mouth region, followed by a dynamic programming matching procedure. Second, to model the dyadic interactions, we employ the Influence Model, whose states encode the previously integrated audio/visual data. Third, the social network is extracted based on the estimated influences. For our study, we used a set of videos from the New York Times' Blogging Heads opinion blog. The results are reported in terms of both the accuracy of the audio/visual data fusion and the centrality measures used to characterize the social network.
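The third step, turning estimated influences into a social network and characterizing it with centrality measures, can be illustrated with a minimal sketch. This is not the authors' implementation. The influence values, the speaker names, the threshold, and the convention that `influence[i][j]` is the influence of speaker `i` on speaker `j` are all assumptions made for illustration, and degree centrality is used only as one common example of a centrality measure.

```python
from typing import Dict, List, Tuple


def build_influence_graph(
    names: List[str],
    influence: List[List[float]],
    threshold: float = 0.0,
) -> Dict[Tuple[str, str], float]:
    """Return directed weighted edges {(src, dst): weight}.

    Assumption: influence[i][j] is the estimated influence of
    speaker i on speaker j. Self-influence is ignored, and an edge
    is kept only if its weight is strictly above the threshold.
    """
    edges = {}
    for i, src in enumerate(names):
        for j, dst in enumerate(names):
            if i != j and influence[i][j] > threshold:
                edges[(src, dst)] = influence[i][j]
    return edges


def degree_centrality(
    names: List[str],
    edges: Dict[Tuple[str, str], float],
) -> Dict[str, Dict[str, float]]:
    """Normalized in-degree and out-degree centrality per node.

    Each count is divided by (n - 1), the largest possible number
    of neighbours in a graph with n nodes.
    """
    out_deg = {name: 0 for name in names}
    in_deg = {name: 0 for name in names}
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    scale = max(len(names) - 1, 1)
    return {
        name: {"out": out_deg[name] / scale, "in": in_deg[name] / scale}
        for name in names
    }


# Hypothetical influence estimates for three speakers.
names = ["A", "B", "C"]
influence = [
    [0.0, 0.8, 0.6],
    [0.1, 0.0, 0.0],
    [0.3, 0.0, 0.0],
]
edges = build_influence_graph(names, influence, threshold=0.2)
centrality = degree_centrality(names, edges)
for name in names:
    print(name, centrality[name])
```

In this toy network a speaker with high out-degree centrality influences many others, while high in-degree centrality marks a speaker who is influenced by many others. Other centrality measures (closeness, betweenness, eigenvector) would be computed on the same edge set.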




Published In

ICMI-MLMI '10: International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
November 2010
311 pages
ISBN: 9781450304146
DOI: 10.1145/1891903

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. application
  2. influence model
  3. multimodal fusion
  4. social interaction
  5. social network analysis

Qualifiers

  • Research-article

Conference

ICMI-MLMI '10
Acceptance Rates

ICMI-MLMI '10 Paper Acceptance Rate 41 of 100 submissions, 41%;
Overall Acceptance Rate 453 of 1,080 submissions, 42%
