research-article

Modelling and analyzing multimodal dyadic interactions using social networks

Authors:

Sergio Escalera,

Bogdan Raducanu

ICMI-MLMI '10: International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction

Article No.: 52, Pages 1 - 8

https://doi.org/10.1145/1891903.1891967

Published: 08 November 2010

Abstract

Social network analysis has become a common technique for modelling and quantifying the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. First, speech detection is performed through an audio/visual fusion scheme based on stacked sequential learning. In the audio domain, speech is detected by clustering audio features. Clusters are modelled by a one-state Hidden Markov Model containing a diagonal-covariance Gaussian Mixture Model. In the visual domain, speech detection is performed through differential-based feature extraction from the segmented mouth region, followed by a dynamic programming matching procedure. Second, to model the dyadic interactions, we employ the Influence Model, whose states encode the previously integrated audio/visual data. Third, the social network is extracted based on the estimated influences. For our study, we used a set of videos from the New York Times' Blogging Heads opinion blog. The results are reported in terms of both the accuracy of the audio/visual data fusion and the centrality measures used to characterize the social network.
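
The abstract does not say how the estimated influences are turned into a network or which centrality measures are reported. As a rough illustration only, the sketch below assumes that a directed edge is kept whenever a pairwise influence estimate reaches a threshold, and then computes normalized in-degree and out-degree centrality. The speaker labels, influence values, threshold, and function names are all hypothetical and are not taken from the paper.

```python
# Hypothetical pairwise influence estimates: influence[a][b] is how strongly
# speaker a is estimated to influence speaker b (values invented for illustration).
influence = {
    "A": {"B": 0.7, "C": 0.2},
    "B": {"A": 0.3, "C": 0.6},
    "C": {"A": 0.1},
}


def build_edges(influence, threshold=0.25):
    """Keep a directed edge a -> b when the influence estimate reaches the threshold.

    The thresholding rule is an assumption made for this sketch.
    """
    return {
        (a, b)
        for a, targets in influence.items()
        for b, weight in targets.items()
        if weight >= threshold
    }


def degree_centrality(edges, nodes):
    """Return (out-degree, in-degree) centrality, each normalized by n - 1."""
    out_deg = {v: 0 for v in nodes}
    in_deg = {v: 0 for v in nodes}
    for a, b in edges:
        out_deg[a] += 1
        in_deg[b] += 1
    scale = len(nodes) - 1 if len(nodes) > 1 else 1
    out_c = {v: out_deg[v] / scale for v in nodes}
    in_c = {v: in_deg[v] / scale for v in nodes}
    return out_c, in_c


nodes = sorted(influence)
edges = build_edges(influence)
out_c, in_c = degree_centrality(edges, nodes)
```

With these made-up values, a speaker with high out-degree centrality influences many others, while high in-degree centrality marks a speaker who is influenced by many. Other centrality measures (closeness, betweenness, eigenvector) could be substituted under the same graph construction.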

Published In

ICMI-MLMI '10: International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction

November 2010

311 pages

ISBN:9781450304146

DOI:10.1145/1891903

General Chairs: Wen Gao (PKU, China), Chin-Hui Lee (Georgia Tech), Jie Yang (Carnegie Mellon)

Program Chairs: Xilin Chen (ICT, CAS, China), Maxine Eskenazi (Carnegie Mellon), Zhengyou Zhang (MSR)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Ministerio de Educación, Cultura y Deporte

Conference

ICMI-MLMI '10

November 8 - 10, 2010

Beijing, China

Acceptance Rates

ICMI-MLMI '10 Paper Acceptance Rate 41 of 100 submissions, 41%

Overall Acceptance Rate 453 of 1,080 submissions, 42%
