ABSTRACT
Group meetings are often inefficient, disorganized, and poorly documented. Factors including "groupthink," fear of speaking, unfocused discussion, and bias can degrade the performance of a group meeting. Automatically analyzing group interaction patterns is critical for actively or passively facilitating group meetings. Existing research on group dynamics analysis still depends heavily on video cameras placed in participants' lines of sight or on wearable sensors, both of which can alter participants' natural behavior. In this thesis, we present a smart meeting room that combines microphones with unobtrusive ceiling-mounted Time-of-Flight (ToF) sensors to understand group dynamics in team meetings. Since the ToF sensors are ceiling-mounted and out of the participants' lines of sight, we posit that their presence does not disrupt natural interaction patterns. We collect a new multimodal dataset of group interactions in which participants must complete a task by reaching a group consensus and then fill out a post-task questionnaire; we use this dataset to develop our algorithms and analyze group meetings. Combining the ceiling-mounted ToF sensors with lapel microphones, we: (1) estimate the seated body orientation of participants, (2) estimate their head pose and visual focus of attention (VFOA), (3) estimate their arm pose and body posture, and (4) analyze the multimodal data for passive understanding of group meetings, with a focus on perceived leadership and contribution.
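To make the sensing pipeline concrete, the following is a minimal toy sketch of step (1), seated body orientation from a single overhead ToF depth frame. The segmentation thresholds and the PCA-on-shoulder-pixels heuristic are illustrative assumptions for this sketch, not the thesis's actual algorithm: the idea is simply that the pixels nearest the ceiling sensor form the head, a slightly deeper band forms the shoulders, and the principal axis of the shoulder pixels gives a coarse orientation angle.

```python
import numpy as np

def estimate_seated_orientation(depth, floor_mm=2600, head_band_mm=450):
    """Toy estimate of a seated person's body orientation from an
    overhead depth frame (pixel values = distance from ceiling, in mm).

    Heuristic (illustrative only): pixels closer than `floor_mm` are the
    occupant; the minimum depth is the head, and a band just below the
    head contains the shoulders. The principal axis of the shoulder
    pixels approximates the shoulder line; the body faces perpendicular
    to it. Returns an angle in degrees in [0, 180), or None on failure.
    """
    person = depth < floor_mm                      # occupant mask
    if not person.any():
        return None
    top = depth[person].min()                      # head height
    shoulders = person & (depth > top + 100) & (depth < top + head_band_mm)
    ys, xs = np.nonzero(shoulders)
    if len(xs) < 10:                               # too few pixels to fit
        return None
    pts = np.stack([xs - xs.mean(), ys - ys.mean()])
    cov = pts @ pts.T / len(xs)                    # 2x2 covariance
    w, v = np.linalg.eigh(cov)
    axis = v[:, np.argmax(w)]                      # shoulder-line direction
    # body orientation is perpendicular to the shoulder line;
    # the modulo folds the front/back ambiguity into [0, 180)
    return float(np.degrees(np.arctan2(axis[0], -axis[1])) % 180.0)
```

In the real system, frames like this would be produced continuously by the ceiling-mounted ToF array; head pose and VFOA estimation (steps 2 and 3) would then refine this coarse angle using finer depth structure and the audio channel.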