DOI: 10.1145/1647314.1647330
poster

Detecting, tracking and interacting with people in a public space

Published: 02 November 2009

Abstract

We have built a system that engages naive users in an audio-visual interaction with a computer in an unconstrained public space. We combine audio source localization techniques with face detection algorithms to detect and track the user throughout a large lobby. The sensors are an ad-hoc microphone array and a pan-tilt-zoom (PTZ) camera. To engage the user, the PTZ camera turns and points at sounds made by people passing by. This simple pointing of the camera makes the user aware that the system has acknowledged their presence. To further engage the user, we develop a face classification method that identifies and then greets previously seen users. The user can interact with the system through a simple hot-spot-based gesture interface. To make interactions with the system feel natural, we use reconfigurable hardware to achieve a visual response time of less than 100 ms. We rely heavily on machine learning methods to make the system self-calibrating and adaptive.
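
The abstract does not say which audio source localization technique the system uses. A common building block in microphone-array localization is estimating the time delay of arrival of a sound between a pair of microphones, and the sketch below illustrates that idea only, using a brute-force cross-correlation. The function names, the synthetic click signal, and the lag range are all hypothetical and are not taken from the paper.

```python
def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) at which sig_b best matches sig_a.

    A positive result means sig_b looks like a delayed copy of sig_a.
    Brute-force cross-correlation; illustrative, not the paper's method.
    """
    n = min(len(sig_a), len(sig_b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        # Keep the first lag that reaches the highest correlation score.
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def make_click(length, position):
    """Hypothetical test signal: silence with a short click at `position`."""
    signal = [0.0] * length
    signal[position] = 1.0
    signal[position + 1] = 0.5
    return signal


# The same click reaches the second microphone 5 samples later.
mic_a = make_click(64, 20)
mic_b = make_click(64, 25)
delay = estimate_delay(mic_a, mic_b, max_lag=10)
```

Dividing the estimated lag by the sampling rate gives the delay in seconds. In a system like the one described, delays from several microphone pairs would then have to be mapped to a camera pan and tilt, a step this sketch leaves out.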

Published In

ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
November 2009
374 pages
ISBN: 9781605587721
DOI: 10.1145/1647314

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. boosting
  2. machine learning
  3. real-time hardware
  4. tracking

Qualifiers

  • Poster

Conference

ICMI-MLMI '09

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
