DOI: 10.1145/1101149.1101290
Article

SEVA: sensor-enhanced video annotation

Published: 06 November 2005

Abstract

In this paper, we study how a sensor-rich world can be exploited by digital recording devices such as cameras and camcorders to improve a user's ability to search through a large repository of image and video files. We design and implement a digital recording system that records the identities and locations of objects (as advertised by their sensors) along with visual images (as recorded by a camera). The process, which we refer to as sensor-enhanced video annotation (SEVA), combines a series of correlation, interpolation, and extrapolation techniques. It produces a tagged stream that can later be used to efficiently search for videos or frames containing particular objects or people. We present detailed experiments with a prototype of our system using both stationary and mobile objects as well as GPS and ultrasound. Our experiments show that: (i) SEVA has a zero error rate for static objects, except very close to the boundary of the viewable area; (ii) for moving objects or a moving camera, SEVA only misses objects leaving or entering the viewable area by 1-2 frames; (iii) SEVA can scale to 10 fast-moving objects using current sensor technology; and (iv) SEVA runs online using relatively inexpensive hardware.
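
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of interpolating sensor fixes to frame times and then testing whether an object falls inside the viewable area. It is not the authors' method. It assumes a flat 2D world, a stationary camera with a known heading, and a wedge-shaped viewable area; every name and parameter here (interpolate_position, position_at, in_view, tag_frames, fov_deg, max_range) is hypothetical.

```python
import math


def interpolate_position(t, t0, p0, t1, p1):
    """Linearly interpolate a 2D position at time t between two sensor fixes."""
    if t1 == t0:
        return p0
    a = (t - t0) / (t1 - t0)
    return (p0[0] + a * (p1[0] - p0[0]), p0[1] + a * (p1[1] - p0[1]))


def position_at(t, samples):
    """Estimate a position at time t from time-sorted (time, (x, y)) fixes.

    Assumption: outside the sampled interval the nearest fix is held,
    which is the crudest possible form of extrapolation.
    """
    if t <= samples[0][0]:
        return samples[0][1]
    if t >= samples[-1][0]:
        return samples[-1][1]
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            return interpolate_position(t, t0, p0, t1, p1)


def in_view(camera_pos, heading_deg, fov_deg, max_range, obj_pos):
    """Return True if obj_pos lies inside the camera's 2D viewing wedge."""
    dx = obj_pos[0] - camera_pos[0]
    dy = obj_pos[1] - camera_pos[1]
    dist = math.hypot(dx, dy)
    if dist > max_range:
        return False
    if dist == 0:
        return True
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest signed angle between bearing and heading, in [-180, 180).
    diff = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0


def tag_frames(frame_times, camera_pos, heading_deg, fov_deg, max_range, fixes):
    """Tag each frame time with the names of objects estimated to be in view.

    fixes maps an object name to its time-sorted list of (time, (x, y)) fixes.
    """
    tags = []
    for t in frame_times:
        visible = sorted(
            name
            for name, samples in fixes.items()
            if in_view(camera_pos, heading_deg, fov_deg, max_range,
                       position_at(t, samples))
        )
        tags.append((t, visible))
    return tags


# Hypothetical data: one object crossing in front of a camera at the origin
# that looks along the +x axis with a 60-degree field of view.
fixes = {"ball": [(0.0, (5.0, -5.0)), (1.0, (5.0, 5.0))]}
tags = tag_frames([0.0, 0.5, 1.0], (0.0, 0.0), 0.0, 60.0, 10.0, fixes)
print(tags)
```

In this toy setup the object is tagged only in the middle frame, when its interpolated position lies on the camera axis. A real system would also have to handle sensor noise, 3D geometry, and a moving camera, none of which this sketch attempts.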

Published In

MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia
November 2005
1110 pages
ISBN: 1595930442
DOI: 10.1145/1101149
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. context-based retrieval
  2. location-based services
  3. sensor-enhanced
  4. video annotation

Qualifiers

  • Article

Conference

MM05

Acceptance Rates

MULTIMEDIA '05 Paper Acceptance Rate 49 of 312 submissions, 16%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

  • (2020) Movie Tags Prediction and Segmentation Using Deep Learning. IEEE Access 8, 6071-6086. DOI: 10.1109/ACCESS.2019.2963535. Online publication date: 2020
  • (2018) Information Management and Target Searching in Massive Urban Video Based on Video-GIS. 2018 8th International Conference on Electronics Information and Emergency Communication (ICEIEC), 228-232. DOI: 10.1109/ICEIEC.2018.8473519. Online publication date: Jun-2018
  • (2017) Movies tags extraction using deep learning. 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 1-6. DOI: 10.1109/AVSS.2017.8078459. Online publication date: Aug-2017
  • (2015) Content vs. Context. ACM Transactions on Multimedia Computing, Communications, and Applications 11:3, 1-21. DOI: 10.1145/2700287. Online publication date: 5-Feb-2015
  • (2014) MediaQ. Proceedings of the 5th ACM Multimedia Systems Conference, 224-235. DOI: 10.1145/2557642.2578223. Online publication date: 19-Mar-2014
  • (2014) Video Retrieval Methods Using Geographic Information in Windows Azure Cloud. 2014 IEEE International Conference on Data Mining Workshop, 1113-1119. DOI: 10.1109/ICDMW.2014.135. Online publication date: Dec-2014
  • (2014) Large-scale geo-tagged video indexing and queries. Geoinformatica 18:4, 671-697. DOI: 10.1007/s10707-013-0199-6. Online publication date: 1-Oct-2014
  • (2013) Client-Side Relevance Feedback Approach for Image Retrieval in Mobile Environment. Multimedia Data Engineering Applications and Processing, 193-204. DOI: 10.4018/978-1-4666-2940-0.ch010. Online publication date: 2013
  • (2012) Client-Side Relevance Feedback Approach for Image Retrieval in Mobile Environment. Wireless Technologies, 724-736. DOI: 10.4018/978-1-61350-101-6.ch314. Online publication date: 2012
  • (2012) HUGVid. Proceedings of the 20th International Conference on Advances in Geographic Information Systems, 319-328. DOI: 10.1145/2424321.2424362. Online publication date: 6-Nov-2012
