DOI: 10.1145/1646396.1646431
poster

Web news categorization using a cross-media document graph

Published: 8 July 2009

ABSTRACT

In this paper we propose a multimedia categorization framework that exploits information across the different parts of a multimedia document (e.g., a Web page, a PDF, a Microsoft Office document). For example, a Web news page is composed of text describing some event (e.g., a car accident) and a picture that either adds information about the real extent of the event (e.g., how damaged the car is) or provides evidence corroborating the text. The framework handles multimedia information by considering not only the document's text and image data but also the layout structure, which determines how a given text block is related to a particular image. The novelties and contributions of the proposed framework are: (1) support for heterogeneous types of multimedia documents; (2) a document-graph representation method; and (3) the computation of cross-media correlations. Moreover, we applied the framework to the task of categorising Web news feed data, and our results show a significant improvement over a single-medium framework.
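To make the idea of a layout-driven document graph concrete, the following is a minimal illustrative sketch and not the authors' implementation, which the abstract does not detail. It assumes each text block and image is reduced to a centre point on the rendered page, and that a text block is linked to an image when the two lie within a chosen distance. The names `Block`, `DocumentGraph`, `link_by_layout`, and `max_distance`, as well as the linear weighting, are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass(frozen=True)
class Block:
    """A layout block of a document: a text passage or an image (hypothetical type)."""
    block_id: str
    medium: str   # "text" or "image"
    x: float      # centre of the block on the rendered page
    y: float


@dataclass
class DocumentGraph:
    """Graph whose edges link text blocks to images that are close in the layout."""
    blocks: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)  # (text_id, image_id) -> weight

    def add_block(self, block):
        self.blocks[block.block_id] = block

    def link_by_layout(self, max_distance):
        """Connect each text block to every image within max_distance (> 0)."""
        texts = [b for b in self.blocks.values() if b.medium == "text"]
        images = [b for b in self.blocks.values() if b.medium == "image"]
        for t in texts:
            for i in images:
                d = hypot(t.x - i.x, t.y - i.y)
                if d <= max_distance:
                    # Assumed weighting: closer blocks get a weight nearer to 1.
                    self.edges[(t.block_id, i.block_id)] = 1.0 - d / max_distance

    def images_for(self, text_id):
        """Return the ids of the images linked to the given text block."""
        return sorted(i for (t, i) in self.edges if t == text_id)


# Toy news page: a headline and a caption near the photo, a footer far away.
graph = DocumentGraph()
graph.add_block(Block("headline", "text", 100, 20))
graph.add_block(Block("caption", "text", 100, 210))
graph.add_block(Block("footer", "text", 100, 900))
graph.add_block(Block("photo", "image", 100, 120))
graph.link_by_layout(max_distance=150)
```

In this toy layout the headline and caption are linked to the photo while the footer is not, so a downstream classifier could, under these assumptions, combine the features of each text block with those of its linked images rather than treating the two media independently.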


      Published in

        CIVR '09: Proceedings of the ACM International Conference on Image and Video Retrieval
        July 2009
        383 pages
        ISBN: 9781605584805
        DOI: 10.1145/1646396

        Copyright © 2009 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States
