ABSTRACT
We report on the user requirements study and preliminary implementation phases of creating a digital library that indexes and retrieves educational materials on math. We first review current approaches and resources for math retrieval, then report on interviews with a small group of potential users to ascertain their needs. While preliminary, the results suggest that meta-search and resource categorization are two basic requirements for a math search engine. In addition, we implement a prototype categorization system and show that generic features work well for identifying math content on webpages but perform less well at categorizing it. We discuss our long-term goals, in which we plan to investigate how math expression search and text search can best be integrated.
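The abstract reports that generic features are enough to pick out math content on a webpage. As a rough illustration only, the sketch below scores a text block with two hypothetical features (the share of non-space characters that are math symbols, and the presence of LaTeX or MathML markup) and applies a fixed threshold. The symbol set, the regular expressions, the `is_math_block` name and the 0.1 threshold are all assumptions for illustration; they are not the feature set or classifier used in the prototype.

```python
import re
from dataclasses import dataclass

# Hypothetical feature definitions; illustrative assumptions only,
# not the features used in the paper's prototype.
MATH_SYMBOLS = set("=+-*/^<>≤≥≠±∑∫√∞∂πθαβγ")
LATEX_PATTERN = re.compile(r"\$[^$]+\$|\\(?:frac|sum|int|sqrt|alpha|beta)\b")
MATHML_PATTERN = re.compile(r"<math[\s>]", re.IGNORECASE)


@dataclass
class BlockFeatures:
    symbol_ratio: float  # share of non-space characters that are math symbols
    has_markup: bool     # True if LaTeX or MathML markup is present


def extract_features(block: str) -> BlockFeatures:
    chars = [c for c in block if not c.isspace()]
    n = len(chars) or 1  # avoid division by zero on empty blocks
    symbols = sum(c in MATH_SYMBOLS for c in chars)
    markup = bool(LATEX_PATTERN.search(block) or MATHML_PATTERN.search(block))
    return BlockFeatures(symbols / n, markup)


def is_math_block(block: str, symbol_threshold: float = 0.1) -> bool:
    """Flag a block as math-bearing if it carries math markup or is symbol-dense."""
    f = extract_features(block)
    return f.has_markup or f.symbol_ratio >= symbol_threshold


if __name__ == "__main__":
    for text in [
        "Solve $x^2 + 1 = 0$ for x.",
        "Let f(x) = 3x + 2 and g(x) = x - 1.",
        "The library opens at nine every morning.",
    ]:
        print(is_math_block(text), text)
```

A fixed threshold keeps the sketch transparent; a real system would more plausibly learn feature weights from labeled blocks, and, as the abstract notes, detecting math content is a different (and apparently easier) task than assigning it to a category.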
Index Terms
- Math information retrieval: user requirements and prototype implementation