ABSTRACT
This demo presents SmartPub, a novel web-based platform that supports the exploration and visualization of shallow meta-data (e.g., author list, keywords) and deep meta-data (long-tail named entities, which are rare and often relevant only in a specific knowledge domain) from scientific publications. The platform collects documents from different sources (e.g., DBLP and arXiv) and extracts domain-specific named entities from the text of the publications using Named Entity Recognizers (NERs), which we can train with minimal human supervision even for rare entity types. The platform further enables interaction with the crowd for filtering or for training data generation, and provides extended visualization and exploration capabilities. SmartPub will be demonstrated using a sample collection of scientific publications from the computer science domain, and will address the entity types Dataset (i.e., a dataset presented or used in a publication) and Methods (i.e., algorithms used to create, enrich, or analyse a data set).
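The distinction between shallow and deep meta-data can be made concrete with a small sketch. The Python below is illustrative only and is not SmartPub's implementation: the `Publication` record, the `tag_entities` helper, and the seed terms are hypothetical names introduced here, and plain seed-term matching stands in for the trained NERs that the platform actually uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Publication:
    # Shallow meta-data: available directly from the bibliographic record.
    title: str
    authors: list[str]
    keywords: list[str]
    # Deep meta-data: entity type -> mentions found in the full text.
    entities: dict[str, set[str]] = field(default_factory=dict)


def tag_entities(text: str, seeds: dict[str, list[str]]) -> dict[str, set[str]]:
    """Group the seed terms that occur in `text` by entity type.

    Assumption: a toy stand-in for a trained NER, not SmartPub's method.
    """
    found: dict[str, set[str]] = {}
    for entity_type, terms in seeds.items():
        for term in terms:
            # Whole-word, case-insensitive match of the seed term.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, text, re.IGNORECASE):
                found.setdefault(entity_type, set()).add(term)
    return found


# Hypothetical seed terms and text, for illustration only.
SEEDS = {"Dataset": ["MNIST", "ImageNet"], "Methods": ["k-means", "LDA"]}

paper = Publication(
    title="An example paper",
    authors=["A. Author"],
    keywords=["clustering"],
)
paper.entities = tag_entities("We cluster MNIST digits with k-means.", SEEDS)
```

A record like this keeps the two kinds of meta-data separate, so an exploration interface could filter on authors or keywords and, independently, on the Dataset or Methods entities found in the text.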
Index Terms
- SmartPub: A Platform for Long-Tail Entity Extraction from Scientific Publications