ABSTRACT
Large volumes of granted patents and patent applications are publicly available on the Web, which enables the use of sophisticated text mining and information retrieval methods to facilitate access to and analysis of patents. In this paper we investigate techniques to automatically recommend patents given a query patent. This task is critical for a variety of patent-related analysis problems such as finding relevant citations, searching for relevant prior art, and infringement analysis. We investigate the use of latent Dirichlet allocation and Dirichlet multinomial regression to represent patent documents and to compute similarity scores. We compare our methods with state-of-the-art document representations and retrieval techniques and demonstrate the effectiveness of our approach on a collection of US patent publications.
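As a rough illustration of the similarity-scoring step mentioned above, the following sketch ranks candidate patents by how close their topic mixtures are to that of a query patent. It assumes the per-document topic distributions have already been inferred by a topic model such as latent Dirichlet allocation. The patent ids, the three-topic mixtures, the function names, and the choice of Jensen-Shannon divergence as the similarity measure are all assumptions made for illustration; the abstract does not state which measure the paper uses.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_similarity(p, q):
    """1 - Jensen-Shannon divergence (base 2), so the score lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 1.0 - 0.5 * (kl_divergence(p, m) + kl_divergence(q, m))


def recommend(query_topics, corpus_topics, k=2):
    """Return the ids of the k patents whose topic mixtures best match the query."""
    scored = [
        (js_similarity(query_topics, topics), patent_id)
        for patent_id, topics in corpus_topics.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [patent_id for _, patent_id in scored[:k]]


# Hypothetical topic mixtures (assumed output of a trained topic model).
corpus = {
    "US-A": [0.70, 0.20, 0.10],
    "US-B": [0.10, 0.10, 0.80],
    "US-C": [0.60, 0.30, 0.10],
}
query = [0.65, 0.25, 0.10]

ranked = recommend(query, corpus, k=2)
print(ranked)
```

Comparing documents in topic space rather than by raw term overlap is one way to match patents that describe similar inventions with different vocabulary, which is a plausible motivation for the topic-model representation described in the abstract.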
Index Terms
Recommending patents based on latent topics