ABSTRACT
Using a novel evaluation toolkit that simulates a human reviewer in the loop, we compare the effectiveness of three machine-learning protocols for technology-assisted review, as used in document review for discovery in legal proceedings. Our comparison addresses a central question in the deployment of technology-assisted review: should training documents be selected at random, or should they be selected using one or more non-random methods, such as keyword search or active learning? On eight review tasks -- four derived from the TREC 2009 Legal Track and four derived from actual legal matters -- recall was measured as a function of human review effort. The results show that entirely non-random training methods, in which the initial training documents are selected using a simple keyword search and subsequent training documents are selected by active learning, require substantially and significantly less human review effort (P < 0.01) to achieve any given level of recall than passive learning, in which the machine-learning algorithm plays no role in the selection of training documents. Among passive-learning methods, significantly less human review effort (P < 0.01) is required when keywords, rather than random sampling, are used to select the initial training documents. Among active-learning methods, continuous active learning with relevance feedback yields generally superior results to simple active learning with uncertainty sampling, while avoiding the vexing issue of "stabilization" -- determining when training is adequate and may therefore stop.
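The three protocols contrast most clearly in how each selects the next batch of documents for human review. The sketch below is a minimal, self-contained illustration of that difference, not the authors' evaluation toolkit: the synthetic corpus, the naive centroid scorer, the seed keywords, and the batch and budget sizes are all assumptions chosen for brevity. SPL (simple passive learning) draws training documents at random; SAL (simple active learning) seeds training with a keyword search and then selects the documents the current model is least certain about; CAL (continuous active learning) seeds the same way but repeatedly presents the highest-scoring unreviewed documents, so review and training coincide.

```python
"""Toy simulation of the three TAR training protocols compared in the
paper -- not the authors' evaluation toolkit. SPL: training documents
chosen at random. SAL: keyword-seeded training plus uncertainty
sampling. CAL: keyword-seeded training plus relevance feedback. The
corpus, scorer, keywords, and budgets are illustrative assumptions."""

import random
from collections import Counter

random.seed(0)
TOPIC = {"fraud", "invoice", "payment"}   # assumed seed keywords
VOCAB = list(TOPIC) + ["memo", "lunch", "travel", "budget", "hr"]

def make_doc(i):
    rel = random.random() < 0.1           # ~10% prevalence of relevance
    w = [5 if t in TOPIC and rel else 1 for t in VOCAB]
    return i, random.choices(VOCAB, weights=w, k=20), rel

DOCS = [make_doc(i) for i in range(2000)]
TOTAL_REL = sum(d[2] for d in DOCS)
BATCH, BUDGET = 20, 400                   # review effort per round / in total

oracle = lambda d: d[2]                   # simulated human reviewer

def keyword_seed():                       # simple keyword search for seeding
    return [d for d in DOCS if TOPIC & set(d[1])][:BATCH]

def train(labeled):                       # naive centroid scorer stands in
    pos, neg = Counter(), Counter()       # for the learning algorithm
    for d, rel in labeled:
        (pos if rel else neg).update(d[1])
    return lambda d: sum(pos[t] - neg[t] for t in d[1])

def tar(select, seed, train_budget):
    """Label a seed set, then retrain and let `select` pick each batch;
    spend any leftover budget reviewing the final ranking top-down."""
    labeled = [(d, oracle(d)) for d in seed]
    seen = {d[0] for d, _ in labeled}
    while len(labeled) < train_budget:
        score = train(labeled)
        pool = [d for d in DOCS if d[0] not in seen]
        for d in select(pool, score):
            labeled.append((d, oracle(d)))
            seen.add(d[0])
    score = train(labeled)
    rest = sorted((d for d in DOCS if d[0] not in seen), key=score, reverse=True)
    labeled += [(d, oracle(d)) for d in rest[:BUDGET - len(labeled)]]
    return sum(rel for _, rel in labeled) / TOTAL_REL  # recall at BUDGET

spl = lambda pool, s: random.sample(pool, BATCH)                     # random
sal = lambda pool, s: sorted(pool, key=lambda d: abs(s(d)))[:BATCH]  # most uncertain
cal = lambda pool, s: sorted(pool, key=s, reverse=True)[:BATCH]      # most likely relevant

print("SPL recall:", round(tar(spl, random.sample(DOCS, BATCH), 200), 2))
print("SAL recall:", round(tar(sal, keyword_seed(), 200), 2))
print("CAL recall:", round(tar(cal, keyword_seed(), BUDGET), 2))
```

Any recall figures this toy prints are artifacts of the simulated corpus; the point of the sketch is the difference among the three selection rules, and that under CAL every reviewed document counts directly toward recall, which is why no separate "stabilization" decision is needed.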