ABSTRACT
Using a novel evaluation toolkit that simulates a human reviewer in the loop, we compare the effectiveness of three machine-learning protocols for technology-assisted review, as used in document review for discovery in legal proceedings. Our comparison addresses a central question in the deployment of technology-assisted review: should training documents be selected at random, or should they be selected using one or more non-random methods, such as keyword search or active learning? On eight review tasks -- four derived from the TREC 2009 Legal Track and four derived from actual legal matters -- recall was measured as a function of human review effort. The results show that entirely non-random training methods, in which the initial training documents are selected using a simple keyword search and subsequent training documents are selected by active learning, require substantially and significantly less human review effort (P < 0.01) to achieve any given level of recall than passive learning, in which the machine-learning algorithm plays no role in the selection of training documents. Among passive-learning methods, significantly less human review effort (P < 0.01) is required when keywords, rather than random sampling, are used to select the initial training documents. Among active-learning methods, continuous active learning with relevance feedback yields generally superior results to simple active learning with uncertainty sampling, while avoiding the vexing issue of "stabilization" -- determining when training is adequate and may therefore stop.
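The three protocols contrast most clearly in how each selects the next batch of documents for human review. The sketch below is a minimal, self-contained illustration of that difference, not the authors' evaluation toolkit: the synthetic corpus, the naive centroid scorer, the seed keywords, and the batch and budget sizes are all assumptions chosen for brevity. SPL (simple passive learning) draws training documents at random; SAL (simple active learning) seeds training with a keyword search and then selects the documents the current model is least certain about; CAL (continuous active learning) seeds the same way but repeatedly presents the highest-scoring unreviewed documents, so review and training coincide.

```python
"""Toy simulation of the three TAR training protocols compared in the
paper -- not the authors' evaluation toolkit. SPL: training documents
chosen at random. SAL: keyword-seeded training plus uncertainty
sampling. CAL: keyword-seeded training plus relevance feedback. The
corpus, scorer, keywords, and budgets are illustrative assumptions."""

import random
from collections import Counter

random.seed(0)
TOPIC = {"fraud", "invoice", "payment"}   # assumed seed keywords
VOCAB = list(TOPIC) + ["memo", "lunch", "travel", "budget", "hr"]

def make_doc(i):
    rel = random.random() < 0.1           # ~10% prevalence of relevance
    w = [5 if t in TOPIC and rel else 1 for t in VOCAB]
    return i, random.choices(VOCAB, weights=w, k=20), rel

DOCS = [make_doc(i) for i in range(2000)]
TOTAL_REL = sum(d[2] for d in DOCS)
BATCH, BUDGET = 20, 400                   # review effort per round / in total

oracle = lambda d: d[2]                   # simulated human reviewer

def keyword_seed():                       # simple keyword search for seeding
    return [d for d in DOCS if TOPIC & set(d[1])][:BATCH]

def train(labeled):                       # naive centroid scorer stands in
    pos, neg = Counter(), Counter()       # for the learning algorithm
    for d, rel in labeled:
        (pos if rel else neg).update(d[1])
    return lambda d: sum(pos[t] - neg[t] for t in d[1])

def tar(select, seed, train_budget):
    """Label a seed set, then retrain and let `select` pick each batch;
    spend any leftover budget reviewing the final ranking top-down."""
    labeled = [(d, oracle(d)) for d in seed]
    seen = {d[0] for d, _ in labeled}
    while len(labeled) < train_budget:
        score = train(labeled)
        pool = [d for d in DOCS if d[0] not in seen]
        for d in select(pool, score):
            labeled.append((d, oracle(d)))
            seen.add(d[0])
    score = train(labeled)
    rest = sorted((d for d in DOCS if d[0] not in seen), key=score, reverse=True)
    labeled += [(d, oracle(d)) for d in rest[:BUDGET - len(labeled)]]
    return sum(rel for _, rel in labeled) / TOTAL_REL  # recall at BUDGET

spl = lambda pool, s: random.sample(pool, BATCH)                     # random
sal = lambda pool, s: sorted(pool, key=lambda d: abs(s(d)))[:BATCH]  # most uncertain
cal = lambda pool, s: sorted(pool, key=s, reverse=True)[:BATCH]      # most likely relevant

print("SPL recall:", round(tar(spl, random.sample(DOCS, BATCH), 200), 2))
print("SAL recall:", round(tar(sal, keyword_seed(), 200), 2))
print("CAL recall:", round(tar(cal, keyword_seed(), BUDGET), 2))
```

Any recall figures this toy prints are artifacts of the simulated corpus; the point of the sketch is the difference among the three selection rules, and that under CAL every reviewed document counts directly toward recall, which is why no separate "stabilization" decision is needed.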