Separating the Privileged Wheat from the Chaff – Using Text Analytics and Machine Learning to Protect Attorney-Client Privilege

By Nathaniel Huber-Fliflet, Jianping Zhang

April 15, 2019

This piece was authored in conjunction with Rishi P. Chhatwal, Assistant Vice President and Senior Legal Counsel for Enterprise eDiscovery, AT&T Services, Inc., and Robert Keeling, Partner, Sidley Austin, LLP.

Abstract

The digital age has created unique challenges for parties that engage in large-scale litigation. Safeguarding the attorney-client privilege is a critical task for litigators during discovery, and one that becomes more difficult and expensive every year. Document review is now responsible for the vast majority of costs in the average legal matter, and costs are only rising. The volume of digitally stored data doubles roughly every two years, driving up discovery costs and increasing the risk of inadvertent disclosure of privileged information. As the digital world evolves, the legal community has sought to evolve with it, particularly in the document review process.

Keyword searching has been the dominant method of identifying digitally stored, privileged documents for the last several decades, but attorneys have conducted little research about the most efficient ways to use this method. Most legal teams rely on a combination of intuition and conventional wisdom. To subject those intuitions to the rigor of scientific experiments, we used three data sets and search term lists from real legal matters to determine which search terms were effective in identifying privileged communications. The results from our study revealed that thoughtfully crafted keyword term lists do identify a significant portion of the privileged document population. What may be surprising to experienced practitioners is that many commonly used terms that are believed to be imprecise proved quite effective at identifying privileged documents, while limiting the volume for review. Other popular terms proved to be ineffective. The study also compared the effectiveness of identifying privileged communications using predictive modeling and machine learning. The insights provided in this article can, if implemented by practitioners, add client protections against the disclosure of privileged documents and make privilege review more defensible and less costly.
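
As a rough illustration of what measuring the effectiveness of a search term can mean, the sketch below scores each term by recall (the share of privileged documents it finds) and precision (the share of its hits that are privileged). It is a minimal sketch and not the study's method or data: the sample documents, the terms, the `Doc` type, the `term_metrics` function, and the use of simple case-insensitive substring matching are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical labeled document: its text plus a reviewer's privilege call.
Doc = namedtuple("Doc", ["text", "privileged"])


def term_metrics(docs, terms):
    """Return hit count, recall, and precision for each search term.

    Matching is a case-insensitive substring test (an assumption; real
    review platforms usually support richer query syntax).
    """
    total_privileged = sum(1 for d in docs if d.privileged)
    results = {}
    for term in terms:
        hits = [d for d in docs if term.lower() in d.text.lower()]
        privileged_hits = sum(1 for d in hits if d.privileged)
        results[term] = {
            "hits": len(hits),
            # Share of all privileged documents that this term retrieves.
            "recall": privileged_hits / total_privileged if total_privileged else 0.0,
            # Share of this term's hits that are actually privileged.
            "precision": privileged_hits / len(hits) if hits else 0.0,
        }
    return results


# Invented sample data, for illustration only.
docs = [
    Doc("Please ask legal counsel to review the draft contract.", True),
    Doc("Attached is the privileged memo from our attorney.", True),
    Doc("Lunch menu for the legal department offsite.", False),
    Doc("Quarterly sales figures attached.", False),
]

metrics = term_metrics(docs, ["legal", "attorney", "confidential"])
for term, m in metrics.items():
    print(term, m)
```

A term with high recall but low precision protects privilege at the cost of a larger review volume, while a term with no hits contributes nothing. That trade-off is the one the abstract describes when it contrasts effective and ineffective terms.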

To read the full publication, please visit: https://bit.ly/2GuPagX

Attribution

©2019. Published in Richmond Journal of Law & Technology, Vol. XXV, Issue 3. Reproduced with permission. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or stored in an electronic database or retrieval system without the express written consent of the American Bar Association or the copyright holder.