LegalTech: eDiscovery Explained and Analytics Analysed

The volume of electronically stored information (“ESI”) kept by businesses and individuals is growing exponentially as storage costs plummet, while the types of ESI multiply and the speed at which they are exchanged over networks increases (see eg Global Yellow Pages Limited v. Promedia Directories Pte Ltd and another suit [2013] SGHC 111, para 1). As Justice Lee of the High Court of Singapore opined, “[t]he sheer volume of electronic information, as well as the difficulty of accessing some types of electronic information, presents considerable practical challenges in the area of discovery in litigation.” These developments have triggered great demand for sophisticated technologies and techniques to facilitate complex and large-scale document disclosures in litigation, arbitration, and internal and regulatory investigations (“eDiscovery”).

Litigants in Hong Kong, Singapore, and several other common law jurisdictions are subject to court rules concerning eDiscovery. For example, Practice Direction SL1.2 of the High Court of Hong Kong requires parties and their legal representatives in certain Commercial List and other actions to discuss eDiscovery before the first case management conference, including, where appropriate, “the tools and techniques (if any) which should be considered to reduce the burden and cost of discovery of Electronic Documents” and “the preservation of Electronic Documents”. Compliance with these rules requires parties to understand eDiscovery technologies and techniques, and to employ them in reasonable and proportionate ways.

In common law and civil law jurisdictions, parties increasingly employ eDiscovery to conduct complex, large-scale document disclosure projects. Again, parties can do so in a cost-effective and forensically sound manner, while suitably managing legal and operational risks, only if they understand the pertinent issues in running eDiscovery projects. This article provides an overview of eDiscovery project considerations and current technologies and techniques.

Planning for Review and Production

A key objective of eDiscovery is to identify and produce non-privileged documents that are responsive to a subpoena, document disclosure request, or internal investigative needs, and to withhold documents that are non-responsive and/or are legally privileged. Case teams should take time to consider carefully the legal, operational, and technological requirements of document review and production. Relevant questions include: Is the production of native files required? Must image files containing text be rendered searchable, and is this even feasible? Is the preservation of metadata required?

ESI that is collected in breach of forensic principles, or improperly processed and analysed, could waste substantial time and money, especially when the problems are discovered much later (resulting in the need to repeat eDiscovery procedures). In a worst-case scenario, the case team could miss production deadlines, inadvertently produce privileged documents, or withhold relevant documents that should have been disclosed.

Traditional Techniques and Tools

When the project commences, ESI must be collected, processed, and hosted in a well-organised database. Hardcopies should be converted into ESI by scanning and made text-searchable using optical character recognition software. Processing involves aggregating and unitising diverse types of ESI (eg emails, chat messages, Word documents, PDFs, and audio files), rendering them as structured, searchable data. Processing tools can also cull duplicate and corrupted files, and files containing computer-generated content. After the pool of documents for manual review has been created, but before the review actually begins, the pool can be split into different batches and work streams, prioritised, and further refined in scope to enhance efficiency and ultimately reduce review costs.
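The culling step described above can be illustrated with a minimal sketch. It assumes that exact duplicates are detected by comparing SHA-256 hashes of file content, and that an empty file stands in for a corrupted one; both are simplifying assumptions for illustration, and the function name and sample file names are hypothetical rather than taken from any particular processing tool.

```python
import hashlib


def cull_duplicates(files):
    """Keep the first file seen for each distinct content hash.

    `files` maps a (hypothetical) file name to its raw bytes.
    Empty content is treated as a stand-in for a corrupted file,
    which is an assumption made only for this sketch.
    """
    seen = set()
    kept = []
    for name, content in files.items():
        if not content:
            continue  # skip files with no readable content
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a file already kept
        seen.add(digest)
        kept.append(name)
    return kept


collected = {
    "memo_v1.txt": b"Board memo on pricing",
    "memo_copy.txt": b"Board memo on pricing",  # same bytes as memo_v1.txt
    "broken.txt": b"",                           # unreadable
    "notes.txt": b"Meeting notes",
}
review_pool = cull_duplicates(collected)
```

Hashing only catches byte-for-byte copies; documents that differ slightly need the similarity-based grouping discussed later in this section.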

Pools of potentially responsive and non-responsive documents are created by filtering documents based on their custodians and time of creation and modification, among other attributes, and by applying keyword searches using combinations of keywords likely to be responsive to particular legal and factual issues. Documents can be categorised by language to help the case team staff reviewers with appropriate language skills and to manage review work streams. Documents can be reviewed contextually with greater consistency and speed by grouping documents with over 50 percent similarity, and by grouping emails within the same thread of communication, including branches of these threads. Language detection, near-duplicate analysis, and email threading are tools commonly packaged with eDiscovery review platforms available today, which help to accelerate and automate certain review procedures.
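By way of illustration, the filtering and near-duplicate grouping described above might be sketched as follows. The `Doc` record, the function names, and the use of word-set (Jaccard) overlap as the similarity measure are all assumptions made for this example; review platforms use their own, generally more refined, methods.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str
    custodian: str
    created: date
    text: str


def filter_pool(docs, custodians, start, end, keywords):
    """Return docs from the given custodians and date range that contain any keyword."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for d in docs:
        if d.custodian not in custodians:
            continue
        if not (start <= d.created <= end):
            continue
        if set(d.text.lower().split()) & wanted:
            hits.append(d)
    return hits


def jaccard(a, b):
    """Word-set overlap between two texts, from 0.0 (disjoint) to 1.0 (identical sets)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def group_near_duplicates(docs, threshold=0.5):
    """Greedily place each doc in the first group whose first member it resembles."""
    groups = []
    for d in docs:
        for g in groups:
            if jaccard(g[0].text, d.text) > threshold:
                g.append(d)
                break
        else:
            groups.append([d])  # no similar group found, so start a new one
    return groups


docs = [
    Doc("1", "alice", date(2020, 1, 5), "merger price agreed with supplier today"),
    Doc("2", "alice", date(2020, 1, 6), "merger price agreed with supplier yesterday"),
    Doc("3", "bob", date(2020, 1, 7), "lunch menu for friday"),
    Doc("4", "alice", date(2019, 1, 1), "merger talks begin"),
]
pool = filter_pool(docs, {"alice"}, date(2020, 1, 1), date(2020, 12, 31), ["merger"])
groups = group_near_duplicates(docs[:3])
```

Here documents 1 and 2 differ by a single word and fall into one group, so a reviewer could code them together, while document 3 forms its own group.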

Automating and Accelerating Review with Analytics

Traditional techniques and tools have their limitations. In searching by keywords, reviewers are essentially guessing the words that authors of responsive documents have used. These keyword searches can yield both under-inclusive results (eg documents are omitted because the reviewer is unaware of certain variations and regionalisms adopted by the author, or because of misspellings) and over-inclusive results (eg keywords pick up non-responsive documents because the words were used in contexts that are likely irrelevant). For example, documents containing “apple” could relate to both the company and the fruit of the same name, but only the company might be relevant to the case.
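The “apple” example can be made concrete with a short sketch. The three sample documents and the helper name are invented for illustration; the search is a plain whole-word, case-insensitive match of the kind a basic keyword filter might perform.

```python
import re


def keyword_hits(docs, keyword):
    """Return ids of docs containing the keyword as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [doc_id for doc_id, text in docs.items() if pattern.search(text)]


docs = {
    "d1": "Apple announced its quarterly earnings.",   # responsive (the company)
    "d2": "She packed an apple for lunch.",            # non-responsive (the fruit)
    "d3": "Aple shares fell after the announcement.",  # responsive, but misspelled
}
hits = keyword_hits(docs, "apple")
```

The search returns d1 and d2: it is over-inclusive because it picks up the fruit, and under-inclusive because it misses the misspelled reference in d3.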

The latest analytics software can sharpen search results and minimise false positives and negatives. The software can group documents that it has itself identified, categorised, and conceptualised according to repetitive patterns of words they contain, patterns that traditional Boolean strings and fuzzy searches would have difficulty identifying. Reviewers can give the software documents already manually identified as responsive, as samples from which to find “conceptually” similar documents. Likewise, reviewers can “teach” the software using abstracts and excerpts of responsive text passages.
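A rough sense of how software might find documents similar to a sample passage is given by the sketch below. It assumes a simple bag-of-words representation compared by cosine similarity, which is a deliberately basic stand-in: commercial analytics engines rely on their own, more sophisticated, models, and the function names and sample texts here are hypothetical.

```python
import math
from collections import Counter


def vectorise(text):
    """Represent a text as a count of its lower-cased words."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 when either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_similar(sample_text, docs, top_n=2):
    """Return the ids of the top_n docs most similar to the sample passage."""
    sample = vectorise(sample_text)
    scored = [(cosine(sample, vectorise(text)), doc_id) for doc_id, text in docs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_n]]


docs = {
    "a": "the supplier agreed to fix the price",
    "b": "price fixing agreed with the supplier",
    "c": "team lunch on friday",
}
similar = find_similar("supplier agreed price", docs)
```

Unlike a Boolean string, this approach scores every document by its overall overlap with the sample rather than requiring specific terms, so documents "a" and "b" rank ahead of the unrelated "c".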

Several kinds of advanced eDiscovery software provide “predictive coding” capabilities, which Australian, English, Irish, and U.S. courts have recognised as tools appropriate in discoveries involving significant ESI volumes (See eg Pyrrho Investments Ltd & Another v. MWB Property Limited & Others [2016] EWHC 256 (Ch); McConnell Dowell Constructors (Aust) Pty Ltd v. Santam Ltd (No 1) [2016] VSC 734). Using machine-learning algorithms, based on the analytics framework discussed above, predictive coding software identifies and prioritises documents responsive to the case. Manual review of document samples is used to train the software to recognise textual patterns or “concepts”. The software is sufficiently trained once several rounds of manual review have been completed and adequately sized samples have been reviewed. The software then ranks the documents according to their responsiveness to certain “concepts”. Reviewers may then manually review highly ranked documents and de-prioritise or even ignore low-ranked documents.
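The train-then-rank workflow described above can be sketched in a few lines. The example assumes a small naive Bayes text classifier with add-one smoothing, chosen only because it is compact; it is not the model used by any particular predictive coding product, and the training samples, document ids, and function names are invented for illustration.

```python
import math
from collections import Counter


def train(labelled):
    """Build word counts from (text, is_responsive) pairs coded by manual reviewers."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, label in labelled:
        counts[label].update(text.lower().split())
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def score(model, text):
    """Log-odds that a text is responsive; higher means more likely responsive."""
    counts, n_docs, vocab = model
    log_odds = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    total = {label: sum(counts[label].values()) for label in (True, False)}
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        p_resp = (counts[True][word] + 1) / (total[True] + len(vocab))
        p_non = (counts[False][word] + 1) / (total[False] + len(vocab))
        log_odds += math.log(p_resp / p_non)
    return log_odds


def rank(model, docs):
    """Order document ids from most to least likely responsive."""
    return sorted(docs, key=lambda doc_id: score(model, docs[doc_id]), reverse=True)


reviewed = [
    ("price fixing agreement with supplier", True),
    ("agreed price with competitor", True),
    ("lunch menu friday", False),
    ("office party friday", False),
]
model = train(reviewed)
unreviewed = {"x": "friday lunch party", "y": "supplier price agreement"}
ranking = rank(model, unreviewed)
```

In practice the training set would be far larger and refined over several review rounds, with the ranking validated by sampling before low-ranked documents are set aside.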

Through a combination of traditional and advanced eDiscovery technologies and techniques, case teams can conduct large-scale ESI review and production, while ensuring the quality and consistency of the analysis, minimising risks of error, and managing costs and timelines efficiently.


Co-Founder & Chief Operating Officer, DHB Global (Hong Kong, CHN)

Sebastian is a lawtech and regtech expert. He was formerly senior legal counsel and Asia regional head of e-discovery review at a global leading legal technology solutions company. Previously, he practised financial regulatory law and commercial dispute resolution in international firms. He is a community organiser of lawtech and techlaw events, including the first Access to Justice Hackathon in Asia. He is a member of the InnoTech Committee of the Law Society of Hong Kong. He holds degrees in science and law, including the Bachelor of Civil Law (Oxon), and is legally qualified in Hong Kong, New York and at the U.S. Supreme Court.

Manager, Epiq Systems

Mr. Yuen is a manager of Epiq’s document review services in Asia, focusing on the Greater China market and supporting eDiscovery projects regionally, and is admitted to practice in California (USA). Epiq serves law firms, corporations, financial institutions and government agencies with 30 locations worldwide.