Organisations store, process and extract more data than ever before. There is a continued need to govern, secure and analyse big data, as seen with the impending introduction of GDPR. This is of particular importance in times of crisis, whether it be an internal investigation into the misuse of company data, the theft of intellectual property or HR challenges. Moreover, the provision of electronic evidence to inform legal proceedings and regulatory intervention is increasing. Organisations often need to identify sources of relevant electronic evidence and to pinpoint the information that is most pertinent to the matter at hand.

In the 1980s, when technology was first applied to support the review of electronic documents, date and keyword searching were considered the most efficient approaches to discovering relevant information. However, following the exponential growth of data, we need to ask: are keyword searches still the best approach to finding relevant data across today’s vast databases?

Machine learning, known in document review as Technology Assisted Review (TAR), takes the knowledge that Subject Matter Experts (SMEs) apply to a relatively small number of documents and amplifies it across a much larger population of documents, partly automating and prioritising the review process. TAR is increasingly becoming best practice in the review of documents, whether considered by the Courts, mandated by an organisation or recommended by the law firm. When applied correctly, TAR provides a quick and cost-effective searching methodology while delivering a higher-quality work product than traditional manual and keyword approaches. Such clear benefits naturally appeal to the legal budget holder.

The TAR approach relies on two main factors: (i) determining the best ‘seed set’ of documents (i.e. those exemplar documents provided to ‘teach’ the technology) and (ii) the effectiveness of the algorithm(s). One key consideration is identifying the most relevant documents to act as the seed set for the TAR process; however, those key documents have historically been identified using keyword searches.
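To make the two factors concrete, the sketch below trains a very small Naive Bayes classifier on a hand-coded seed set and uses it to put unreviewed documents in priority order. It is an illustration only: the documents, the labels and the function names (train, relevance_score, prioritise) are invented for this example, and commercial TAR platforms use their own, more sophisticated algorithms.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately simple: lower-case and split on whitespace.
    return text.lower().split()


def train(seed_set):
    """Build word counts from (text, is_relevant) pairs coded by a reviewer."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in seed_set:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def relevance_score(text, model):
    """Log-odds that a document is relevant; higher means more likely relevant."""
    word_counts, doc_counts, vocab = model
    # Assumes the seed set contains at least one document of each class.
    score = math.log(doc_counts[True] / doc_counts[False])
    for label, sign in ((True, 1), (False, -1)):
        total = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in the seed set are ignored
                # Laplace (add-one) smoothing avoids log(0).
                p = (word_counts[label][word] + 1) / (total + len(vocab))
                score += sign * math.log(p)
    return score


def prioritise(unreviewed, model):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(unreviewed, key=lambda t: relevance_score(t, model), reverse=True)


# Hypothetical seed set coded by a subject matter expert.
seed_set = [
    ("pricing agreement with competitor discussed at dinner", True),
    ("agree to hold prices until competitor confirms", True),
    ("team lunch menu for friday", False),
    ("quarterly parking permit renewal reminder", False),
]
unreviewed = [
    "reminder about friday lunch",
    "competitor confirms pricing call",
    "parking permit for visitors",
]

model = train(seed_set)
ranked = prioritise(unreviewed, model)
for doc in ranked:
    print(round(relevance_score(doc, model), 2), doc)
```

The point of the sketch is the dependency it exposes: the ranking can only be as good as the seed set, so a seed set chosen poorly (for example, from noisy keyword hits) teaches the classifier the wrong vocabulary.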

Given the variety and size of the potentially responsive data landscapes, keyword searching alone produces a disproportionate number of false positives, which could negate many of the benefits of the TAR approach. An isolated dependency on keyword searching across today’s ever more complex data landscape can no longer be relied upon as the definitive approach to an accurate, time- and cost-efficient review process.
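The false-positive problem can be shown with a toy calculation. In the sketch below, the five documents, the set of truly relevant documents and the single keyword are all invented for illustration; the precision measure itself (relevant hits divided by total hits) is the standard one.

```python
def keyword_hits(documents, keywords):
    """Return the indices of documents containing any keyword (case-insensitive)."""
    hits = []
    for index, text in enumerate(documents):
        if set(text.lower().split()) & keywords:
            hits.append(index)
    return hits


def precision(hits, relevant):
    """Share of keyword hits that are genuinely relevant."""
    if not hits:
        return 0.0
    return sum(1 for index in hits if index in relevant) / len(hits)


# Hypothetical documents; 0 and 3 are the ones a reviewer would mark relevant.
documents = [
    "please review the contract before friday",
    "contract catering for the summer party",
    "my phone contract is up for renewal",
    "the supplier contract was backdated",
    "minutes of the weekly meeting",
]
relevant = {0, 3}

hits = keyword_hits(documents, {"contract"})
false_positives = [index for index in hits if index not in relevant]

print("hits:", hits)
print("false positives:", false_positives)
print("precision:", precision(hits, relevant))
```

Here the keyword finds both relevant documents but also two irrelevant ones, so half of what a reviewer would read is noise. On a real dataset the same effect is multiplied across millions of documents.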

So what’s the solution? Today’s Topic Modelling technologies allow us to consider all of the words in our documents and datasets and rank them in order of significance, context and relevance, from which intelligent decisions can be made.
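A full topic model is beyond a short example, but the underlying idea of weighing every word rather than a chosen few can be sketched with a simpler, related technique: TF-IDF term weighting. This is a stand-in, not a topic model, and the three documents and the function name tfidf_ranking are invented for illustration.

```python
import math
from collections import Counter


def tfidf_ranking(documents):
    """Rank every term in the corpus by its summed TF-IDF weight."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = Counter()
    for tokens in tokenized:
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)                  # term frequency
            idf = math.log(n_docs / doc_freq[term])   # rarer terms score higher
            scores[term] += tf * idf
    return scores.most_common()


# Hypothetical mini-corpus.
documents = [
    "the invoice for the shipment",
    "the rebate on the invoice",
    "the meeting about the rebate scheme",
]

ranking = tfidf_ranking(documents)
for term, score in ranking:
    print(f"{score:.3f}  {term}")
```

Words that appear in every document, such as "the", receive a weight of zero, while words concentrated in a few documents rise to the top. Topic modelling goes further by grouping co-occurring words into themes, which this sketch does not attempt.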

Keywords are dead; long live keywords?

Keyword searching still plays a supporting role in today’s document and data review ecosystem, but an isolated dependency on keyword searching across today’s ever more complex ESI landscape can no longer be relied upon as the definitive approach to an accurate, time- and cost-efficient review process.

What can you do?

  • Ensure you are fully aware of the benefits and the risks of the variety of approaches that can be applied to the management and review of electronically stored information.
  • Stay abreast of new technologies that can help you respond to incidents and disputes quickly. Deloitte holds regular technology showcase events that provide an update on the latest applications of Machine Learning / Artificial Intelligence to Litigation, Investigation and Regulatory matters. If you would like to attend one of our future events, or would like to learn more, please contact one of our specialists below.

Upcoming event: How are AI and ML changing the business of law?

Details: 5.00pm – 8.00pm on Thursday 14 June 2018.
Venue: Deloitte Digital, Clerkenwell, London

The effective use of Artificial Intelligence and Machine Learning is fundamentally changing the way in which lawyers are managing their business workflows.

At the event we will showcase a number of case studies which illustrate how the capture, review and production of all types of Electronically Stored Information are now delivering a significantly quicker, more cost-effective and higher-quality legal work product. We are pleased to be joined by guest speaker Tony Moss, Head of Discovery and Investigations at British American Tobacco (BAT).

To register your interest for the event please contact Karen D’Cruz.


Peter Robinson
+44 (0)20 7303 2148

Simon Placks
+44 (0)20 7303 2451

