6 Essential e-Disclosure Search Techniques

One of the most commonly asked questions in e-Disclosure is: “what different search techniques should be used?” The question is a particularly relevant one, as Question 7 in CPR Practice Direction 31B (Electronic Documents Questionnaire) specifically asks which automated searches/techniques other than keyword searches should be used as part of the e-Disclosure process.

We know this can be a daunting question if you haven’t had much experience, but never fear: we’ve set out a few of our favourite e-Disclosure search techniques below to help make it simple.

1. Boolean Search

Starting off with the easy stuff, a Boolean Search is one that utilises AND, OR and NOT operators, which respectively work as follows:

  • John AND Johnson – would return results which contain both ‘John’ and ‘Johnson’.

  • John OR Johnson – would return results which contain either ‘John’ or ‘Johnson’ or both.

  • John NOT Johnson – would return results which contain the word ‘John’ but would exclude all results which contain ‘Johnson’.

Boolean searching is a simple method which can quickly narrow the scope of documents which require searching, and is particularly useful when searching for words which are used in day-to-day business, but which are critical to the case. For example, if you were searching through emails for evidence of some sort of fraud in relation to invoices, it might seem straightforward enough simply to list ‘invoice’ as a keyword. This may work, but it usually brings up far too many search results (often in the hundreds of thousands) to allow a proportionate disclosure process.

However, it may be possible to establish certain patterns to reduce this. For example, if you know that you do not need to search any of the invoices themselves (but merely any data which discusses invoicing), you could search ‘invoice NOT number’ or something similar, which would return any document containing the word ‘invoice’ but exclude any document which also contains the word ‘number’ (as a document quoting an invoice number would).
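To make the three operators concrete, here is a minimal sketch in Python. It is not how any particular review platform works: the sample emails, the function names and the choice to match whole words regardless of case are all assumptions made for illustration.

```python
import re

def words(text):
    """Lower-case a document and return the set of whole words it contains."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def matches_and(text, a, b):
    w = words(text)
    return a.lower() in w and b.lower() in w

def matches_or(text, a, b):
    w = words(text)
    return a.lower() in w or b.lower() in w

def matches_not(text, a, b):
    # Keep the document only if it has the first term and lacks the second.
    w = words(text)
    return a.lower() in w and b.lower() not in w

# Hypothetical sample data, invented for this example.
docs = {
    "email_1": "John forwarded the invoice to Johnson",
    "email_2": "Invoice number 1042 is attached",
    "email_3": "Johnson asked about the delivery",
}

def search(predicate, a, b):
    """Return the names of all documents satisfying the given Boolean test."""
    return sorted(name for name, text in docs.items() if predicate(text, a, b))

print(search(matches_and, "John", "Johnson"))    # ['email_1']
print(search(matches_or, "John", "Johnson"))     # ['email_1', 'email_3']
print(search(matches_not, "invoice", "number"))  # ['email_1']
```

Note that ‘invoice NOT number’ drops email_2 purely because it contains the word ‘number’, which is exactly the behaviour to bear in mind when choosing an exclusion term.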

2. Grouping

Grouping effectively allows you to have greater control over how your Boolean Searches behave. For example, ‘(desktop OR server) AND application’ retrieves all items that contain ‘desktop’ or ‘server’ or both, as well as the term ‘application’. This is particularly useful where a specific set of terms needs to be found together, but where one or more of those terms may vary.
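The short Python sketch below shows why the brackets matter. How an ungrouped query is read depends on the platform you are using; the second function assumes that AND binds more tightly than OR, which is a common convention but only an assumption here. The function names and the sample sentences are invented for illustration.

```python
def words(text):
    """Return the set of lower-cased words in a document."""
    return set(text.lower().split())

def grouped(text):
    # (desktop OR server) AND application
    w = words(text)
    return ("desktop" in w or "server" in w) and "application" in w

def ungrouped(text):
    # desktop OR (server AND application): one way the same query
    # might be read if the brackets were left out.
    w = words(text)
    return "desktop" in w or ("server" in w and "application" in w)

doc = "the desktop was replaced last week"
print(grouped(doc))    # False: 'application' is missing
print(ungrouped(doc))  # True: 'desktop' alone is enough
```

The same document is excluded by the grouped query and included by the ungrouped one, so it is worth adding brackets whenever a search mixes AND and OR.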

3. Wildcard Search

Wildcard Searches allow you to search keywords which may have variations or suffer from misspellings. For example:

  • Refer?nce (single character Wildcard Search) – would return any documents containing the words ‘reference’ as well as the misspelled variation ‘referance’.

  • Referenc* (multiple character Wildcard Search) – would return results including the words ‘reference’, ‘referenced’ and/or ‘referencing’.

Wildcard Searches can be extremely useful where a word is subject to variation for any number of reasons. They allow a user to search more efficiently by avoiding the need to think of and type out every single variation of a complex name or often misspelled word.

This not only reduces the overall time (and therefore cost) spent on the disclosure process, but also ensures greater accuracy by reducing the chance of human error.
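As a rough illustration of the two wildcards, the sketch below uses Python's standard `fnmatch` module, where `?` stands for exactly one character and `*` for any number of characters (including none). The helper name `wildcard_hits` and the sample sentence are our own inventions, and real review platforms may treat wildcards slightly differently.

```python
import re
from fnmatch import fnmatchcase

def wildcard_hits(pattern, text):
    """Return the distinct words in text that match a ?/* wildcard pattern."""
    pattern = pattern.lower()
    found = re.findall(r"[a-z]+", text.lower())
    return sorted({w for w in found if fnmatchcase(w, pattern)})

# Hypothetical sample text containing a misspelling and several word endings.
text = "Please reference the referance we referenced when referencing refunds"

print(wildcard_hits("Refer?nce", text))  # ['referance', 'reference']
print(wildcard_hits("Referenc*", text))  # ['reference', 'referenced', 'referencing']
```

Notice that ‘Refer?nce’ does not pick up ‘referenced’, because `?` fills exactly one character and the pattern has to cover the whole word.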

4. Fuzzy Logic

Fuzzy Logic serves a similar purpose to Wildcard searching, but works in a different way in that it searches for entire words which are similar. For example:

  • The search term ‘home~’ could return ‘come’, ‘homes’, ‘foam’, etc.

The level of variation can be set by the user by way of percentage difference by simply placing a number from 0.1 to 0.9 after the ~ symbol, with 0.1 allowing the least variation. This can be incredibly useful when dealing with words that are often misspelled in more than one way, and offers the same benefits as Wildcard Searches above in terms of time and cost savings.
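One common way of measuring how ‘similar’ two words are is edit distance: the number of single-character insertions, deletions or substitutions needed to turn one word into the other. The sketch below is our own illustration rather than any platform's actual algorithm. It takes the description above at face value and treats the number as the maximum permitted difference, as a fraction of the longer word's length. Some search engines instead treat the number as a minimum similarity, where higher values are stricter, so it is worth checking your platform's documentation. The names `edit_distance`, `fuzzy_hits` and `max_difference` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two words."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete a character from a
                current[j - 1] + 1,            # insert a character from b
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def fuzzy_hits(term, candidates, max_difference=0.5):
    """Return candidates whose relative difference from term is within the limit."""
    term = term.lower()
    hits = []
    for word in candidates:
        distance = edit_distance(term, word.lower())
        if distance / max(len(term), len(word)) <= max_difference:
            hits.append(word)
    return hits

candidates = ["home", "come", "homes", "foam", "ledger"]
print(fuzzy_hits("home", candidates, 0.1))   # ['home']
print(fuzzy_hits("home", candidates, 0.3))   # ['home', 'come', 'homes']
print(fuzzy_hits("home", candidates, 0.75))  # ['home', 'come', 'homes', 'foam']
```

Under this measure ‘foam’ is three edits away from ‘home’, so it only appears once a generous amount of variation is allowed.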

5. Phrase Search

A Phrase Search (unsurprisingly) allows a user to review results which contain an exact phrase. For example:

  • “John eats ham” – would only return results which match that phrase exactly. It would not return any results which contain those words alone, separated or in the wrong order.

This is obviously critical functionality where there is an allegation that something has been specifically said, or where a certain phrase is likely to provide key evidence.
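In code, a Phrase Search boils down to looking for the words of the phrase side by side and in order. The sketch below is a simplified illustration: ignoring capital letters and punctuation is our assumption (platforms differ on what counts as an ‘exact’ match), and the function names are invented.

```python
import re

def tokens(text):
    """Split text into lower-cased words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_phrase(text, phrase):
    """True if the words of phrase appear consecutively and in order in text."""
    doc, target = tokens(text), tokens(phrase)
    n = len(target)
    return n > 0 and any(doc[i:i + n] == target for i in range(len(doc) - n + 1))

print(contains_phrase("Every Friday John eats ham.", "John eats ham"))  # True
print(contains_phrase("John eats smoked ham", "John eats ham"))         # False: separated
print(contains_phrase("Ham eats John", "John eats ham"))                # False: wrong order
```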

6. Proximity Search

Proximity Searches give the user extra control in relation to how Phrase Searches behave by allowing a user to search for a number of terms that appear within a given distance (measured in number of words) from each other. For example:

  • ‘“John eat ham”~3’ would match “John eat ham” (words in between: 0), “John always likes to eat ham” (words in between: 3), “John will eat ham” (words in between: 1), “eat ham John” (words in between: 0) and so on.

  • ‘“John eat ham”~3’ would not match “John likes to eat all my ham” (words in between: 4), etc.
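The counting in the examples above can be sketched in Python as follows. This follows the ‘words in between’ description used in this article, with the terms allowed in any order; it is not the exact algorithm of any particular search engine, some of which score reordered terms differently. The function names are hypothetical, and the sketch assumes each search term is a different word.

```python
import re
from itertools import product

def tokens(text):
    """Split text into lower-cased words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def words_between(text, terms):
    """Fewest other words inside a stretch of text holding every term, or None."""
    doc = tokens(text)
    # For each search term, list every position at which it occurs.
    positions = [[i for i, w in enumerate(doc) if w == t.lower()] for t in terms]
    best = None
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # one word in the text cannot stand in for two terms
        gap = max(combo) - min(combo) + 1 - len(combo)
        if best is None or gap < best:
            best = gap
    return best  # None if any term is missing from the text

def within(text, terms, max_gap):
    """True if all terms occur with at most max_gap other words among them."""
    gap = words_between(text, terms)
    return gap is not None and gap <= max_gap

terms = ["John", "eat", "ham"]
print(words_between("John always likes to eat ham", terms))  # 3
print(within("eat ham John", terms, 3))                      # True
print(within("John likes to eat all my ham", terms, 3))      # False: 4 words in between
```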