One of the questions clients ask us most often is: “What different search techniques should be used during eDiscovery?”
We know this can be a daunting question if you haven’t had much experience, so we’ve set out a few simple eDiscovery search techniques below:
Starting off with the easy stuff, a Boolean search is one that utilises AND, OR and NOT operators, which respectively work as follows:
- John AND Johnson – would return results which contain both ‘John’ and ‘Johnson’.
- John OR Johnson – would return results which contain either ‘John’ or ‘Johnson’ or both.
- John NOT Johnson – would return results which contain the word ‘John’ but would exclude all results which contain ‘Johnson’. Note: in many search tools, placing a hyphen directly before the term to be excluded (John -Johnson) has the same effect.
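The three operators map neatly onto set operations, which the short sketch below illustrates. It is only a toy model, not how any particular review platform works: the sample documents, the `tokens` helper and the `docs_containing` helper are all made up for the example.

```python
import re

def tokens(text):
    """Lower-case a document and return its set of whole words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def docs_containing(term, docs):
    """Return the ids of documents that contain `term` as a whole word."""
    return {doc_id for doc_id, text in docs.items() if term.lower() in tokens(text)}

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "John met Johnson on Monday",
    2: "John sent the invoice",
    3: "Johnson approved the payment",
    4: "No names here",
}

john = docs_containing("John", docs)
johnson = docs_containing("Johnson", docs)

print(sorted(john & johnson))  # John AND Johnson -> [1]
print(sorted(john | johnson))  # John OR Johnson  -> [1, 2, 3]
print(sorted(john - johnson))  # John NOT Johnson -> [2]
```

Because the terms are matched as whole words, ‘John’ does not accidentally match ‘Johnson’ in this sketch.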
Boolean searching is a simple method which can quickly narrow the set of documents requiring review. It is particularly useful when searching for words which are used in day-to-day business, but which are critical to the case. For example, if you were searching through emails for evidence of some sort of fraud in relation to invoices, it might seem straightforward enough simply to list ‘invoice’ as a keyword. This may work, but it is more likely to return far too many results for a proportionate review.
It may be possible to establish certain patterns to reduce this. For example, if you know that you do not need to review the invoices themselves (but merely any data which discusses invoicing), you could search ‘invoice NOT number’ or something similar. This would return any document containing the word ‘invoice’ but exclude any document which also contains the word ‘number’, as a document quoting an invoice number would.
Grouping gives you greater control over how your Boolean searches behave. For example, “(desktop OR server) AND application” retrieves all items that contain “desktop” or “server” (or both) and that also contain the term “application”. This is particularly useful where certain terms must always appear together, but one of them may vary from document to document.
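To see why the brackets matter, the sketch below runs the same three terms with two different groupings. The documents and the `has` helper are invented for illustration, and Python's own `and`/`or` stand in for the search operators.

```python
import re

def has(term, text):
    """True if `term` appears in `text` as a whole word (case-insensitive)."""
    return term.lower() in set(re.findall(r"[a-z0-9]+", text.lower()))

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "desktop application crashed",
    2: "server application log",
    3: "desktop wallpaper",
    4: "application form",
}

# (desktop OR server) AND application
grouped = sorted(
    i for i, t in docs.items()
    if (has("desktop", t) or has("server", t)) and has("application", t)
)

# desktop OR (server AND application)
regrouped = sorted(
    i for i, t in docs.items()
    if has("desktop", t) or (has("server", t) and has("application", t))
)

print(grouped)    # [1, 2]
print(regrouped)  # [1, 2, 3]
```

Moving the brackets lets document 3 through, even though it never mentions an application, so it is worth writing the grouping explicitly rather than relying on a tool's default operator precedence.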
Wildcard searches allow you to search keywords which may have variations or suffer from misspellings, which happen to the best of us. For example:
- Refer?nce (single character wildcard search) – would return any documents containing the word ‘reference’ or the misspelled variation ‘referance’.
- Referenc* (multiple character wildcard search) – would return results including the words ‘reference’, ‘referenced’ and/or ‘referencing’.
Wildcard searches can be extremely useful where a word is subject to variation for any number of reasons. They allow a user to search more efficiently by avoiding the need to think of and type out every single variation of a complex name or often misspelled word.
This not only reduces the overall time (and therefore cost) spent on the disclosure process but also ensures greater accuracy by reducing the chance of human error.
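The two wildcard patterns above can be tried out with Python's standard `fnmatch` module, which happens to use the same `?` and `*` symbols. This is only a rough sketch: the word list and the `wildcard_match` helper are hypothetical, and a real review platform will apply wildcards to its own index rather than to a simple list.

```python
from fnmatch import fnmatchcase

def wildcard_match(pattern, words):
    """Return the words matching a ?/* wildcard pattern, ignoring case."""
    lowered = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), lowered)]

# Hypothetical list of words found in a document set.
words = ["reference", "referance", "referenced", "referencing", "referral"]

print(wildcard_match("Refer?nce", words))  # ['reference', 'referance']
print(wildcard_match("Referenc*", words))  # ['reference', 'referenced', 'referencing']
```

Note that `?` stands for exactly one character, which is why ‘referenced’ is not caught by the first pattern.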
Straying into the slightly more complicated now, ‘Fuzzy Logic’ serves a similar purpose to Wildcard searching but works in a different way in that it searches for entire words which are similar. For example:
- The search term ‘home~’ could return ‘come’, ‘homes’, ‘foam’, etc.
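The user can control how much variation is allowed by placing a number from 0.1 to 0.9 after the ~ symbol. In the Lucene-style syntax shown here, that number is a minimum similarity, so values closer to 1 allow less variation and values closer to 0 allow more. Some platforms define the scale differently, so it is worth checking how your review tool treats it. This can be incredibly useful when dealing with words that are often misspelled in more than one way and offers the same benefits as Wildcard searches above in terms of time and cost savings.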
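One common way to measure how similar two words are is edit distance: the number of single-character insertions, deletions or substitutions needed to turn one word into the other. The sketch below uses that idea. It rests on assumptions that may not match your platform: the number after the ~ is treated as a minimum similarity (so higher values allow less variation), similarity is calculated as 1 minus the edit distance divided by the longer word's length, and the helper names and word list are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character
                            curr[j - 1] + 1,      # insert a character
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Assumed scale: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest

def fuzzy_match(term, words, min_similarity):
    """Return the words at least `min_similarity` similar to `term`."""
    return [w for w in words
            if similarity(term.lower(), w.lower()) >= min_similarity]

# Hypothetical list of words found in a document set.
words = ["come", "homes", "foam", "banana"]

print(fuzzy_match("home", words, 0.7))  # ['come', 'homes']
print(fuzzy_match("home", words, 0.2))  # ['come', 'homes', 'foam']
```

Under these assumptions ‘foam’ is three edits away from ‘home’, so it only appears once the threshold is set very loosely.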
A Phrase search (unsurprisingly) allows a user to review results which contain an exact phrase. For example:
- “John eats ham” – would only return results which match that phrase exactly. It would not return any results which contain those words alone, separated or in the wrong order.
This is obviously critical functionality where there is an allegation that something has been specifically said, or where a certain phrase is likely to provide key evidence.
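A phrase search boils down to checking that the words appear next to each other and in the stated order. The following is a minimal sketch of that check. The `words_of` and `contains_phrase` helpers are hypothetical, and it ignores case and punctuation, which a real platform may or may not do.

```python
import re

def words_of(text):
    """Split text into lower-case words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_phrase(phrase, text):
    """True if the words of `phrase` appear consecutively and in order."""
    target = words_of(phrase)
    words = words_of(text)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

print(contains_phrase("John eats ham", "Every day John eats ham for lunch"))  # True
print(contains_phrase("John eats ham", "John eats some ham"))                 # False
print(contains_phrase("John eats ham", "Ham eats John"))                      # False
```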
Proximity searches give the user extra control in relation to how Phrase searches behave by allowing a user to search for a number of terms that appear within a given distance (measured in number of words) from each other. For example:
- ‘“John eat ham”~3’ would match “John eat ham” (words in between: 0), “John always likes to eat ham” (words in between: 3), “John will eat ham” (words in between: 1), “eat ham John” (words in between: 0) and so on.
- ‘“John eat ham”~3’ would not match “John likes to eat all my ham” (words in between: 4), etc.
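The examples above can be reproduced with a simple model: find every search term in the text, in any order, and count how many other words sit between the first and last of them. The sketch below follows that model only. The helper names are invented, and real search engines may count distance differently, for instance when the terms appear in a different order.

```python
import re
from itertools import product

def words_of(text):
    """Split text into lower-case words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def within_proximity(terms, text, max_gap):
    """True if all terms occur, in any order, with at most `max_gap`
    other words between the first and last of them (simplified model)."""
    wanted = [t.lower() for t in terms]
    if not wanted:
        return False
    words = words_of(text)
    # For each term, every position at which it occurs in the text.
    positions = [[i for i, w in enumerate(words) if w == t] for t in wanted]
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # one word occurrence cannot satisfy two terms
        gap = max(combo) - min(combo) + 1 - len(combo)
        if gap <= max_gap:
            return True
    return False

terms = ["John", "eat", "ham"]
print(within_proximity(terms, "John always likes to eat ham", 3))  # True  (3 in between)
print(within_proximity(terms, "John will eat ham", 3))             # True  (1 in between)
print(within_proximity(terms, "eat ham John", 3))                  # True  (0 in between)
print(within_proximity(terms, "John likes to eat all my ham", 3))  # False (4 in between)
```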