Software IP and File Hashing for Obscurity

IT Group is regularly instructed in intellectual property (IP) disputes where files of varying types (source code, business blueprints, financial records, etc.) are at the centre of the dispute.

In the case of IP theft, part of the investigation process will often involve the forensic analysis of suspects’ devices to gain insight into their activity leading up to the alleged theft.

A problem that often arises here is the suspects’ unwillingness to cooperate: they may have other artefacts on the device (a corporate server or laptop) that they claim are valuable IP, personal information, or trade secrets unrelated to the dispute.

Picture this scenario. An employee leaves for a new company and takes copies of source code development files with them. The employee’s personal laptop will need to be examined by an independent expert to establish what has been taken. However, the employee has since been developing other software for the new company that is said to be unrelated to the dispute at hand. The employee (and the new company) are concerned that their own IP rights might be breached if the expert reviewing the source files has sight of the unrelated files.

Where this is the case, each of the claimant’s and defendant’s files can be individually ‘hashed’ without the files actually leaving their present location. This process generates a value (often called a forensic fingerprint) that, for all practical purposes, only a file with identical contents would have. The fingerprint is calculated from the file’s contents alone, so it does not take into account the name of the file or any file-system metadata changes (created, last accessed, etc.), and the original content cannot be recovered from it. A hash is likely to look something like ‘b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9’: 64 hexadecimal characters, the length produced by a 256-bit algorithm such as SHA-256.
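As a rough illustration of the hashing step, the Python sketch below fingerprints files by content only. It assumes SHA-256, which is consistent with the 64-character example above but is not stated as the algorithm actually used; the function names hash_file and hash_tree, and the idea of a path-to-hash "manifest", are hypothetical and introduced purely for this example.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file's contents.

    Only the bytes inside the file are read, so the file's name and
    timestamps have no effect on the result. (SHA-256 is an assumption.)
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Hash every regular file under root.

    Returns a dict mapping each file's path (relative to root) to its
    hex digest. This hypothetical "manifest" is all that leaves the site.
    """
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hash_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Two files with the same contents but different names or timestamps produce the same digest here, while changing a single byte produces a completely different one.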

Only the 64-character hashes of the files then need to be taken from site and compared with the list of hashes produced from the original, allegedly stolen source code files. If any of the hashes match, the corresponding files are forensically identical, and further examination is warranted to establish why both parties hold exactly the same files. Up to this point, the investigation can be completed without the examiner or expert gaining access to the contents of the data.
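The comparison step might be sketched as follows, again in Python. This is a minimal illustration, not a description of the tooling actually used: it assumes each party's hashes arrive as a dict mapping a file path to its hex digest, and the function name find_matching_files is hypothetical.

```python
from collections import defaultdict


def find_matching_files(claimant_manifest, defendant_manifest):
    """Return the hashes that appear in both manifests.

    Each manifest is assumed to map a file path to a hex digest.
    The result maps every shared hash to a pair of sorted path lists:
    (claimant paths, defendant paths). File contents are never needed.
    """
    def group_by_hash(manifest):
        grouped = defaultdict(list)
        for path, digest in manifest.items():
            # Normalise case so 'ABC...' and 'abc...' compare equal.
            grouped[digest.lower()].append(path)
        return grouped

    claimant = group_by_hash(claimant_manifest)
    defendant = group_by_hash(defendant_manifest)

    # A hash present on both sides marks content held by both parties.
    shared = claimant.keys() & defendant.keys()
    return {
        digest: (sorted(claimant[digest]), sorted(defendant[digest]))
        for digest in sorted(shared)
    }
```

Grouping by hash rather than by path means that renamed or relocated copies are still paired up, and only the files listed in the result would need to be opened for closer examination.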

IT Group recently used this same process to hash over 750,000 source code files in two competitors’ respective software code repositories. This revealed that nearly 20,000 files were identical across the two repositories, confirming that they had been copied from one to the other when a developer moved organisations.