IT Group Bolsters Document Management Services with Further Investment in Document OCR Technologies

IT Group has expanded its document processing and data extraction capability by investing in the latest document OCR technologies from leading solutions provider Aquaforest, ensuring continued high-accuracy, high-volume OCRing of documents for its clients.

OCR, or Optical Character Recognition, is the process of recognising text in images of documents. For example, if a photograph or physical copy of a letter has been scanned into a system, the OCR process analyses the resulting image, recognises any text within it and makes that text searchable.

During e-Disclosure exercises, the ability to search scanned documents is particularly important when keyword searching is applied across the full dataset, as non-OCRed documents that could meet the keyword criteria would not be included in the final results.
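To illustrate the point, the short Python sketch below shows how a keyword search can miss a scanned document until its text has been extracted. It is a simplified illustration only and not IT Group's or Aquaforest's actual tooling: the `Document` class, the `keyword_search` function, the file names and the sample text are all hypothetical, and the OCR step is simulated by attaching text by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    """Hypothetical record for one file in a review dataset."""
    name: str
    text: Optional[str]  # None models a scanned image with no extracted text


def keyword_search(documents: List[Document], keyword: str) -> List[str]:
    """Return the names of documents whose extracted text contains the keyword."""
    needle = keyword.lower()
    return [
        doc.name
        for doc in documents
        if doc.text is not None and needle in doc.text.lower()
    ]


# Sample data (invented for illustration): one native text file, one scanned image.
documents = [
    Document("email_001.txt", "Please review the attached invoice."),
    Document("scan_002.tiff", None),
]

# Before OCR, the scanned image has no text and cannot match any keyword.
before = keyword_search(documents, "invoice")

# Simulate the OCR step by attaching the text it would have recognised.
documents[1].text = "Invoice 2291: payment due in 30 days."
after = keyword_search(documents, "invoice")

print(before)
print(after)
```

In this sketch the scanned file only appears in the results of the second search, once it carries text for the keyword to match against.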

With its expanded arsenal of OCRing tools, IT Group can now OCR more documents at once, utilising a cluster of servers and intercommunicating software to output results faster than ever before. Alongside increased speed, IT Group is now able to OCR more document types, including:

  • PDF
  • TIFF
  • JPG
  • BMP
  • PNG
  • GIF

OCRing capabilities are not only of interest to those involved in e-Disclosure exercises.

IT Group’s digital forensics team has previously found key evidence in an employee misconduct case as a result of OCRing documentation. The team was presented with the computer of an employee who was accused of faking documentation. Initial analysis, based primarily on keyword searches for terms known to appear in the final document (believed to be fake), did not show any evidence of the documents being created or edited.

The smoking gun was found after all the image files on the machine were put through the OCR process. This revealed a series of documents that had been created on another machine before being scanned into the machine under investigation, showing the trail of edits and changes to the key document.

In a separate case, another smoking gun was found after hundreds of thousands of image files were put through the OCR process. One of the images reported matches against a specific search term. It transpired that a rogue employee had photographed a series of design plans with his iPhone, and the iPhone had then been backed up to his computer. One of the photographs showed a confidential part number, which was revealed only as a result of the OCRing process.