OCR technology and text analytics for modern business administration

The challenge

Although digital transformation is moving at breakneck speed, processing the paper-based documents that have accumulated over decades remains a significant challenge in many places. By processing we mean not only digitizing these documents, but also sorting them automatically and running the text analytics that make them searchable in the future. Our client was looking for just such a solution, which required combining expertise from several fields.

The implementation

The first step was to digitize the customer’s millions of pages of paper-based documents. Before scanning, the documents were given barcodes, which were used for sorting during digitization: the appropriate ABBYY solution can split scanned documents based on barcodes. Optical character recognition followed, implemented with ABBYY OCR, which identifies the textual content of the digitized documents.
After text extraction, entities and other necessary metadata were extracted by our self-developed TAS Tagger. Once the documents have been indexed, they can be searched in the TAS Enterprise Search interface, where a dedicated button on each result opens the original scanned version.
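
The barcode-based separation step can be illustrated with a minimal Python sketch. It is not ABBYY's implementation or API: the `Page` and `Document` types, the `split_by_barcode` function, and the sample barcode values are hypothetical, and the sketch assumes that barcodes have already been decoded and that a barcoded page marks the start of a new document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    number: int
    # Decoded barcode value; None if the page carries no barcode (assumption).
    barcode: Optional[str] = None


@dataclass
class Document:
    barcode: str
    pages: List[int] = field(default_factory=list)


def split_by_barcode(pages: List[Page]) -> List[Document]:
    """Group a continuous scan into documents, starting a new one at each barcode."""
    documents: List[Document] = []
    current: Optional[Document] = None
    for page in pages:
        if page.barcode is not None:
            # A barcoded page opens a new document.
            current = Document(barcode=page.barcode)
            documents.append(current)
        if current is None:
            # Pages scanned before the first barcode go into a placeholder group.
            current = Document(barcode="UNASSIGNED")
            documents.append(current)
        current.pages.append(page.number)
    return documents


# Hypothetical scan batch: two documents in one continuous page stream.
scan = [Page(1, "DOC-001"), Page(2), Page(3), Page(4, "DOC-002"), Page(5)]
result = split_by_barcode(scan)
for doc in result:
    print(doc.barcode, doc.pages)
```

In a real pipeline, each resulting group would then be passed to the OCR step and on to tagging and indexing.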

Partnership with allied knowledge

In addition to our text analytics experience, the services of ABBYY were essential to implementing a solution that matched the customer’s ideas; as an ABBYY reseller partner, we could rely on their capabilities with confidence. Our customer was completely satisfied with the solution built on this combined knowledge, which met their expectations and today’s technical standards.

Optical character recognition technology can be used in many areas

Cross-sectoral solution

Optical character recognition (OCR) can provide a practical way to modernize business data processing in many areas: not only in paper-based document management, but also where documents are available only as digitized (scanned) images, e.g. as email attachments.
As many sectors still rely on paper for certain processes, the solution developed could be of great help to them. Which areas could these be? Among others, those that handle official documents, such as claims management, legal services, and public administration.

Leaving the digital space and coming back

Despite continuous digitalization and IT development, we are reminded every day that paper-based official documents are still an integral part of public administration. Having left the digital space, they are delivered by post (ordinary and registered letters, recorded delivery). They often need to be returned to the sender with an original signature. In some cases this can be done by email or by uploading the signed and scanned document to an administrative portal, but such a “digitized response” is not fully suitable for subsequent processing because it is not searchable, and the administrator has to go through the “attachments” one by one. If these documents are processed with OCR technology, however, they return fully to the digital space. With an appropriate enterprise search solution applied afterwards, the required information is promptly available. The result is a modern way of handling paper-based documents.
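
To show why OCR output makes scanned attachments searchable, here is a minimal Python sketch of an inverted index over recognized text. It is a simplified illustration and not how TAS Enterprise Search works internally; the file names, sample texts, and the `tokenize`, `build_index`, and `search` functions are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(ocr_texts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every token to the set of documents whose OCR text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in ocr_texts.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


# Hypothetical OCR output for two scanned attachments.
ocr_texts = {
    "scan_001.pdf": "Claim form signed by the policyholder",
    "scan_002.pdf": "Registered letter regarding the insurance claim",
}
index = build_index(ocr_texts)
print(sorted(search(index, "claim")))
print(sorted(search(index, "signed claim")))
```

Without the OCR step there would be no text to index, which is why an administrator would otherwise have to open each attachment by hand.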

Of course, there are many other areas where optical character recognition and text analytics technology are clearly beneficial. You can learn more about the topic in our blog article, Forget the paper-based search.