From successful enterprise data collection to intelligent search

The success of a business depends largely on how efficiently its internal processes run. Proper use of corporate data assets is especially important, so up-to-date information must always be available. Achieving this requires a system that covers the entire process, from data collection to enterprise search. How is such a system structured?

Enterprise data collection, the basis of the process

Enterprise data collection is the gathering of corporate data assets. Where can company data come from? It can be files stored in the cloud or in a document management system, emails, data in ERP and CRM systems, or even information available on the web. All of these can form part of the corporate data estate, regardless of their format. Collecting all this data is the first step to making it accessible and searchable. The collection process runs repeatedly, at a frequency that can be adapted to business needs. In addition, several built-in control mechanisms ensure that no data is lost along the way.

World-class data collection

Content from the Internet can also form part of a company’s data assets or may be the basis for world-class projects such as DIGIWHIST, which deals with public procurement data. Precognox’s solution for collecting this kind of web content is TAS Data Collector.

Complementary processes

Data collection comes with processes that are essential to achieving the desired result, such as data cleaning and validation. Data cleaning transforms unstructured data into structured data, which makes it easy to search and filter. It also includes filtering out errors and duplicates from the collected data and standardising differing date formats. During validation, rules specific to the data determine which records can be processed and which should be discarded.
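The cleaning and validation steps can be illustrated with a minimal Python sketch. It is not how TAS Data Collector works internally. The record fields (title, date), the list of date formats and the validation rule are all assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical date formats seen in the collected records (an assumption).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_date(raw):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(records):
    """Standardise dates and drop duplicate records, keeping first occurrences."""
    seen = set()
    cleaned = []
    for record in records:
        item = {
            "title": record.get("title", "").strip(),
            "date": normalise_date(record.get("date", "")),
        }
        # Two records count as duplicates if title and normalised date match.
        key = (item["title"].lower(), item["date"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(item)
    return cleaned


def validate(records):
    """Split records into (accepted, rejected) using a simple illustrative rule."""
    accepted, rejected = [], []
    for item in records:
        # Illustrative rule: a record needs a title and a parseable date.
        if item["title"] and item["date"] is not None:
            accepted.append(item)
        else:
            rejected.append(item)
    return accepted, rejected
```

In a real project the duplicate key and the validation rules would be defined per data source, since the rules are specific to the data.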

Search engine

Search engines specialised in enterprise search, such as TAS Enterprise Search developed by Precognox, are built on open source solutions such as Elasticsearch. Elasticsearch is a free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured and unstructured data. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, speed and scalability, Elasticsearch is the core of the Elastic Stack, a set of free and open tools for ingesting, enriching, storing, analyzing and visualizing data, commonly referred to as the ELK Stack (Elasticsearch, Logstash and Kibana).
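To give an idea of what the REST API looks like, the following Python sketch builds a full-text search request without sending it. The index name (documents) and the field name (content) are hypothetical. The request shape, a match query posted to the index's _search endpoint, follows the Elasticsearch Query DSL.

```python
import json


def build_search_request(index, text, field="content", size=10):
    """Return (method, path, body) for an Elasticsearch full-text search."""
    body = {
        # A match query analyses the text and searches the given field.
        "query": {"match": {field: text}},
        # Maximum number of hits to return.
        "size": size,
    }
    return "POST", f"/{index}/_search", json.dumps(body)


# Hypothetical index name, used only for illustration.
method, path, body = build_search_request("documents", "annual report")
print(method, path)
print(body)
```

The resulting body can be sent to a running cluster with any HTTP client or with one of the official Elasticsearch client libraries.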

https://www.youtube.com/watch?v=XJeLcgz9GUw

Intelligent search

For a search process to be truly intelligent, it must integrate a number of advanced solutions from the fields of machine learning, artificial intelligence, linguistics and natural language processing.

Entity recognition and extraction, advanced name matching, search log analysis and the use of a thesaurus all contribute to making enterprise search as efficient as possible and delivering the quality of service that users expect.
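As one example of these techniques, a thesaurus can be used to expand a query with synonyms before it is sent to the search engine. The Python sketch below shows the idea in its simplest form. The thesaurus entries are invented for illustration, and a production system would typically also handle multi-word terms and inflected forms.

```python
# Hypothetical thesaurus: each term maps to terms treated as equivalent.
THESAURUS = {
    "invoice": ["bill"],
    "supplier": ["vendor", "contractor"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the query terms followed by their synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        # Keep the original term first, then add any synonyms it has.
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("Supplier invoice"))
```

With this expansion, a search for "supplier invoice" can also match documents that only mention a vendor or a bill.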