TAS Data Collector

TAS Data Collector collects unstructured and structured data from domains available on the Internet. The collected data can be used in raw form or processed further with additional services of the TAS Text Analytics System.

What is TAS Data Collector?

With TAS Data Collector, users can download unstructured data (textual content) from the Internet and structure it, making it accessible to other information systems and suitable for further processing, analysis, or visualization.
The content collected by TAS Data Collector can be used immediately or can serve as a basis for text analysis workflows implemented with other built-in modules of the TAS Platform.

Data collection workflow

  • the service collects data (textual content) from the webpages (or parts of them) specified by the customer
  • further steps (data cleaning, data enrichment, validation) are carried out under the supervision of our specialists
  • the result is a structured database that can be used for further data processing (analysis, visualization) or serve as a basis for further text analytics solutions
  • the collected, properly formatted content is delivered to the customer (if required, through an authenticated, password-protected channel)
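
As an illustration only, the collect, clean, validate, and structure steps above could be sketched in Python roughly as follows. This is not the TAS Data Collector implementation; the class and function names, the sample pages, and the record fields are assumptions made for this example, and the pages are hard-coded so that the sketch runs without network access.

```python
import json
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Collection + cleaning: pull text out of HTML and normalize whitespace."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def build_records(pages):
    """Structuring + validation: one record per page; empty text is marked invalid."""
    records = []
    for url, html in pages.items():
        text = extract_text(html)
        records.append({"url": url, "text": text, "valid": bool(text)})
    return records


# Hypothetical, already-downloaded pages (placeholders, not real customer data).
SAMPLE_PAGES = {
    "https://example.com/a": "<html><body><h1>Title</h1><p>Some  text.</p>"
                             "<script>var x = 1;</script></body></html>",
    "https://example.com/b": "<html><body><script>only();</script></body></html>",
}

records = build_records(SAMPLE_PAGES)
print(json.dumps(records, indent=2))
```

In this sketch the second page contains no visible text, so its record is flagged as invalid; a real pipeline would typically apply richer validation and enrichment rules at that point.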

Contact us

Would you like to learn more about TAS Platform? Write to us or send a message using the contact form at the bottom of this page.

Andrea Kruk-Papp
Sales Assistant

Features of the TAS Data Collector

  • TAS Data Collector can extract visible data, metadata (tags, image descriptions), and pagination from a website.
  • TAS Data Collector handles sites, subpages, login-required pages, hierarchical sites, and pages with slideshow components or multilingual content.
  • When data is recognized as hidden, we offer a screenshot solution that captures the exact original appearance of the data.
  • In some cases, robots.txt forbids data collection. We respect this; however, collecting this data is technically possible as well.
  • We can extract text from many document and image formats (PDF, spreadsheet, diagram, or image files).
  • We can produce and deliver any required output format, including formats that require custom software development.
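
To make the robots.txt point above concrete, here is a minimal sketch of how a collector might check whether a URL may be fetched, using Python's standard `urllib.robotparser` module. The robots.txt content, the user agent name `example-collector`, and the URLs are assumptions for this example and do not describe how TAS Data Collector is configured.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real collector would download it
# from the target site before crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed_public = is_allowed(
    ROBOTS_TXT, "example-collector", "https://example.com/news/article.html"
)
allowed_private = is_allowed(
    ROBOTS_TXT, "example-collector", "https://example.com/private/report.html"
)

print("public page allowed:", allowed_public)
print("private page allowed:", allowed_private)
```

With these sample rules, the page under `/private/` is reported as disallowed, while the other page is allowed.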

Important! Please note that we are not responsible for any further use of the collected data.

World-class data collection

Content from the Internet can also form part of a company’s data assets or may be the basis for world-class projects such as DIGIWHIST, which deals with public procurement data. Precognox’s solution for collecting this kind of web content is TAS Data Collector.

What can the collected content be used for?

  • research and development projects
  • new content and publications
  • service, information, and thematic sites, blogs, public interest and open data portals
  • analyses, statistics, visualizations
  • enterprise processes / operations, data backup
  • competitor and media monitoring
  • searchable databases
  • artificial intelligence, machine learning processes
  • data change monitoring

Appearance of TAS Data Collector

The TAS Data Collector GUI lets users monitor the download stream. The appearance of the interface matches the corporate identity of the TAS Platform.

The interface provides information about:

  • an overview of resources: which are connected and how many records have been received
  • the number of valid and broken records
  • the total number of records
  • the date of the data collection

Reaching the goal of data collection

Collecting data is rarely a standalone process; the main goal is usually to make the company's entire data assets comprehensively searchable. Learn more about how to get from successful enterprise data collection to intelligent search.

Learn more about TAS Data Collector and read the related Use Case.
