You need to use data in your software solution that:
is available only on independent and very different websites or in other data sources (e.g. in articles, PDFs or other document formats);
is in an unstructured format, so Natural Language Processing (NLP) tools and/or manual effort are needed to extract high-quality information;
contains errors that need to be detected and corrected;
needs continuous updates (since the data sources change without prior notice).
Some of these problems are inevitably present from the beginning, but you may face others only after having used the data for some time. Therefore, you need a solution that solves all of these issues, not just the technical ones.
TAS is a common, easy-to-use interface where you can manage all of your text mining projects. You can see the status of your data sources in one place and manage and use them as effortlessly as a simple SQL table.
TAS is vendor-agnostic, so we can use your preferred linguistic tools (e.g. IBM Watson, KConnect tools) and text processing frameworks (e.g. GATE, UIMA) or, if you prefer, we can choose the most appropriate ones for you (e.g. Rosette API, OpenNLP, Stanford NLP tools).
We deliver the collected textual data either as raw data or through a searchable API, using the Solr-based Precognox Search.
Our holistic approach lets you focus on what you want to use the collected data for. We continuously tackle all of the technical and quality issues behind the scenes. We have a first-rate QA team in support, and we develop the necessary test and evaluation strategy for you in order to provide clean, high-quality data.
We have brought together all the people and processes needed to successfully address the problems of text mining: annotators and software development teams, Natural Language Processing and Artificial Intelligence experts, data mining processes and project management.
We have collected job postings from hundreds of sites and made them searchable for a job search engine.
We have collected unique design items and made them searchable for Limeset.
We have collected and cleaned public procurement data from various sources for several clients.
We look forward to assisting you! Please send us a short note about what you regularly waste precious time on.