Entity Extraction

Natural language processing technologies are built on the automatic extraction and categorization of entities, such as personal names and locations, from text documents.


What is Entity Extraction?

Entity extraction automatically recognizes and groups entities in text documents. It also serves as a basis for further procedures such as sentiment analysis, topic extraction, and other natural language processing (NLP) techniques. In the process, different entity types (personal names, organizations, events, places, dates, and other types and subtypes) are extracted from the text.

What is advanced name matching?

Matching the names of people, locations, and organizations is impeded by numerous obstacles. Misspellings, typos, aliases, nicknames, initials, and names in different languages all complicate the process. Advanced name recognition and matching offers a solution to these problems.
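To make these obstacles concrete, here is a minimal sketch of fuzzy name matching using only the Python standard library. It is an illustration, not the method used by any product mentioned on this page: the nickname table, the scoring rule, and the function names are all assumptions chosen for brevity.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real system would use a far larger resource.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}


def normalize(name: str) -> list[str]:
    """Lowercase, drop periods and commas, and expand known nicknames."""
    tokens = name.lower().replace(".", " ").replace(",", " ").split()
    return [NICKNAMES.get(t, t) for t in tokens]


def token_score(a: str, b: str) -> float:
    """Score two name tokens; an initial matches any token with the same first letter."""
    if len(a) == 1 or len(b) == 1:
        return 1.0 if a[0] == b[0] else 0.0
    return SequenceMatcher(None, a, b).ratio()


def name_similarity(left: str, right: str) -> float:
    """Average, over the shorter name, of each token's best match in the other name."""
    a, b = normalize(left), normalize(right)
    if not a or not b:
        return 0.0
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    return sum(max(token_score(s, t) for t in long_) for s in short) / len(short)


if __name__ == "__main__":
    for pair in [("Bill Gates", "William Gates"),
                 ("J. Smith", "John Smith"),
                 ("Jon Smyth", "John Smith"),
                 ("Alice Jones", "Robert Brown")]:
        print(pair, round(name_similarity(*pair), 2))
```

Under these assumptions, nicknames and initials resolve to a perfect score, misspellings score high, and unrelated names score low. Names in different languages or scripts are outside what this sketch covers.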

Importance of entity extraction

Extracting and grouping entities in textual content is important in many areas. Entity recognition facilitates PR, HR, and marketing tasks, and plays an important role in due diligence, intelligence, and forecasting processes. Entity extraction also supports advanced enterprise search.

Entity extraction and enterprise search

Entity extraction supports the effective use of search engines such as TAS Enterprise Search, developed by Precognox.
The extracted entity groups can function as facets (filtering options) to narrow down the list of results. In addition, the entities recognized in the text can be shown with each result, grouped by entity type.
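The idea of entities acting as facets can be sketched in a few lines of Python. This is a simplified illustration and not how TAS Enterprise Search is implemented: the sample records, the entity type labels, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical extractor output: (document id, entity type, entity text).
ENTITIES = [
    ("doc1", "PERSON", "Ada Lovelace"),
    ("doc1", "LOCATION", "London"),
    ("doc2", "PERSON", "Ada Lovelace"),
    ("doc2", "ORGANIZATION", "Royal Society"),
    ("doc3", "LOCATION", "London"),
]


def build_facets(entities):
    """Count, per entity type, how many distinct documents mention each entity."""
    docs_per_value = defaultdict(set)
    for doc_id, etype, text in entities:
        docs_per_value[(etype, text)].add(doc_id)
    facets = defaultdict(Counter)
    for (etype, text), docs in docs_per_value.items():
        facets[etype][text] = len(docs)
    return facets


def filter_docs(entities, etype, text):
    """Return the sorted ids of documents that contain the selected facet value."""
    return sorted({d for d, t, v in entities if t == etype and v == text})


if __name__ == "__main__":
    for etype, counts in build_facets(ENTITIES).items():
        print(etype, dict(counts))
    print(filter_docs(ENTITIES, "LOCATION", "London"))
```

Each entity type becomes a facet group, each entity a selectable value with a document count, and selecting a value narrows the result list to the documents that mention it.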

Fine-tuning of entity tags

Automatically extracted entities can function as tags. TAS Tagger, a tagging solution developed by Precognox, enables the fine-tuning of these extracted entity tags in addition to its many other functions.

Integrated advanced entity extraction solution

The collaboration between US-based Babel Street and Precognox dates back years. As the official Hungarian reseller and product integrator of Babel Street’s Rosette text analytics platform, our company naturally also applies these solutions in its own products. Babel Street’s entity extraction solution, Rosette Entity Extractor (REX), is integrated into TAS Tagger, developed by Precognox.

In close cooperation

In addition to being the official system integrator and reseller of the Rosette API, as part of the collaboration Precognox has also participated in the development of Babel Street’s text analytics solutions.

Rosette Entity Extractor (REX)

Entities (e.g., organizations, people, places, products, dates) are significant parts of texts. Babel Street’s entity extraction solution, Rosette Entity Extractor (REX), is built on a flexible hybrid of statistical or deep neural network, exact match, and pattern matching processors, combining these techniques to maximize precision and recall for each entity type. As a result, the solution is able to identify 29 entity types and more than 450 subtypes.
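As a rough illustration of how a hybrid extractor can combine processors, the Python sketch below merges an exact-match (gazetteer) processor with a pattern-matching (regular expression) processor. It is not the REX implementation or the Rosette API: the gazetteer, the patterns, the type labels, and the overlap rule are assumptions, and the statistical or neural component is deliberately left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int
    end: int
    text: str
    label: str
    source: str  # which processor produced the entity


# Hypothetical gazetteer for the exact-match processor.
GAZETTEER = {
    "Babel Street": "ORGANIZATION",
    "Precognox": "ORGANIZATION",
    "Hungary": "LOCATION",
}

# Hypothetical patterns for the pattern-matching processor.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def exact_match(text):
    """Yield every literal occurrence of a gazetteer entry."""
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            yield Entity(m.start(), m.end(), m.group(), label, "exact")


def pattern_match(text):
    """Yield every span that matches one of the regular expressions."""
    for label, rx in PATTERNS.items():
        for m in rx.finditer(text):
            yield Entity(m.start(), m.end(), m.group(), label, "pattern")


def extract(text):
    """Merge both processors; when spans overlap, keep the longest one."""
    candidates = sorted(
        [*exact_match(text), *pattern_match(text)],
        key=lambda e: (-(e.end - e.start), e.start),
    )
    kept = []
    for cand in candidates:
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda e: e.start)


if __name__ == "__main__":
    sample = "Precognox, based in Hungary, partnered with Babel Street on 2021-03-15."
    for entity in extract(sample):
        print(entity.label, repr(entity.text), entity.source)
```

Exact matching gives high precision for known names, while patterns cover regular formats such as dates. A statistical model would add recall for names that appear in neither resource.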

More accurate tagging and faster annotation

Rosette Adaptation Studio (RAS) is a user-friendly application designed for nontechnical users. In addition to the entities extracted by REX, the intuitive interface allows the user to specify new and unique tag categories. Clients can do this themselves, without the need for a data scientist or NLP expert, which speeds up annotation.

Learn more about how to achieve more precise entity extraction with Rosette Adaptation Studio.

Better together

Rosette Adaptation Studio is an excellent complementary tool to REX and is now available free of charge to Rosette Entity Extractor users.

Professional support

Since most customers welcome guidance in selecting data, building a new model, and evaluating results, Precognox, as a partner of Babel Street Technology, offers professional services for the training process*.

*if the customer has ordered Rosette solutions through Precognox

Trained by quality data

Rosette trains its models on a carefully curated corpus based on millions of news articles, social media content, and blog posts. The data is always annotated thoroughly by native speakers and the tags are cross-checked for consistency.
[Sample images. Source: Rosette product page]

Try it out

Would you like to test the capabilities and effectiveness of Rosette Entity Extractor? Try the free demo. Just type or paste some text in one of the available languages.

Product highlights

  • 21 prebuilt language models
  • 29 entity types and 450+ subtypes available out-of-the-box
  • entity linking to knowledge bases
  • coreference resolution
  • hybrid of techniques, including deep learning models
  • confidence scores for each result
  • cloud or enterprise deployments
  • fast and scalable
  • industrial-strength support
  • active development with a minimum of six updates per year

Technical specification

Availability and platform support