
Quality matters: Battle-tested solutions

Thu 8 October, 2015

We love software, especially when it is safe, reliable, and does its job. Our design philosophy is simple: know your product. That is why we use the SCRUM/Agile methodology, test rigorously, and evaluate every machine learning solution.

Part of our team


Engineering is a curious mixture of science and craft. Our team has almost thirty members, some of them with 8-10 years of experience. The team structure is flat: junior staff are fully integrated into the work under the guidance of their senior peers. We are a Java shop, and our engineers are keen to learn about the Java ecosystem, from low-level details to architecture design. As craftsmen, we know that even cutting-edge technology can be used in the wrong way, so we introduced the SCRUM methodology. We work closely with our clients, which means short iterations that provide opportunities for feedback and for thinking about further directions during development.

Quality matters

Using SCRUM and test-driven development reduces the number of bugs, though it can’t eliminate them. We have a separate Quality Assurance Team whose main purpose is to exhaustively test every product made by our software development team. When required, we also hand testing over to independent testers. Our team can also be hired to carry out independent testing.


In the era of big data, you can’t avoid machine learning (ML) applications. These often require labeled data for supervised learning tasks. Our evaluation team has experienced annotators who can prepare labeled data for training and testing. We see evaluation as part of the quality assurance and testing process, hence we don’t sell products without an evaluation report on their ML components.

There is no software without bugs. But it does matter where those bugs lie! SCRUM, quality assurance and evaluation help us avoid the critical ones.


Automatic Detection of Emotions in Text

Wed 30 September, 2015

Our research is the first attempt to offer a solution for detecting emotions in Hungarian texts. Emotion analysis has mostly been popular in the behavioral sciences and psychology; in recent years, however, it has also started to spread into the field of NLP (Natural Language Processing).

Plutchik wheel of human emotions

The background

It is important to distinguish between the widely used sentiment analysis and emotion analysis. Sentiment analysis determines the polarity of a text (positive, negative or neutral), whereas emotion analysis aims to extract emotional states from it. Detecting emotions is extremely hard: they come and go quickly, and they are usually accompanied by extra-linguistic cues such as facial expressions and tone of voice.

In the Internet era, analyzing and extracting emotions from texts is becoming more and more important, not only because it is a uniquely fascinating and challenging problem for NLP experts, but also because it is increasingly valuable in economics, for example when we want to measure customer satisfaction.

Our research group hypothesizes that words with emotional meaning or content should be the best markers of the speaker’s or writer’s emotional intent, so we have constructed a Hungarian Emotion Dictionary. The dictionary consists of sub-dictionaries, one for each of Ekman's six basic emotions: sadness, anger, disgust, fear, surprise and joy. Our team manually annotated several blog posts and their comments to test how effective our dictionaries are for emotion analysis.
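To make the dictionary-based idea concrete, here is a minimal sketch of counting emotion words with one sub-dictionary per basic emotion. The word lists are hypothetical English placeholders, not entries from the Hungarian Emotion Dictionary, and matching is by exact word form; since Hungarian is heavily inflected, a real system would most likely need lemmatization first.

```python
import re
from collections import Counter

# Hypothetical toy sub-dictionaries, one per Ekman emotion (placeholders only).
EMOTION_DICT = {
    "sadness": {"sad", "cry", "grief"},
    "anger": {"angry", "furious", "outrage"},
    "disgust": {"disgusting", "gross"},
    "fear": {"afraid", "scared", "panic"},
    "surprise": {"surprised", "unexpected", "wow"},
    "joy": {"happy", "glad", "great"},
}

def tokenize(text: str) -> list[str]:
    # Lowercase and extract word tokens; \w also matches accented letters.
    return re.findall(r"\w+", text.lower())

def count_emotions(text: str) -> Counter:
    """Count how many tokens of `text` appear in each emotion sub-dictionary."""
    counts = Counter()
    for token in tokenize(text):
        for emotion, words in EMOTION_DICT.items():
            if token in words:
                counts[emotion] += 1
    return counts

print(count_emotions("I am so angry and scared, this is outrage!"))
```

Keeping the sub-dictionaries separate means a word can belong to several emotions at once, which a single word-to-label mapping would not allow.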

How can we use emotion analysis?

During the 2014 local elections, we analyzed Hungarian tweets related to the mayoral candidates in Budapest. We found that anger was the best predictor of winning! This surprised us, since most studies (like this classic from Bollen et al.) found the number of mentions and/or positive sentiment to be the best predictors of success. The number of Hungarian Twitter users is very small, and coarser measures such as sentiment analysis or mention frequency could have given us a misleading picture, since most of the tweets were neutral and candidates of small parties were rarely mentioned. So we analyzed the tweets with our emotion dictionaries and gave each candidate an emotion score reflecting the relative proportion of each emotion in the tweets mentioning him or her. Of the six basic emotions, it was the mean squared error of anger that was in accord with the results of opinion polls and, later, with the final outcome of the election.
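The per-candidate emotion score described above (the relative proportion of each emotion in tweets mentioning a candidate) could be computed roughly as follows. This is a sketch under stated assumptions: the lexicon and candidate names are hypothetical, a candidate is assumed to be mentioned by a single-token name, and the comparison with opinion polls is not reproduced because its details are not given here.

```python
import re
from collections import Counter

EMOTIONS = ("sadness", "anger", "disgust", "fear", "surprise", "joy")

# Hypothetical mini-lexicon mapping word forms to emotions (illustration only).
LEXICON = {
    "angry": "anger", "outrage": "anger", "furious": "anger",
    "happy": "joy", "great": "joy",
    "afraid": "fear", "sad": "sadness",
}

def emotion_scores(tweets: list[str], candidate: str) -> dict[str, float]:
    """Relative proportion of each emotion in the tweets that mention `candidate`."""
    counts = Counter()
    name = candidate.lower()
    for tweet in tweets:
        tokens = re.findall(r"\w+", tweet.lower())
        if name not in tokens:
            continue  # only tweets mentioning the candidate contribute
        for token in tokens:
            if token in LEXICON:
                counts[LEXICON[token]] += 1
    total = sum(counts.values())
    # Proportions sum to 1 when any emotion word was found, else all zeros.
    return {e: (counts[e] / total if total else 0.0) for e in EMOTIONS}

tweets = ["Smith is great", "so angry at Smith, outrage", "Jones makes me happy"]
print(emotion_scores(tweets, "Smith"))
```

Using proportions rather than raw counts keeps rarely mentioned candidates comparable with frequently mentioned ones, which matters with a small user base.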

The Economist’s R-word index is one of the best-known economic indicators. It is very simple, since it tracks the frequency of the term “recession” in the Wall Street Journal and the Financial Times, yet it is mostly accurate. We created a corpus, a collection of articles from various news sites and blogs. We found no correlation between GDP and the frequency of “recession” and its Hungarian synonyms. However, the levels of fear and anger usually increase before GDP starts to decline.


A close-up from Silicon Valley

Mon 21 September, 2015

The Kaposvar HQ of Precognox hosted a new meetup on Wednesday afternoon. Our guest was Dénes Finduk, editor of http://siliconvalleylife.blog.hu and Lead Data Engineer at Addepar, a company based in Silicon Valley. He told us how he completed his Master's degree at the University of Edinburgh and how he later started his career in the US. During the interactive meetup, Dénes was bombarded with questions, which he answered in his friendly and open manner. His talk gave us a glimpse of his everyday life in California, which mostly consists of work, work and more work. Well, his photos showed that there is sometimes room for fun in the lives of our colleagues overseas as well. A long, friendly discussion and the obligatory pizza dinner closed the meetup in the evening.


We have a new NLP member

Tue 7 July, 2015

Our former trainee, Kitti Balogh, joins our research team as a full-time member.

Kitti has just graduated from Eötvös University with an MSc in Statistics, and her thesis "The application of latent Dirichlet allocation for Social Sciences" received the award for the best Survey Statistics Master's Thesis of the 2014/15 academic year.

Congratulations Kitti, we are so proud of you!

New KConnect search services give healthcare the very best in medical information

Wed 15 April, 2015

KConnect has launched its official website, www.kconnect.eu, and is beginning the commercialisation of new multilingual medical text analysis and search services. Precognox is a proud partner of the team.

The new state-of-the-art medical information search services can empower healthcare and life science professionals and the public alike. They can provide the fastest and most relevant medical support information available, on which users can base the best-informed decisions.

The intelligent (semantic) search services can incorporate both published medical literature and in-house medical information sources (such as electronic health records or health registries).

"The quality of the search performance can help clinicians and researchers remain at the forefront of their profession. By having the right knowledge about best practices and treatments at their fingertips, clinicians can ensure the very best in patient outcomes and a healthier community," says Professor Robert Stewart, Department of Psychological Medicine, King's College London.

Intelligent search for better user experience

The search services are made 'intelligent' by their ability to understand the meaning, context and intent of user queries. The very best medical information becomes easier to find because the semantic search is based not only on query keywords but also on related concepts and contexts.

The user search box has the ability to understand keyword connotations, related concepts and their relationships within a medical context. Such machine comprehension is also employed in the 'reading' (indexing, classifying and annotating) of medical content so that the most relevant information can be found even if a user's chosen keyword happens to be absent within the text.
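The general principle of concept-based matching, where a document is found even though the query keyword is absent from it, can be illustrated with a small sketch. This is not KConnect's implementation: the term-to-concept map and its concept IDs are hypothetical, and a production system would use a curated medical ontology and compute document concepts once at indexing time.

```python
import re

# Hypothetical map from surface terms to shared concept IDs (illustration only).
TERM_TO_CONCEPT = {
    "heart attack": "C1",
    "myocardial infarction": "C1",
    "high blood pressure": "C2",
    "hypertension": "C2",
}

def concepts_in(text: str) -> set[str]:
    """Concept IDs whose surface terms occur in `text` (whole words, case-insensitive)."""
    lowered = text.lower()
    return {
        concept
        for term, concept in TERM_TO_CONCEPT.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }

def search(query: str, documents: list[str]) -> list[str]:
    """Return documents that share at least one concept with the query."""
    query_concepts = concepts_in(query)
    # Recomputed per call here for brevity; an index would store these up front.
    return [doc for doc in documents if concepts_in(doc) & query_concepts]

docs = [
    "Treatment of myocardial infarction in the elderly",
    "Managing hypertension with diet",
    "Seasonal flu vaccines",
]
print(search("heart attack treatment", docs))
```

The query "heart attack treatment" retrieves the first document although the phrase "heart attack" never appears in it, because both map to the same concept.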


Search global medical information in any language

The accurate language mapping of key medical concepts allows users to search in their own language (currently there are several European languages available with more to follow). The addition of machine translation means that information can be provided either in English or the source's original language.

Building blocks for tailored medical services

Individually created components and toolkits mean that an organisation can tailor its search-driven medical solutions to its own requirements. Several tailoring options are available, including information sources, access (cloud or local installation), language, security, functionality (alerts, recommendations and social search) and whether the solution is standalone or embedded.

Partnership opportunities

Due to the expected demand for its services, KConnect is seeking new partners to extend its Professional Service Community and to help with the quick and wide adoption of its services.

The members of the KConnect Consortium are:

- Vienna University of Technology (Austria);
- Findwise AB (Sweden);
- Precognox Kft (Hungary);
- Ontotext AD (Bulgaria);
- Trip Database Ltd (UK);
- Health on the Net Foundation (Switzerland);
- Qulturum, Region Jönköping County (Sweden);
- King's College London (UK);
- University of Sheffield (UK) - GATE;
- Charles University, Prague (Czech Republic).

