We did so because we firmly believe in its tagline: "Great teams are built of great people". This is how we want to build a great software development team!
You can relive the event by watching the presentations online: Stretch presentations
Yesterday, our computational linguist, Zoltan Varju, was the guest of the MicroData research group at the Central European University. Zoltan talked about how we work on data-driven R&D projects at Precognox and attended the group’s regular lunch, where team members shared their thoughts on ongoing projects. We have been working with the research group on kozbeszerzes.ceu.hu, the searchable database of Hungarian procurement data, since last year, and it was exciting to see how an academic research group works on data-driven projects. You can find the slides of the presentation on our Hungarian blog.
"CEU MicroData is a team of lecturers, PhD students and researchers at the Central European University. We analyse company and personal data to better understand economic growth, international trade, company networks, political connections and corruption.
With our web application at kozbeszerzes.ceu.hu, we would like to add more transparency to the spending of public money. Based on Open Data principles, we created an easy-to-search, browsable and downloadable interface for the announcements of the - also open - Közbeszerzési Értesítő [Public Procurement Bulletin].
The archive of Közbeszerzési Értesítő - consisting of over 140,000 text bulletins between 1997 and 2013 - contained the information we needed - e.g. the announcer, the winner and the amount of each procurement - in a semi-structured form only. Moreover, the format of the bulletins kept changing from year to year. Being only a small research group, we could not process all the data ourselves. We were looking for a company that could build a structured database from the text files, meeting all data-quality criteria within a short time span. That is why we chose Precognox.
Precognox was pleasant to cooperate with from the very start, during the specification of the task and the signing of the contract. Following a needs analysis carried out in person, we jointly designed the schema of the future database, as well as the metric and the method for monitoring data quality. Once the documents had been rendered into a uniform schema, we asked them to validate a number of data fields, i.e. amounts, dates, company names and addresses.
The product - shipped on deadline - exceeded our expectations. The accuracy of each data field was measured at between 89 and 95 per cent, i.e. the rate of agreement between the values entered by our researchers and those found and validated by the algorithm of Precognox, in a random sample of a hundred items. We had never thought such accuracy was possible with automatic processing alone.
They answered our further questions quickly and flexibly, like a truly agile team. We would be happy to rely on their services in the future, too."
Miklós Koren: teacher at the Central European University and researcher at the Centre for Economic and Regional Studies of the Hungarian Academy of Sciences. His research field is economic growth and international trade. His scientific findings have been published in leading international journals. His research on knowledge transfer is supported by the European Research Council. He co-edits defacto.io, an information site and blog on economics.
This week saw the launch of kozbeszerzes.ceu.hu, a portal that makes Hungarian procurement data searchable. The site was developed by CEU Microdata, a research group at the Department of Economics of the Central European University, led by Miklos Koren and Adam Szeidl. The government releases procurement data in unstructured documents, so it is extremely hard to get useful information from the texts. Precognox has developed a special text mining solution that extracts the relevant information from the text files and stores it in a structured database that researchers can analyze. Our company is very proud of the success of CEU Microdata. The site is simple and functional, and it is even robot-friendly, so one can automatically harvest procurement data from kozbeszerzes.ceu.hu. CEU Microdata is a very active research group, and they will produce other open datasets in the near future. Congratulations, guys, we are eagerly waiting for your new results!