Petr Knoth
Research Associate
Knowledge Media Institute
The Open University
Walton Hall
Milton Keynes, MK7 6AA
United Kingdom

Direct: +44 (0)1908 654548

Research interests

  • Computational Linguistics
  • Natural Language Processing
  • Information Retrieval
  • Automatic Link Generation
  • Information Extraction
  • Semantic Web

Publications

Knoth, P. (2013) CORE: Aggregation Use Cases for Open Access, Demo at Joint Conference on Digital Libraries (JCDL 2013), Indianapolis, Indiana, United States 

Knoth, P. (2013) From Open Access Metadata to Open Access Content: Two Principles for Increased Visibility of Open Access Content, Open Repositories 2013 (OR 2013), Charlottetown, Prince Edward Island, Canada 

Knoth, P. and Herrmannova, D. (2013) Simple Yet Effective Methods for Cross-Lingual Link Discovery (CLLD) - KMI @ NTCIR-10 CrossLink-2, NTCIR-10 Evaluation of Information Access Technologies, Tokyo, Japan 

Knoth, P. and Zdrahal, Z. (2012) CORE: Three Access Levels to Underpin Open Access, D-Lib Magazine, 18, 11/12, Corporation for National Research Initiatives 

Knoth, P., Zdrahal, Z. and Juffinger, A. (2012) Special Issue on Mining Scientific Publications, D-Lib Magazine, 18, 7/8, Corporation for National Research Initiatives  

Herrmannova, D. and Knoth, P. (2012) Visual Search for Supporting Content Exploration in Large Document Collections, D-Lib Magazine, 18, 7/8, Corporation for National Research Initiatives  

Knoth, P., Zilka, L. and Zdrahal, Z. (2011) KMI, The Open University at NTCIR-9 CrossLink: Cross-Lingual Link Discovery in Wikipedia Using Explicit Semantic Analysis, The 9th NTCIR Workshop Meeting Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-Lingual Information Access, Tokyo, Japan

Knoth, P., Zilka, L. and Zdrahal, Z. (2011) Using Explicit Semantic Analysis for Cross-Lingual Link Discovery, 5th International Workshop on Cross Lingual Information Access: Computational Linguistics and the Information Need of Multilingual Societies (CLIA) at The 5th International Joint Conference on Natural Language Processing (IJCNLP 2011), Chiang Mai, Thailand

Maleshkova, M., Zilka, L., Knoth, P. and Pedrinaci, C. (2011) Cross-Lingual Web API Classification and Annotation, 2nd Workshop on the Multilingual Semantic Web at The 10th International Semantic Web Conference, Bonn, Germany

Knoth, P. and Zdrahal, Z. (2011) Mining Cross-document Relationships from Text, The First International Conference on Advances in Information Mining and Management (IMMM 2011), Barcelona, Spain

Knoth, P., Robotka, V. and Zdrahal, Z. (2011) Connecting Repositories in the Open Access Domain using Text Mining and Semantic Data, International Conference on Theory and Practice of Digital Libraries 2011 (TPDL 2011), Berlin, Germany (Best Poster/Demo Award)

Knoth, P. and Zdrahal, Z. (2011) CORE: Connecting Repositories in the Open Access Domain, CERN workshop on Innovations in Scholarly Communication (OAI7), Geneva, Switzerland

Knoth, P., Novotny, J., and Zdrahal, Z. (2010) Automatic generation of inter-passage links based on semantic similarity, In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing, China

Fernandez, M., Sabou, M., Knoth, P. and Motta, E. (2010) Predicting the quality of semantic relations by applying Machine Learning classifiers, In Proceedings of the 17th International Conference on Knowledge Engineering and Knowledge Management, Poster session, Lisbon, Portugal (Best Poster Award)

Knoth, P., Collins, T., Sklavounou, E., and Zdrahal, Z. (2010) Facilitating cross-language retrieval and machine translation by multilingual domain ontologies, In Workshop on Supporting eLearning with Language Resources and Semantic Data at LREC 2010, Valletta, Malta

Knoth, P., Collins, T., Sklavounou, E., and Zdrahal, Z. (2010) EUROGENE: Multilingual Retrieval and Machine Translation applied to Human Genetics, In 32nd European Conference on IR Research (ECIR 2010), Demo session, Milton Keynes, United Kingdom

Knoth, P., Sova, J., and Zdrahal, Z. (2010) Eurogene - The First Pan-European Learning Service in the Field of Genetics, Znalosti (Knowledge) 2010, Jindrichuv Hradec, Czech Republic

Knoth, P. (2009) Semantic Annotation of Multilingual Learning Objects Based on a Domain Ontology, In Doctoral consortium at EC-TEL 2009, Nice, France

Zdrahal, Z., Knoth, P., Collins, T., and Mulholland, P. (2009) Reasoning across Multilingual Learning Resources in Human Genetics, In International Conference on Interactive Computer Aided Learning (ICL 2009), Villach, Austria

Knoth, P., Schmidt, M., Smrz, P., and Zdrahal, Z. (2009) Towards a Framework for Comparing Automatic Term Recognition Methods, In Znalosti (Knowledge) 2009, Brno, Czech Republic

Schmidt, M., Knoth, P., and Smrz, P. (2009) Information Extraction in the KiWi Project, In Znalosti (Knowledge) 2009, Brno, Czech Republic

Opsomer, R., Knoth, P., Polen, F., Trapman, J., and Wiering, M. (2008) Categorizing Children: Automated Text Classification of CHILDES files, BNAIC 2008, Enschede, The Netherlands

Knoth, P. (2008) Extraction of Semantic Relations from Texts, In Student EEICT 2008, Brno, Czech Republic

Projects

The goal of DiggiCORE is to analyse a vast set of research publications from the Open Access domain using natural language processing and social network analysis methods in order to identify patterns in the behaviour of research communities, recognise trends in research disciplines, and gain new insights into the citation behaviour of researchers.

The ServiceCORE project aims to develop a new nation-wide aggregation service that will improve the discovery of research publications stored across British Open Access repositories. The ServiceCORE project will extend the solution provided by the CORE system, developed in the first stage of the Resource Discovery programme.

CORE - The COnnecting REpositories (CORE) project aims to facilitate access to, and navigation across, scientific papers stored in British Open Access repositories, using Natural Language Processing and Linked Data.

RETAIN - The goal of the RETAIN project is to extend the existing Business Intelligence (BI) functionality currently in use at the Open University. The focus will be on using BI to improve student retention. Several initiatives have been launched to find ways of improving retention figures, to identify why the problem exists, and to explore different approaches to dealing with it.

DECIPHER - DECIPHER is a three-year, European Commission supported project that aims to support the discovery and exploration of cultural heritage through story and narrative. To do this, we are developing new solutions to the whole range of narrative construction, knowledge visualisation and display problems faced by museums. The outcome will change the way people access digital heritage by combining rich visualisations, event-based metadata and causal reasoning models.

The TECH-IT-EASY project develops an information system, based on analytical and knowledge-based tools, able to support electromechanical European SMEs in structuring and systematising the internal product innovation process based on the combined application of QFD (Quality Function Deployment) and technology potentials of TRIZ (Theory of Inventive Problem Solving).

EuroGene is a European Commission supported eContentplus project concerned with providing high-quality, semantically enriched educational content in genetics. The primary role of KMI within EuroGene is to apply tools and methods for automatic content annotation, cross-language retrieval and navigation through the available content.

Knowledge in a Wiki (KiWi) - The main objectives of KiWi are to investigate how knowledge management in highly dynamic environments can be supported using Semantic Wiki technologies, and how Semantic Wikis can be improved to satisfy the requirements of knowledge management. For this purpose, KiWi will implement an advanced knowledge management system based on the Semantic Wiki IkeWiki and extend it with improved rule-based reasoning support, information extraction, personalisation, and advanced visualisations and editors. The system will be verified on two use cases, in project knowledge management and software knowledge management, with flexible workflow models and specific support for each application area.

Software and resources

CORE tools - CORE is a system for accessing, navigating and downloading content stored across a number of Open Access repositories. CORE provides three tools: CORE Portal, CORE Mobile and CORE Plugin. To find out more, visit the CORE project website. CORE Mobile is freely available from the Android Market.

Eurogene portal - Eurogene is an e-learning system in the domain of genetics that provides free multimedia learning resources in nine languages for statistical, medical and molecular genetics and delivers them to students and professionals. The Eurogene content includes presentations, reviewed research articles, images, videos and learning packages submitted by world-leading geneticists.

Jajatr automatic term recognition framework

Theses and reports

Information Extraction from Biomedical Texts - master's thesis

Annotating Knowledge Resources to Support Learning - probation report