I'm Dr.-Ing. Dr.med.univ. Johannes Starlinger, both a medical doctor and a computer scientist, and a TÜV Süd-certified specialist in medical software regulatory affairs. For over a decade, I've been working, researching, and teaching in the areas of biomedical and clinical data, data and information systems, big data analytics, and data science. Today, I work as a digital health consultant and software developer, and as a lecturer at the intersection of health data, machine learning and information systems, medical device regulation, and digital health innovation.

After studying medicine at the Medical University of Vienna and computer science at HU-Berlin, I joined the DFG-funded graduate program SOAMED in 2010 to research service-oriented architectures in medical applications, receiving my PhD in 2015. As a PostDoc in the research group for Knowledge Management in Bioinformatics at the Department of Computer Science at Humboldt-Universität zu Berlin and at the Department of Anesthesiology and Operative Intensive Care Medicine at Charité – Universitätsmedizin Berlin, my research focused on knowledge mining and representation, similarity search, and predictive analytics over data relevant to the biomedical domain, including genomic and clinical data. I was PI on the simpatix project, funded by a DFG "Temporary Position for Principal Investigators" (Eigene Stelle), technical coordinator in PREDICT, and member of the ASTRODEM project.

Academic CV

Here's an overview of my (past) academic CV. Today, I provide consulting, research, and development services in the area of health and healthcare data at Howto Health GmbH.

01/2018 – 12/2021
Visiting researcher at Humboldt-Universität zu Berlin
01/2018 – 09/2019
Health data researcher at Charité – Universitätsmedizin Berlin
Funded by a DFG-grant for "Temporary Positions for Principal Investigators" (Eigene Stelle)
05/2015 – 12/2017
Postdoctoral researcher at Humboldt-Universität zu Berlin
Funded by a DFG-grant for "Temporary Positions for Principal Investigators" (Eigene Stelle) from October 2016
08/2010 – 05/2015
PhD Student in the DFG-funded graduate school SOAMED (Dr.-Ing.)
Humboldt-Universität zu Berlin
Department of Computer Science
Knowledge Management in Bioinformatics
09/2012 – 11/2012
Visiting scholar at the Department of Computer and Information Science at the University of Pennsylvania with Prof. Susan B. Davidson
Funded by a DAAD short-term grant for PhD students
10/2005 – 08/2010
Diploma in Computer Science (Dipl.-Inf.)
Humboldt-Universität zu Berlin
2010 Diploma Thesis "Integration of clinical record data with special consideration of diagnosis classification"
2009 Study Project "Integration and visualization of heterogeneous annotation results of the BioCreAtIvE Meta-Server"
11/2008 – 07/2010 Student assistant at the research group for Knowledge Management in Bioinformatics at Humboldt University
01/2007 – 01/2009 Student assistant at the Institute of Cultural Sciences at Humboldt University
10/1998 – 05/2005
Medical studies (Dr. med. univ.)
Medical University of Vienna
2000 – 2004 Tutor and Research Assistant at the Institute of Anatomy, University of Vienna / Medical University of Vienna with Prof. Helmut Gruber
08/2003 – 09/2003 Research visit at the University of Cincinnati College of Medicine Department of Surgery in the Laboratory of Epithelial Pathobiology with Prof. Jeffrey B. Matthews and Roger T. Worrell

Publications

Gilbert S, Fenech M, Hirsch M, Upadhyay S, Biasiucci A, Starlinger J
Algorithm Change Protocols in the Regulation of Adaptive Machine Learning–Based Medical Devices
J Med Internet Res 2021;23(10):e30545, https://doi.org/10.2196/30545
Starlinger, P., Ubl, D. S., Hackl, H., Starlinger, J., Nagorney, D. M., Smoot, R. L., Habermann, E. B., Cleary, S. P.
Combined APRI/ALBI score to predict mortality after hepatic resection
BJS Open, Volume 5, Issue 1, January 2021, zraa043, https://doi.org/10.1093/bjsopen/zraa043
Kittner, M., Lamping, M., Rieke, D., Götze, J., Bajwa, B., Jelas, I., Rüter, G., Hautow, H., Sänger, M., Habibi, M., Zettwitz, M., de Bortoli, T., Ostermann, L., Ševa, J., Starlinger, J., Kohlbacher, O., Malek, N., Keilholz, U., Leser, U.
Annotation and Initial Evaluation of a Large Annotated German Oncological Corpus
JAMIA Open, Volume 4, Issue 2, ooab025.
Habibi, M., Starlinger, J. and Leser, U. (2020).
TabSim: A Siamese Neural Network for Accurate Estimation of Table Similarity
IEEE Big Data (accepted).
Habibi, M., Starlinger, J. and Leser, U. (2020).
A Permutation Invariant Neural Network for Table Orientation Classification.
Data Mining and Knowledge Discovery 34.6 (2020): 1963-1983.
Ford, E., Starlinger, J., Rooney, P., Oliver, S., Banerjee, S., van Marwijk, H., & Cassell, J. (2020).
Could dementia be detected from UK primary care patients’ records by simple automated methods earlier than by the treating physician? A retrospective case-control study.
Wellcome Open Research, 5(120), 120.
Seva, J., Wiegandt, D., Goetze, J., Lamping, M., Rieke, D. T., Schaefer, R., Jähnichen, P., Kittner, M., Pallarz, S., Starlinger, J., Keilholz, U., and Leser, U. (2019).
VIST – A Variant-Information Search Tool for Precision Oncology
BMC Bioinformatics 20(1).
Pallarz, S., Benary, M., Lamping, M., Rieke, D., Starlinger, J., Sers, C., Wiegandt, D. L., Seibert, M., Seva, J., Schäfer, R., Keilholz, U., and Leser, U. (2019).
Comparative Analysis of Public Knowledge Bases for Precision Oncology
JCO Precision Oncology 3 (2019): 1-8.
Johannes Starlinger, Steffen Pallarz, Jurica Ševa, Damian Rieke, Christine Sers, Ulrich Keilholz, and Ulf Leser (2018).
Variant information systems for precision oncology.
BMC Medical Informatics and Decision Making 2018 18:107
Madeleine Kittner, Bariya Bajwa, Damian Rieke, Mario Lamping, Johannes Starlinger, and Ulf Leser (2017).
Design of an Information Extraction Pipeline for German Clinical Texts.
AIME Workshop on Extraction and Processing of Rich Semantics from Medical Texts (accepted).
Johannes Starlinger, Madeleine Kittner, Oliver Blankenstein, and Ulf Leser. (2016).
How to Improve Information Extraction from German Medical Records
it – Information Technology, Special Issue on Data Integration in the Life Sciences.

David Wiegandt, Johannes Starlinger, and Ulf Leser. (2016).
Graph n-grams for Scientific Workflow Similarity Search
LWDA, Potsdam, Germany.

Johannes Starlinger (2016).
Similarity Measures for Scientific Workflows
Dissertation, Humboldt-Universität zu Berlin.
Johannes Starlinger, Sarah Cohen-Boulakia, Sanjeev Khanna, Susan B. Davidson, and Ulf Leser. (2015).
Effective and Efficient Similarity Search in Scientific Workflow Repositories
Future Generation Computer Systems 56: 584-594.
Johannes Starlinger, Sarah Cohen-Boulakia, Sanjeev Khanna, Susan B. Davidson, and Ulf Leser. (2014).
Layer Decomposition: An Effective Structure-based Approach for Scientific Workflow Similarity.
10th IEEE International Conference on eScience, Guarujá, SP, Brazil.

Johannes Starlinger, Bryan Brancotte, Sarah Cohen-Boulakia, and Ulf Leser. (2014).
Similarity Search for Scientific Workflows.
PVLDB, Hangzhou, China.
 VLDB Excellent Presentation Award
Sebastian Wandelt, Johannes Starlinger, Marc Bux, and Ulf Leser. (2013).
RCSI: Scalable similarity search in thousand(s) of genomes.
PVLDB, Hangzhou, China.

Philippe Thomas, Johannes Starlinger, and Ulf Leser. (2013).
Experiences from Developing the Domain-Specific Entity Search Engine GeneView.
BTW, Magdeburg, Germany.
Johannes Starlinger, Sarah Cohen-Boulakia, and Ulf Leser. (2012)
(Re)Use in Public Scientific Workflow Repositories.
Proceedings of the 24th International Conference on Scientific and Statistical Database Management (SSDBM'12), Chania, Crete, Greece, pp. 361-378.

Philippe Thomas, Johannes Starlinger, Alexander Vowinkel, Sebastian Arzt, and Ulf Leser. (2012)
GeneView: A comprehensive semantic search engine for PubMed.
Nucleic Acids Res, 2012 Jul; 40(Web Server issue):W585-91.
Sebastian Arzt, Johannes Starlinger, Oliver Arnold, Stefan Kröger, Samira Jaeger, and Ulf Leser. (2011)
PiPa: Custom Integration of Protein Interactions and Pathways.
GI-Jahrestagung 2011, Workshop "Daten in den Lebenswissenschaften".

Johannes Starlinger, Bernd Schmeck, and Ulf Leser. (2011)
Challenges in Automatic Diagnosis Extraction from Medical Examination Summaries.
CIKM 2011, Workshop on Web Science and Information Exchange in the Medical Web.
Philippe Thomas, Johannes Starlinger, Christoph Jacob, Illes Solt, Jörg Hakenberg and Ulf Leser. (2010)
GeneView Gene-Centric Ranking of Biomedical Text.
Proceedings of BioCreative III, Bethesda, USA, 2010. pp. 137-142.
Johannes Starlinger, Florian Leitner, Alfonso Valencia, and Ulf Leser. (2009)
SOA-based Integration of Text Mining Services.
Proceedings of the 2009 Congress on Services, Los Angeles, USA, pp. 99-106.
 3rd place in the IEEE Services Computing Contest 2009

Posters

Akira-Sebastian Poncette, Johannes Starlinger, Claudia Spies, Gerald Vorderwülbecke, Sascha Treskatsch, Felix Balzer (2018)
qSOFA-Score als prognostischer Marker für Mortalität auf der Intensivstation [qSOFA score as a prognostic marker for mortality in the ICU].
Poster - DIVI 2018, Leipzig, Germany, December 2018.
Akira-Sebastian Poncette, Johannes Starlinger, Claudia Spies, Gerald Vorderwülbecke, Sascha Treskatsch, Felix Balzer (2018)
Prognostic Performance of the qSOFA Score for In-Hospital Mortality in ICU.
Poster - PGA72 2018, New York, USA, December 2018.
Philippe Thomas, Johannes Starlinger, Christoph Jacob, Jörg Hakenberg, and Ulf Leser. (2010)
GeneView Gene-Centric Ranking of Biomedical Text.
Poster - BioCreative III workshop, Washington, USA, September 2010.

Selected Talks

Johannes Starlinger
simpatix: Similarity Search for Richly Annotated Structured Patient Cases.
Future Medicine Science Match 2016, Berlin.
Johannes Starlinger, Bernd Schmeck, Bettina Temmesfeld-Wollbrück, Norbert Suttorp, and Ulf Leser. (2011)
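Datenbankintegration, automatische Diagnoseklassifikation und statistische Analyse von BAL-Befunden [Database integration, automatic diagnosis classification, and statistical analysis of BAL findings].
52. Kongress der Deutschen Gesellschaft für Pneumologie und Beatmungsmedizin e.V. (52nd Congress of the German Society for Pneumology and Respiratory Medicine)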

Scientific Projects


Our aim in PREDICT is to develop a software system that enables clinicians to make use of the large body of existing data on the relationships between genetic/epigenetic alterations on the one hand and cancer treatment options and their success on the other. We are developing a semantically integrated, cancer-type-specific knowledge base that draws its data from a variety of sources, including scientific publications, genotype/phenotype databases, drug effectiveness screens, clinical trials, and large-scale association studies, using methods for knowledge extraction, semantic data integration, and biomedical text mining.
In the simpatix project, we investigate similarity search over electronic health records (EHRs). These records consist mostly of unstructured or semi-structured data, such as clinical notes from examinations and treatments, tabulated results of quantitative tests (such as blood screenings), or discharge summaries, but they implicitly encode a process describing the individual patient’s disease history. In simpatix, we extract this process from EHRs, together with rich annotations of clinically relevant entities (e.g., diagnoses, treatments, or procedures), and investigate similarity measures for such process-structured case representations in order to compare cases and find similar ones. Ultimately, we want to deploy these measures in similarity search over large collections of patient cases to enable use cases such as clinical decision support.
The ASTRODEM project at Brighton and Sussex Medical School aims to identify predictive markers for the onset of dementia and to create predictive models that help general practitioners (GPs) identify patients at high risk of dementia, using 96,000 anonymised GP patient records from the Clinical Practice Research Datalink (CPRD). As a visiting researcher collaborating with Dr Elizabeth Ford and her team, my particular interest is in making use of the longitudinal information contained in the patient records.
In the FlowAlike project, we collected a corpus of similarity ratings for scientific workflows from a dataset of 1483 Taverna workflows and a second dataset of 139 Galaxy workflows, for use in evaluating algorithmic similarity measures. Around 2400 ratings were manually assigned by scientific workflow experts and are available for download to test and compare your own algorithms!
GeneView is a web-based tool for searching and visualizing a deeply annotated copy of PubMed and the open access subset of PubMed Central. While I was mainly responsible for creating the web interface during my time as a student assistant in the research group for Knowledge Management in Bioinformatics at HU-Berlin, Philippe Thomas is the tool's main developer and the researcher behind it. Check it out!
PiPa is a Java-based tool to set up and maintain a comprehensive database of information on protein-protein interactions and biological pathways, integrated from multiple public protein information databases. The data is stored in a local MySQL database, which provides programmatic access through standard SQL query APIs. The tool is still available for download.

The BC-VisCon tool was created as part of my study project "Integration and visualization of heterogeneous annotation results of the BioCreAtIvE Meta-Server".
The goal of the BC-VisCon website is to provide an intuitive and flexible tool for viewing and analyzing aggregations and comparisons of the annotations made by the gene mention tagging servers connected to the BioCreAtIvE Meta-Server.

Teaching

Winter term 2022/23
Lecture Introduction to Digital Health (course in German)
Winter term 2020/21
Lecture Introduction to Digital Health (course in German)
Summer term 2020
Lecture unit on Patient Data Modalities in Electronic Health Records as part of course on Applied Medical Informatics in the Master of Epidemiology program at the Berlin School of Public Health
Summer term 2019
Lecture unit on Information Extraction from Electronic Health Records as part of the Intensive Short Course on Medical Informatics at the Berlin School of Public Health
Summer term 2017
Lecture Introduction to Bioinformatics
Winter term 2016/17
Guest lecture on Process Similarity Search as part of Ulf Leser's lecture Implementation of Database Systems
Summer term 2016
Lecture unit on Protein Function and Structure Prediction as part of Ulf Leser's lecture Introduction to Bioinformatics
Summer term 2015
Seminar Informatik in der Medizin (Informatics in Medicine)

Exercises accompanying the course Grundlagen der Bioinformatik (Foundations of Bioinformatics)
Together with Ulf Leser
Summer term 2013
Seminar Similarity Search
Together with Ulf Leser, Sebastian Wandelt, Andre Koschmieder, and Astrid Rheinländer
Summer term 2012
Seminar Large Scale Data Analysis
Together with Ulf Leser, Marc Bux, and Astrid Rheinländer
Lecture unit on "Creating a basic Web 2.0 application using XHTML, CSS, JavaScript and PHP"
Part of the Ringvorlesung zu Semesterprojekten (lecture series on semester projects)
Material: Slides and Code

Professional Activities

Scientific Service

  • External Reviewer for BMC Bioinformatics, ICDE, EDBT, SMBM, LWDA
  • Reviewer for VLDBJ, TKDE, Journal of Biomedical Semantics, Science China, DESRIST


Co-organizer of the workshop "Extracting evidence from clinical free text: opportunities and challenges" at the Informatics for Health Conference, Manchester, UK