I'm a computer science postdoc and MD doing research in biomedical informatics at the Department of Computer Science at Humboldt-Universität zu Berlin. I work in the research group for Knowledge Management in Bioinformatics. After studying medicine at Medical University of Vienna and computer science at HU-Berlin, I joined the DFG-funded graduate program SOAMED in 2010 to research service-oriented architectures in a medical area of application, receiving my PhD in 2015. My current research focus is on knowledge mining and similarity search over data relevant to the biomedical domain, including scientific workflows, genomic data, and medical data. I'm PI on the simpatix project, funded by a DFG "Temporary Position for Principal Investigators" (Eigene Stelle), and technical coordinator in PREDICT.

since 05/2015
Postdoctoral researcher at Humboldt-Universität zu Berlin
Funded by a DFG grant for "Temporary Positions for Principal Investigators" (Eigene Stelle) since October 2016
08/2010 – 05/2015
PhD Student in the DFG-funded graduate school SOAMED
Humboldt-Universität zu Berlin
Department of Computer Science
Knowledge Management in Bioinformatics
09/2012 – 11/2012
Visiting scholar at the Department of Computer and Information Science at the University of Pennsylvania with Prof. Susan B. Davidson
Funded by a DAAD short-term grant for PhD students
10/2005 – 08/2010
Diploma in Computer Science (Dipl.-Inf.)
Humboldt-Universität zu Berlin
2010 Diploma Thesis "Integration of clinical record data with special consideration of diagnosis classification"
2009 Study Project "Integration and visualization of heterogeneous annotation results of the BioCreAtIvE Meta-Server"
11/2008 – 07/2010 Student assistant at the research group for Knowledge Management in Bioinformatics at Humboldt University
01/2007 – 01/2009 Student assistant at the Institute of Cultural Sciences at Humboldt University
10/1998 – 05/2005
Medical studies (Dr. med. univ.)
Medical University of Vienna
2000 – 2004 Tutor and Research Assistant at the Institute of Anatomy, University of Vienna / Medical University of Vienna with Prof. Helmut Gruber
08/2003 – 09/2003 Research visit at the University of Cincinnati College of Medicine Department of Surgery in the Laboratory of Epithelial Pathobiology with Prof. Jeffrey B. Matthews and Roger T. Worrell

Research interests

  • Similarity measures and similarity search over process-structured data such as scientific workflows, business processes, or patient histories
  • Text mining and data analytics for medical and especially clinical documents
  • Dataspaces, Pay-as-you-go Data Integration


Johannes Starlinger, Madeleine Kittner, Oliver Blankenstein, and Ulf Leser. (2016).
How to Improve Information Extraction from German Medical Records
Information Technology, Special Issue on Data Integration in the Life Sciences (accepted).

David Wiegandt, Johannes Starlinger, and Ulf Leser. (2016).
Graph n-grams for Scientific Workflow Similarity Search
LWDA, Potsdam, Germany.
Johannes Starlinger, Sarah Cohen-Boulakia, Sanjeev Khanna, Susan B. Davidson, and Ulf Leser. (2015).
Effective and Efficient Similarity Search in Scientific Workflow Repositories
Future Generation Computer Systems 56: 584-594.
Johannes Starlinger, Sarah Cohen-Boulakia, Sanjeev Khanna, Susan B. Davidson, and Ulf Leser. (2014).
Layer Decomposition: An Effective Structure-based Approach for Scientific Workflow Similarity.
10th IEEE International Conference on eScience, Guarujá, SP, Brasil.

Johannes Starlinger, Bryan Brancotte, Sarah Cohen-Boulakia, and Ulf Leser. (2014).
Similarity Search for Scientific Workflows.
PVLDB, Hangzhou, China.
 VLDB Excellent Presentation Award
Sebastian Wandelt, Johannes Starlinger, Marc Bux, and Ulf Leser. (2013).
RCSI: Scalable similarity search in thousand(s) of genomes.
PVLDB, Hangzhou, China.

Philippe Thomas, Johannes Starlinger, and Ulf Leser. (2013).
Experiences from Developing the Domain-Specific Entity Search Engine GeneView.
BTW, Magdeburg, Germany.
Johannes Starlinger, Sarah Cohen-Boulakia, and Ulf Leser. (2012).
(Re)Use in Public Scientific Workflow Repositories.
Proceedings of the 24th International Conference on Scientific and Statistical Database Management (SSDBM'12), Chania, Crete, Greece, pp. 361-378.

Philippe Thomas, Johannes Starlinger, Alexander Vowinkel, Sebastian Arzt, and Ulf Leser. (2012).
GeneView: A comprehensive semantic search engine for PubMed.
Nucleic Acids Res, 2012 Jul; 40(Web Server issue):W585-91.
Sebastian Arzt, Johannes Starlinger, Oliver Arnold, Stefan Kröger, Samira Jaeger, and Ulf Leser. (2011).
PiPa: Custom Integration of Protein Interactions and Pathways.
GI-Jahrestagung 2011, Workshop "Daten In den Lebenswissenschaften".

Johannes Starlinger, Bernd Schmeck, and Ulf Leser. (2011).
Challenges in Automatic Diagnosis Extraction from Medical Examination Summaries.
CIKM 2011, Workshop on Web Science and Information Exchange in the Medical Web.
Philippe Thomas, Johannes Starlinger, Christoph Jacob, Illes Solt, Jörg Hakenberg, and Ulf Leser. (2010).
GeneView: Gene-Centric Ranking of Biomedical Text.
Proceedings of BioCreative III, Bethesda, USA, 2010. pp. 137-142.
Johannes Starlinger, Florian Leitner, Alfonso Valencia, and Ulf Leser. (2009).
SOA-based Integration of Text Mining Services.
Proceedings of the 2009 Congress on Services, Los Angeles, USA, pp. 99-106.
 3rd place in the IEEE Services Computing Contest 2009


Philippe Thomas, Johannes Starlinger, Christoph Jacob, Jörg Hakenberg, and Ulf Leser. (2010).
GeneView: Gene-Centric Ranking of Biomedical Text.
Poster - BioCreative III workshop, Washington, USA, September 2010.


Johannes Starlinger
simpatix: Similarity Search for Richly Annotated Structured Patient Cases.
Future Medicine Science Match 2016, Berlin.
Johannes Starlinger, Bernd Schmeck, Bettina Temmesfeld-Wollbrück, Norbert Suttorp, and Ulf Leser. (2011).
Datenbankintegration, automatische Diagnoseklassifikation und statistische Analyse von BAL-Befunden.
52. Kongress der Deutschen Gesellschaft für Pneumologie und Beatmungsmedizin e.V.


Summer term 2015
Seminar Informatik in der Medizin

Exercises following the course Grundlagen der Bioinformatik
Together with Ulf Leser
Summer term 2013
Seminar Similarity Search
Together with Ulf Leser, Sebastian Wandelt, Andre Koschmieder, and Astrid Rheinländer
Summer term 2012
Seminar Large Scale Data Analysis
Together with Ulf Leser, Marc Bux, and Astrid Rheinländer
Lecture "Creating a basic web2.0-application using XHTML, CSS, JavaScript and PHP"
Part of the Ringvorlesung zu Semesterprojekten
Material: Slides and Code



In the simpatix project, we investigate similarity search over electronic health records. These records consist of mostly unstructured or semi-structured data, such as clinical notes from examinations and treatments, tabularized data from quantitative tests (such as blood screenings), or discharge summaries, but encode an implicit process describing the individual patient’s disease history. In simpatix, we extract this process from EHRs, together with rich annotations of clinically relevant entities (e.g., diagnoses, treatments, or procedures), and investigate similarity measures for such process-structured case representations to compare and find similar cases. In the end, we want to deploy these measures in similarity search over large collections of patient cases to enable use cases such as clinical decision support.
Our aim in PREDICT is to develop a software system that enables clinicians to use the large body of existing data on the relationships between genetic/epigenetic alterations on the one hand, and treatment options in cancer and their success on the other. We are developing a semantically integrated, cancer-type-specific knowledge base that draws its data from a variety of sources, including scientific publications, genotype/phenotype databases, drug effectiveness screens, clinical trials, and large-scale association studies, using advanced algorithms for knowledge extraction, semantic data integration, and biomedical text mining.


In the FlowAlike project, we collected a corpus of similarity ratings for scientific workflows from a dataset of 1483 Taverna workflows and a second dataset of 139 Galaxy workflows, to be used for the evaluation of algorithmic similarity measures. Around 2400 ratings were manually assigned by scientific workflow experts and are available for download to test and compare your own algorithms!
GeneView is a web-based tool for searching and visualizing a deeply annotated copy of PubMed and the open access subset of PubMed Central. While I was largely responsible for creating the web interface during my time as a student assistant in the research group for Knowledge Management in Bioinformatics at HU-Berlin, Philippe Thomas is the tool's main developer and the researcher behind it. Check it out!
The BC-VisCon tool was created as part of my study project "Integration and visualization of heterogeneous annotation results of the BioCreAtIvE Meta-Server".
The goal of the BC-VisCon website is to provide an intuitive and flexible tool for viewing and analyzing aggregations and comparisons of the annotations made by the gene mention tagging servers connected to the BioCreAtIvE Meta-Server.