BRONCO150

BRONCO150 is a corpus containing selected sentences of 150 German discharge summaries of cancer patients (hepatocellular carcinoma or melanoma) treated at Charite Universitaetsmedizin Berlin or Universitaetsklinikum Tuebingen. All discharge summaries were manually anonymized. The original documents were scrambled at the sentence level to make reconstruction of individual reports impossible.

The corpus is annotated with the following entity types, each normalized to the terminology given in brackets: Diagnosis (ICD-10), Treatment (OPS), and Medication (ATC).
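
For illustration, each annotated mention pairs a text span with its entity label and a code from the terminology used for normalization. The following record is a made-up sketch; the span, label name, and field names are illustrative, not taken from the corpus:

```python
# Hypothetical annotation record (illustrative only; not from the corpus).
annotation = {
    "text": "hepatozellulaeres Karzinom",  # German: hepatocellular carcinoma
    "label": "Diagnosis",                  # entity type
    "code": "C22.0",                       # ICD-10: liver cell carcinoma
}
```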

BRONCO150 is provided in five splits (randomSentSet1-5) in XML and CoNLL format. Results of state-of-the-art baseline NER methods for all entity types, obtained in a cross-validation setting, can be found in the BRONCO paper (see below).
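
As a hedged sketch of how the CoNLL files might be loaded for such a cross-validation run, assuming one token per line with whitespace-separated columns, the NER tag in the last column, blank lines between sentences, and file names derived from the split names (none of which is confirmed by the corpus documentation):

```python
# Read one BRONCO150 split in CoNLL format (layout assumptions: token in
# the first column, BIO tag in the last, blank lines between sentences).
def read_conll(path):
    sentences, tokens, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                      # sentence boundary
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            cols = line.split()
            tokens.append(cols[0])
            tags.append(cols[-1])
    if tokens:                                # flush the last sentence
        sentences.append((tokens, tags))
    return sentences

# One fold of the cross-validation setting: hold out one split for
# testing and train on the remaining four (file names are hypothetical).
splits = [read_conll(f"randomSentSet{i}.conll") for i in range(1, 6)]
test_fold = splits[0]
train_folds = [sent for split in splits[1:] for sent in split]
```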

Access

BRONCO150 is provided on request only. If you are interested in using BRONCO150 for academic research focusing on German clinical NLP, please fill in the provided data usage agreement and send it to Prof. Ulf Leser.

Citing BRONCO

Madeleine Kittner, Mario Lamping, Damian T Rieke, Julian Götze, Bariya Bajwa, Ivan Jelas, Gina Rüter, Hanjo Hautow, Mario Sänger, Maryam Habibi, Marit Zettwitz, Till de Bortoli, Leonie Ostermann, Jurica Ševa, Johannes Starlinger, Oliver Kohlbacher, Nisar P Malek, Ulrich Keilholz, Ulf Leser (2021).
Annotation and initial evaluation of a large annotated German oncological corpus.
JAMIA Open, Volume 4, Issue 2, ooab025.

BRONCO50

BRONCO150 is accompanied by BRONCO50, a held-back dataset annotated along the same lines and used only for evaluating models. BRONCO50 is kept secret by the creators of BRONCO. If you want to evaluate your NER models for German medical texts on BRONCO50, please get in touch with Prof. Ulf Leser. You will have to submit your tagger to us; we will run it on BRONCO50 and return the results.

BRONCO50 - Leaderboard

We are currently aware of the following results on BRONCO50; all authors have agreed to the publication of this information on this site.

Diagnoses

1. Johanna Bohn (johanna.e.bohn@gmail.com). Precision 82.08, Recall 79.46, F1 80.75.
   Method: We fine-tuned a set of transformer-based language models on the BRONCO150 dataset. Hyperparameter optimisation was conducted using a Bayesian optimisation algorithm called TPE (Tree-Structured Parzen Estimator); see the sketch below the table. The best-performing models are the monolingual GELECTRA large and the multilingual XLM-RoBERTa large.
2. Aleksander Salek (salekale@hu-berlin.de). Precision 81.18, Recall 79.79, F1 80.48.
   Method: xmlroberta_861
3. Henning Schäfer (Henning.Schaefer@uk-essen.de). Precision 79.24, Recall 77.17, F1 78.19.
   Method: This approach is based on a German pre-trained Transformer language model (deepsetai/gbert), which was subsequently further trained on the BRONCO150 annotated entities. The model was pre-trained on the German datasets OSCAR, OPUS, Wikipedia and OpenLegalData. Publication: doi.
4. BRONCO paper. Precision 79.75, Recall 68.33, F1 73.60.
   Method: CRF. Publication: doi.
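
A minimal sketch of the TPE-based hyperparameter search from entry 1, using Optuna's TPESampler; the search space, trial count, and the train_and_eval stub are illustrative assumptions, not the authors' actual setup:

```python
import optuna

def train_and_eval(model_name, learning_rate, batch_size, epochs):
    # Placeholder: fine-tune model_name (e.g. deepset/gelectra-large or
    # xlm-roberta-large) on BRONCO150 and return the dev-set entity F1.
    # Returns a dummy score here so the sketch runs end to end.
    return 0.0

def objective(trial):
    # Illustrative search space, not the one used by the authors.
    lr = trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True)
    batch_size = trial.suggest_categorical("batch_size", [8, 16, 32])
    epochs = trial.suggest_int("epochs", 3, 10)
    return train_and_eval("deepset/gelectra-large", lr, batch_size, epochs)

# TPESampler implements the Tree-structured Parzen Estimator.
study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)
```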

Medication

1. Johanna Bohn (johanna.e.bohn@gmail.com). Precision 94.17, Recall 95.22, F1 94.69.
   Method: We fine-tuned a set of transformer-based language models on the BRONCO150 dataset. Hyperparameter optimisation was conducted using a Bayesian optimisation algorithm called TPE (Tree-Structured Parzen Estimator). The best-performing models are the monolingual GELECTRA large and the multilingual XLM-RoBERTa large.
2. Aleksander Salek (salekale@hu-berlin.de). Precision 94.41, Recall 94.94, F1 94.68.
   Method: xmlroberta_861
3. Henning Schäfer (Henning.Schaefer@uk-essen.de). Precision 92.92, Recall 95.79, F1 94.33.
   Method: This approach is based on a German pre-trained Transformer language model (deepsetai/gbert), which was subsequently further trained on the BRONCO150 annotated entities. The model was pre-trained on the German datasets OSCAR, OPUS, Wikipedia and OpenLegalData. Publication: doi.
4. BRONCO paper. Precision 94.85, Recall 87.92, F1 91.25.
   Method: CRF. Publication: doi.

Treatment

1. Johanna Bohn (johanna.e.bohn@gmail.com). Precision 79.51, Recall 83.98, F1 81.68.
   Method: We fine-tuned a set of transformer-based language models on the BRONCO150 dataset. Hyperparameter optimisation was conducted using a Bayesian optimisation algorithm called TPE (Tree-Structured Parzen Estimator). The best-performing models are the monolingual GELECTRA large and the multilingual XLM-RoBERTa large.
2. Aleksander Salek (salekale@hu-berlin.de). Precision 77.96, Recall 82.68, F1 80.25.
   Method: xmlroberta_861
3. Henning Schäfer (Henning.Schaefer@uk-essen.de). Precision 78.22, Recall 82.40, F1 80.25.
   Method: This approach is based on a German pre-trained Transformer language model (deepsetai/gbert), which was subsequently further trained on the BRONCO150 annotated entities. The model was pre-trained on the German datasets OSCAR, OPUS, Wikipedia and OpenLegalData. Publication: doi.
4. BRONCO paper. Precision 84.11, Recall 73.30, F1 78.33.
   Method: CRF. Publication: doi.
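
For context, NER precision, recall, and F1 figures like those above are conventionally computed at the entity level over BIO tag sequences, e.g. with the seqeval library. The tag names and the two toy sentences below are made up, not drawn from BRONCO:

```python
from seqeval.metrics import f1_score, precision_score, recall_score

# Toy gold and predicted tag sequences (illustrative labels only).
y_true = [["O", "B-TREATMENT", "I-TREATMENT", "O"],
          ["B-MEDICATION", "O"]]
y_pred = [["O", "B-TREATMENT", "I-TREATMENT", "O"],
          ["O", "O"]]  # the MEDICATION entity is missed

print(f"Precision {precision_score(y_true, y_pred):.2%}")  # 100.00%
print(f"Recall    {recall_score(y_true, y_pred):.2%}")     # 50.00%
print(f"F1        {f1_score(y_true, y_pred):.2%}")         # 66.67%
```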

Acknowledgements

BRONCO was created by the collaborative research project "Personalizing Oncology via Semantic Integration of Data" (PersOnS), led by Prof. Dr. Oliver Kohlbacher, Universitaet Tuebingen. We furthermore acknowledge funding from the German Federal Ministry of Education and Research, grant 031L0030B (programme i:DSEM).