LLL'05 Challenge: Genic Interaction Extraction with Alignments and Finite State Automata

Jörg Hakenberg (1,*), Conrad Plake (1), Ulf Leser (1), Harald Kirsch (2), and Dietrich Rebholz-Schuhmann (2)

(1) Humboldt-Universität zu Berlin, Department of Computer Science, Knowledge Management Group, Unter den Linden 6, 10099 Berlin, Germany.
(2) European Bioinformatics Institute, Rebholz-Group, Hinxton CB10 1SD, United Kingdom.
(*) Corresponding author. Current affiliation: Knowledge Management in Bioinformatics, Dept. Computer Science, Humboldt-Universität zu Berlin, Rudower Chaussee 25, 12489 Berlin, Germany. Phone: +49.30.2093.3903, Email: hakenberg(a)informatik.hu-berlin.de


Abstract

We present a system for identifying syntax patterns that describe interactions between genes and proteins in scientific text. The system uses two methods: sequence alignments applied to sentences annotated with interactions and syntactic information (part-of-speech tags), and finite state automata optimized with a genetic algorithm. Both methods identify syntactic patterns that generalize textual representations of agent-target relations. We match the generated patterns against arbitrary text to extract interactions and their respective partners. Our best system, based on the optimized finite state automata, scored an F1-measure of 51.8% on the LLL'05 evaluation set.
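To make the pattern-matching step concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sentence is given as (word, tag) pairs, that gene names carry a hypothetical `GENE` tag, and that a pattern is a list of tags in which `:agent` and `:target` suffixes mark the interaction partners. It uses exact sliding-window matching as a simplified stand-in for the alignment and automata techniques named in the abstract. The function name `match_pattern`, the tag set, and the example tokens are all illustrative assumptions.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, tag); the tag set here is hypothetical


def match_pattern(sentence: List[Token],
                  pattern: List[str]) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Return (agent, target) for the first window whose tags equal the pattern.

    Pattern symbols are tags, optionally suffixed with ':agent' or ':target'
    to mark which matched word fills which role. Returns None if no window
    of the sentence matches.
    """
    n, m = len(sentence), len(pattern)
    for start in range(n - m + 1):
        window = sentence[start:start + m]
        # Compare each token's tag with the tag part of the pattern symbol.
        if all(tag == want.split(":")[0]
               for (_, tag), want in zip(window, pattern)):
            # Collect the words at positions that carry a role suffix.
            roles = {want.split(":")[1]: word
                     for (word, _), want in zip(window, pattern)
                     if ":" in want}
            return roles.get("agent"), roles.get("target")
    return None


# Hypothetical tagged sentence and pattern, for illustration only.
sentence = [("geneA", "GENE"), ("activates", "VBZ"),
            ("geneB", "GENE"), (".", ".")]
pattern = ["GENE:agent", "VBZ", "GENE:target"]
result = match_pattern(sentence, pattern)
```

Exact matching is far stricter than an alignment, which tolerates gaps and substitutions between a pattern and a sentence; the sketch only illustrates how a syntactic pattern can assign agent and target roles once it matches.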

Published in
Proceedings of the Learning Language in Logic Workshop (LLL05) at the 22nd ICML 2005, pp. 38-45. Bonn, Germany, August 2005.

@InProceedings{Hakenberg:2005c,
  author = {J\"org Hakenberg and Conrad Plake and Ulf Leser and Harald Kirsch and Dietrich Rebholz-Schuhmann},
  title = {LLL'05 Challenge: Genic Interaction Extraction with Alignments and Finite State Automata},
  booktitle = {Proc Learning Language in Logic Workshop (LLL05) at the 22nd Int Conf on Machine Learning},
  address = {Bonn, Germany},
  month = {August},
  year = 2005,
  pages = {38--45}
}