What's in a gene name? Automated refinement of gene name dictionaries

"What's in a name? That which we call a rose by any other name would smell as sweet." William Shakespeare, Romeo and Juliet

Supplementary information

Publication: Jörg Hakenberg. Proceedings of the BioNLP 2007 Workshop at ACL 2007, pp. 153-160, Prague, June 29, 2007.

Enriched dictionary

Variation patterns

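Each rule specifies a possible replacement: a certain string at a given position in a name can be replaced by another string, potentially at a different position.
Examples (columns: position of the original string, original string, position of the replacement, replacement string; 1 denotes the first position in the name and -1 the last):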
   -1   L         -1   " ligand"     => 'L' at the last position can be replaced by " ligand" at the same position
   -1   R          1   "receptor of" => 'R' at the last position can be replaced by "receptor of" at the first position
   -1   antigen   -1   ""            => 'antigen' at the end of the name can be omitted
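The following is a minimal Python sketch of how rules in this format could be parsed and applied to a single name. It is not the tool used to build the enriched dictionary. It assumes that position 1 means the start of the name and -1 its end, that a replacement placed at the start is separated from the rest of the name by a space, and that whitespace is normalized afterwards. The names `Rule`, `parse_rule`, `apply_rule`, and `variants`, as well as the gene names in the usage part, are hypothetical and chosen only for illustration.

```python
import shlex
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    """A variation rule (hypothetical representation).

    Assumed position codes: 1 = start of the name, -1 = end of the name.
    """
    old_pos: int
    old: str
    new_pos: int
    new: str


def parse_rule(line: str) -> Rule:
    # shlex keeps quoted fields intact, including a leading space (" ligand")
    # and the empty string ("").
    old_pos, old, new_pos, new = shlex.split(line)
    return Rule(int(old_pos), old, int(new_pos), new)


def apply_rule(name: str, rule: Rule) -> Optional[str]:
    """Return the variant produced by `rule`, or None if the rule does not apply."""
    # Step 1: strip `old` from the start or the end of the name.
    if rule.old_pos == 1 and name.startswith(rule.old):
        rest = name[len(rule.old):]
    elif rule.old_pos == -1 and name.endswith(rule.old):
        # Slice by length so that an empty `old` keeps the whole name.
        rest = name[: len(name) - len(rule.old)]
    else:
        return None  # no match, or a position code this sketch does not handle
    # Step 2: place `new` at the start (assumed space-separated) or at the end.
    if rule.new_pos == 1:
        variant = rule.new + " " + rest
    elif rule.new_pos == -1:
        variant = rest + rule.new
    else:
        return None
    # Collapse repeated and surrounding whitespace, e.g. "CD4 " -> "CD4".
    variant = " ".join(variant.split())
    return variant or None


def variants(name: str, rules: list[Rule]) -> set[str]:
    """Collect all distinct variants of `name` that differ from it."""
    found = set()
    for rule in rules:
        v = apply_rule(name, rule)
        if v is not None and v != name:
            found.add(v)
    return found


# The three example rules listed above.
RULES = [
    parse_rule('-1   L         -1   " ligand"'),
    parse_rule('-1   R          1   "receptor of"'),
    parse_rule('-1   antigen   -1   ""'),
]

if __name__ == "__main__":
    for gene in ["CD40L", "TNFR", "CD4 antigen"]:  # illustrative names only
        print(gene, "->", sorted(variants(gene, RULES)))
```

Under these assumptions, `CD40L` yields `CD40 ligand`, `TNFR` yields `receptor of TNF`, and `CD4 antigen` yields `CD4`. The sketch applies rules purely at the string level, so a suffix rule such as the one for 'L' would also fire on any name that happens to end in that letter; a real dictionary refinement step would likely need additional constraints to filter such spurious variants.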

Related literature

What's in a Gene Name? Why mapping the language of the human genome may be more of a headache than sequencing it.
Carol Reeves. The Scientist, 19(4):56, 2005.
The molecule role ontology: an ontology for annotation of signal transduction pathway molecules in the scientific literature.
Satoko Yamamoto, Takao Asanuma, Toshihisa Takagi, Ken Ichiro Fukuda. Comparative and Functional Genomics, 5(6-7):528-536.
Summary: a structured controlled vocabulary of concrete protein names and generic (abstract) protein names.

Please send any questions and requests to hakenberg(a)informatik.hu-berlin.de.