What's in a gene name? Automated refinement of gene name dictionaries
"What's in a name? That which we call a rose by any other name would
smell as sweet." By William Shakespeare
Supplementary information
Publication: Jörg Hakenberg, Proceedings of the BioNLP 2007 workshop at ACL 2007, p.153-160, June 29 2007, Prague.
[Abstract]
Enriched dictionary
- Original dictionary: "masterlist" as provided by BioCreative II (GN task);
derived from Entrez Gene (32,980 human genes)
- Enriched dictionary
Variation patterns
Each rule describes one possible replacement: "a certain string at a given position in the name can be replaced by another string (potentially at another position)." A rule is written as four fields: source position, source string, target position, target string.
Examples:
-1 L -1 " ligand" => 'L' at the last position can be replaced by " ligand" at the same position
-1 R 1 "receptor of" => 'R' at the last position can be replaced by "receptor of" at the first position
-1 antigen -1 "" => 'antigen' at the last position can be dropped (replaced by the empty string)
- 1: first position, 2: second position, ...
- 0: any position
- -1: last position, -2: second to last, ...
- Set of rules: regrettably, I have not yet found a convenient way to export and express all transformation rules. However, all rules were inferred from examples, so I should come up with a proper grammar at some point.
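To make the rule notation above more concrete, here is a minimal Python sketch of how such a rule line could be parsed and applied. It is not the implementation used for the paper. It assumes that a position refers to the start (1) or end (-1) of the name taken as a plain string, that a replacement moved to the front is joined with a single space, and it does not handle the other position codes (0, 2, -2, ...). The names `Rule`, `parse_rule` and `apply_rule` are hypothetical, and the gene names in the usage lines are illustrative inputs only.

```python
import shlex
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    src_pos: int   # where the source string must occur (1 or -1 in this sketch)
    src: str       # string to be replaced
    dst_pos: int   # where the replacement is placed (1 or -1 in this sketch)
    dst: str       # replacement string, possibly empty


def parse_rule(line: str) -> Rule:
    """Parse a rule line such as: -1 L -1 " ligand"."""
    # shlex keeps quoted fields intact, including leading spaces and "".
    src_pos, src, dst_pos, dst = shlex.split(line)
    return Rule(int(src_pos), src, int(dst_pos), dst)


def apply_rule(name: str, rule: Rule) -> Optional[str]:
    """Return the name variant produced by the rule, or None if it does not match."""
    # Assumption: position 1 means "name starts with", -1 means "name ends with".
    if rule.src_pos == -1:
        if not name.endswith(rule.src):
            return None
        rest = name[: len(name) - len(rule.src)]
    elif rule.src_pos == 1:
        if not name.startswith(rule.src):
            return None
        rest = name[len(rule.src):]
    else:
        raise NotImplementedError("only positions 1 and -1 are sketched here")

    rest = rest.strip()
    if not rest:
        return None  # nothing would be left of the name

    if rule.dst_pos == -1:
        variant = rest + rule.dst
    elif rule.dst_pos == 1:
        variant = rule.dst + " " + rest  # assumption: join with one space
    else:
        raise NotImplementedError("only positions 1 and -1 are sketched here")

    # Collapse repeated or stray whitespace.
    return " ".join(variant.split())


if __name__ == "__main__":
    print(apply_rule("CD40L", parse_rule('-1 L -1 " ligand"')))
    print(apply_rule("IL2R", parse_rule('-1 R 1 "receptor of"')))
    print(apply_rule("CD20 antigen", parse_rule('-1 antigen -1 ""')))
```

Under these assumptions the three example rules turn "CD40L" into "CD40 ligand", "IL2R" into "receptor of IL2", and "CD20 antigen" into "CD20". A token-based reading of the positions would need a tokenizer in front of `apply_rule`.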
More literature
- What's in a Gene Name? Why mapping the language of the human genome may be more of a headache than sequencing it.
- Carol Reeves, The Scientist 2005, 19(4):56. [Article]
- The molecule role ontology: an ontology for annotation of signal transduction pathway molecules in the scientific literature.
- Satoko Yamamoto, Takao Asanuma, Toshihisa Takagi, Ken Ichiro Fukuda, Comparative and Functional Genomics, 5(6-7):528-536.
[Abstract] - [PDF] - [Web page at OBO Foundry]
- A structured controlled vocabulary of concrete protein names and generic (abstract) protein names.
Please send any questions and requests to hakenberg(a)informatik.hu-berlin.de.