Second European Workshop on Data Mining and Text Mining for Bioinformatics
24 September 2004, Pisa, Italy
in conjunction with ECML/PKDD 2004: The Fifteenth European Conference on
Machine Learning (ECML) and The Eighth European Conference on Principles
and Practice of Knowledge Discovery in Databases (PKDD),
20-24 September 2004, Pisa, Italy.
Abstract
In recent years, research in molecular biology and
molecular medicine has accumulated enormous amounts of data. This
includes genomic sequences gathered by the Human Genome Project, gene
expression data from microarray experiments, protein identification and
quantification data from proteomics experiments, and SNP data from
high-throughput SNP arrays. However, our understanding of the
biological processes underlying these data lags far behind. There is a
strong interest in employing methods of knowledge discovery and data
mining to generate models of biological systems. Mining biological
databases poses challenges that knowledge discovery and data mining
must address and that are the focus of the workshop:
- Important background knowledge in bioinformatics is often buried in
textual documents, such as scientific publications or database
annotations. Text mining and information extraction approaches
currently being studied range from term recognition to the extraction
of complex interaction relationships between proteins.
- Analysing data from biological databases often requires considering
data from multiple relations rather than from a single table.
Approaches such as propositionalization algorithms are currently being
studied that utilize multi-relational data and yet meet the efficiency
requirements of large-scale data mining problems.
- Identifying problems and optimization criteria which, when maximized
by knowledge discovery algorithms, actually contribute to a better
understanding of biological systems is difficult and requires a
profound understanding of both knowledge discovery and computational
biology. The identification of appropriate knowledge discovery problems
and the development of evaluation methods for knowledge discovery
results are ongoing efforts.
Goals and Intended Audience
To build knowledge discovery systems that contribute to our
understanding of biological systems, solutions to the above problems
have to be assembled into efficient and scalable systems. The workshop
aims to facilitate this process and to enhance the exchange of
knowledge between computational biologists and knowledge discovery
researchers. Accordingly, the intended audience comprises both
computational biologists and KDD researchers.
Program
The program consists of invited talks by Yves Moreau and Alfonso
Valencia and of contributed presentations. Each contributed paper will
be presented in a 10-minute oral spotlight presentation in the morning
session and as a poster in the afternoon sessions.
| Time | Session |
| --- | --- |
| 10:30-11:00 | Invited Talk. Yves Moreau: Integrating Text and Microarray Data: Gene Expression and Comparative Genomic Hybridization. |
| 11:10-12:50 | Paper Spotlight Presentations |
| | Hong Chai and Carlotta Domeniconi: An Evaluation of Gene Selection Methods for Multi-Class Microarray Data Classification. |
| | Nikolai Daraselia, Sergei Egorov, Andrey Yazhuk, Svetlana Novichkova, Anton Yuryev, and Ilya Mazo: Extracting Protein Function Information from MEDLINE Using a Full-Sentence Parser. |
| | Yoshikazu Kaneta, Md. Ahaduzzaman Munna, and Takenao Ohkawa: A Method of Extracting Sentences Related to Protein Interaction from Literature using a Structure Database. |
| | Svetlana Kiritchenko, Stan Matwin, and A. Fazel Famili: Hierarchical Text Categorization as a Tool of Associating Genes with Gene Ontology Codes. |
| | Judice L.Y. Koh, Mong Li Lee, Asif M. Khan, Paul T.J. Tan, and Vladimir Brusic: Duplicate Detection in Biological Data using Association Rule Mining. |
| | Andrea Malossini, Enrico Blanzieri, and Raymond T. Ng: Assessment of SVM Reliability for Microarrays Data Analysis. |
| | Brigitte Mathiak and Silke Eckstein: Five Steps to Text Mining in Biomedical Literature. |
| | Oleg Okun: Protein Fold Recognition with K-Local Hyperplane Distance Nearest Neighbor Algorithm. |
| | F. Psomopoulos, S. Diplaris, and P. A. Mitkas: A Finite State Automata Based Technique for Protein Classification Rules Induction. |
| | Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, James Dowdall, Christos Andronis, Andreas Persidis, and Ourania Konstanti: Mining relations in the GENIA corpus. |
| 13:00-13:30 | Invited Talk. Alfonso Valencia: Information Extraction in Molecular Biology. |
| 15:00-16:00 | Poster Presentation: all technical contributions |
| 16:00-16:30 | Coffee Break |
| 16:30- | Poster Presentation: all technical contributions |
Proceedings
Get the workshop proceedings as one PDF file.
- Cover, table of contents, and invited talks by Yves Moreau and Alfonso Valencia.
- Hong Chai and Carlotta Domeniconi: An Evaluation of Gene Selection Methods for Multi-Class Microarray Data Classification.
- Nikolai Daraselia, Sergei Egorov, Andrey Yazhuk, Svetlana Novichkova, Anton Yuryev, and Ilya Mazo: Extracting Protein Function Information from MEDLINE Using a Full-Sentence Parser.
- Yoshikazu Kaneta, Md. Ahaduzzaman Munna, and Takenao Ohkawa: A Method of Extracting Sentences Related to Protein Interaction from Literature using a Structure Database.
- Svetlana Kiritchenko, Stan Matwin, and A. Fazel Famili: Hierarchical Text Categorization as a Tool of Associating Genes with Gene Ontology Codes.
- Judice L.Y. Koh, Mong Li Lee, Asif M. Khan, Paul T.J. Tan, and Vladimir Brusic: Duplicate Detection in Biological Data using Association Rule Mining.
- Andrea Malossini, Enrico Blanzieri, and Raymond T. Ng: Assessment of SVM Reliability for Microarrays Data Analysis.
- Brigitte Mathiak and Silke Eckstein: Five Steps to Text Mining in Biomedical Literature.
- Oleg Okun: Protein Fold Recognition with K-Local Hyperplane Distance Nearest Neighbor Algorithm.
- F. Psomopoulos, S. Diplaris, and P. A. Mitkas: A Finite State Automata Based Technique for Protein Classification Rules Induction.
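- Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, James Dowdall, Christos Andronis, Andreas Persidis, and Ourania Konstanti: Mining relations in the GENIA corpus.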
Workshop Chair
Tobias Scheffer, Humboldt-Universität zu Berlin.
Program Committee
- Sourav Bhowmick, Nanyang Technological University, Singapore.
- Christian Blaschke, Centro Nacional de Biotecnologia.
- Vladimir Brusic, Institute for Infocomm Research, Singapore.
- Carol Friedman, Columbia University.
- George Forman, Hewlett Packard.
- Rob Gaizauskas, University of Sheffield.
- Jörg Hakenberg, Humboldt-Universität zu Berlin.
- Ross King, University of Wales, Aberystwyth, and PharmaDM.
- Adam Kowalczyk, Telstra & Peter MacCallum Cancer Centre.
- Stefan Kramer, Technische Universität München.
- Ulf Leser, Humboldt-Universität zu Berlin.
- Bhavani Raskutti, Telstra.
- Steffen Schulze-Kremer, Max-Planck-Institute, Berlin.
- Myra Spiliopoulou, University of Magdeburg.
- Alfonso Valencia, Centro Nacional de Biotecnologia, Spain.
- David Vogel, AI Insight.
- Mohammed Zaki, Rensselaer Polytechnic Institute.
Support
The workshop received support from the European Network of Excellence
in Knowledge Discovery, KD-Net.
Submissions and Dates (passed)
The submission deadline, which had been extended to June 21, has passed.
Submission deadline: June 21, 2004
Notification of acceptance: July 12, 2004
Camera-ready copies due: July 19, 2004
Workshop: September 24, 2004
Previous Event
The First European Workshop on Data Mining and Text Mining for
Bioinformatics was held at ECML/PKDD 2003 in Cavtat, Croatia.