Silke Trißl

 

Research Interests

 

Graph Query Optimization


Graph Indexing

Answering whether a path exists between two given nodes of a graph requires either traversing the graph at query time, using depth-first or breadth-first search, or querying an index built on the graph beforehand.
For small and very sparse graphs, such recursive query strategies are acceptable, but as soon as the graphs become larger, these strategies are too slow to answer reachability queries.
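As an illustration of the query-time strategy, here is a minimal Python sketch of a breadth-first reachability check. It assumes the graph is given as a plain adjacency dictionary (a hypothetical representation chosen for brevity). Every query may visit the entire graph, which is why this approach stops scaling as graphs grow.

```python
from collections import deque


def reachable(adj: dict[str, list[str]], source: str, target: str) -> bool:
    """Answer 'is there a directed path from source to target?' by BFS at query time."""
    if source == target:
        return True
    visited = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for succ in adj.get(node, []):
            if succ == target:
                return True
            if succ not in visited:
                visited.add(succ)
                queue.append(succ)
    # The whole reachable part of the graph was explored without finding target.
    return False


# Small general graph with a cycle (a -> b -> c -> a); d only points into the cycle.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(reachable(graph, "a", "c"))  # True
print(reachable(graph, "c", "d"))  # False
```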
We therefore began to investigate index structures for graphs. The transitive closure is the natural first choice, but because of its computational complexity and space requirements, computing it is not possible for graphs larger than 10,000 nodes and 20,000 edges. Other existing approaches either build on the transitive closure or are applicable only to trees or DAGs.
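To make the cost of the transitive closure concrete, the following Python sketch (again assuming a plain adjacency dictionary) precomputes, for every node, the set of nodes reachable from it by running one BFS per node. A reachability query then becomes a single set lookup, but building the index takes O(n·(n+m)) time, and the stored pairs can grow to n², i.e. up to 10^8 pairs for a graph with 10,000 nodes.

```python
from collections import deque


def transitive_closure(adj: dict[str, list[str]]) -> dict[str, set[str]]:
    """For every node, store all nodes reachable from it (each node reaches itself)."""
    nodes = set(adj) | {v for succs in adj.values() for v in succs}
    closure: dict[str, set[str]] = {}
    for start in nodes:
        # One full BFS per node: this is what makes construction expensive.
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for succ in adj.get(node, []):
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
        closure[start] = seen
    return closure


graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
tc = transitive_closure(graph)
print("c" in tc["a"])  # True: a reachability query is now a set lookup
print(sum(len(s) for s in tc.values()))  # 13 stored pairs for only 4 nodes
```

Even in this four-node example the index stores 13 of the 16 possible node pairs, which hints at why the approach breaks down for large, densely connected graphs.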
Our aim was to find an index structure that is applicable to large, general graphs. We therefore developed GRIPP, an index structure that can efficiently index large graphs with directed, unlabeled edges to answer reachability queries.
An open question remains: finding a well-suited index structure to efficiently answer distance queries in directed, unlabeled graphs.


Data Integration in the Life Sciences

In the life sciences, data about biological objects, such as genes or proteins, are stored in several data sources. Each data source usually covers only one aspect of the objects. For example, protein structures are stored in the Protein Data Bank (PDB), protein sequences in UniProt, protein folds in SCOP and CATH, and protein functions in the Gene Ontology. For set-based analyses, e.g., protein fold prediction, a biologist requires an integrated view of the data.


Integrated database COLUMBA

COLUMBA is a database that integrates data from twelve different biological data sources. The integration is centered on entries from the Protein Data Bank (PDB), which are annotated with functional, structural, and taxonomic information.
The web interface for COLUMBA is available at http://www.columba-db.de.


Ranking search results

Several sources on biological objects may contain the same information. For example, the data sources KEGG, aMAZE, and Reactome all contain information about metabolic pathways. These three data sources overlap to a certain degree, but each also contains distinct data. Querying all three data sources yields results supported by one, two, or all three of them. We call data sources with similar content dimensions. In a setting with many dimensions, ranking search results is therefore important. We developed two scores, the confidence score and the surprisingness score, to rank search results.
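The idea of results being supported by one, two, or all three dimensions can be illustrated with a naive baseline that simply counts supporting sources. This Python sketch is not the confidence or surprisingness score, whose definitions are not given here; the function name and the pathway identifiers "p1" to "p3" are hypothetical and serve only to show the grouping step.

```python
from collections import defaultdict


def rank_by_support(results_per_source: dict[str, set[str]]) -> list[tuple[str, list[str]]]:
    """Naive baseline: rank each result by how many sources (dimensions) return it."""
    supporters: dict[str, set[str]] = defaultdict(set)
    for source, results in results_per_source.items():
        for result in results:
            supporters[result].add(source)
    # Most-supported results first; ties broken alphabetically for a stable order.
    ranked = sorted(supporters.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(result, sorted(sources)) for result, sources in ranked]


# Hypothetical query results from three overlapping pathway sources.
answers = {
    "KEGG": {"p1", "p2"},
    "aMAZE": {"p1", "p3"},
    "Reactome": {"p1", "p2"},
}
for result, sources in rank_by_support(answers):
    print(result, len(sources), sources)
```

A pure count like this ignores how large or how overlapping each source is, which is one reason a dedicated scoring scheme is needed once many dimensions are involved.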