Knowledge Management Group

Research Area

Our goal is to develop and study algorithms that discover knowledge in large databases and large document archives. We analyze learning problems and explore the foundations, design principles, and properties of learning algorithms. We investigate applications mainly in information retrieval and bioinformatics.

Research Foci

Active and semi-supervised learning from text.
A central challenge in classification learning is making effective use of unlabeled training data. We investigate algorithms that learn classifiers from few labeled and many unlabeled examples. Research papers and websites can be described by their context in the citation graph, in addition to their intrinsic content. Multi-view learning methods make effective use of both unlabeled training examples and this additional context information from the citation network. They are based on an elementary principle: the error risk of a consensus decision made by multiple decision makers is lower than the individual risk of each single decision maker.
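The consensus principle can be illustrated with a small calculation. The sketch below is not the group's method; it assumes a simple majority vote among an odd number of decision makers whose errors are independent and equally likely, which is a stronger assumption than the text states. The function name majority_error is chosen for illustration only.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Probability that a majority vote of n voters is wrong.

    Assumptions (for illustration only): each voter errs independently
    with the same probability p, and n is odd so there are no ties.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # The vote is wrong when more than half of the voters are wrong.
    k_min = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, round(majority_error(0.3, n), 5))
```

Under these assumptions, an individual error rate of 0.3 drops to 0.216 with three voters and falls further with five. The effect reverses when the individual error rate exceeds 0.5, and it weakens when the voters' errors are correlated.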
Information retrieval: spam identification and user assistance.
Machine learning can contribute to several problems of information retrieval. We view spam identification as a game between two opponents (spam sender and spam filter) who react to each other's moves. We are looking for a winning strategy that allows us to identify spam email that will be sent in the future.
We develop user assistance systems that utilize knowledge contained in previously edited text documents in order to support a user, for instance, in writing an email or editing a document.
Text mining in bioinformatics.
In order to generate biological models that, for instance, predict the function of certain genes, it is necessary to consider information that is scattered across a large number of scientific publications. We investigate methods that extract relevant information automatically from research papers and utilize this information for model building.
Mining data streams.
We investigate the principles of algorithms that analyze large databases and discover and explicate hidden knowledge. Learning from very large databases is among the challenges of knowledge discovery. Sampling algorithms process databases that are too large for a pass over all records, yet still provide optimality guarantees. We analyze learning methods and are interested in the methodology of evaluating learners.
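As a rough illustration of how sampling can come with a guarantee, the sketch below uses the standard Hoeffding bound to choose a sample size for estimating the mean of a quantity bounded in [0, 1]. It is a textbook example under stated assumptions, not the group's algorithm: records are drawn uniformly at random with replacement, and the names hoeffding_sample_size and estimate_mean are hypothetical.

```python
import math
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    With n samples of a value in [0, 1], the sample mean is within
    epsilon of the true mean with probability at least 1 - delta.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    return math.ceil(math.log(2 / delta) / (2 * epsilon**2))


def estimate_mean(
    records: Sequence[T],
    value: Callable[[T], float],
    epsilon: float,
    delta: float,
    seed: int = 0,
) -> float:
    """Estimate the mean of value(record) from a random sample.

    Assumes value() returns numbers in [0, 1] and that uniform sampling
    with replacement from `records` is possible.
    """
    rng = random.Random(seed)
    n = hoeffding_sample_size(epsilon, delta)
    return sum(value(rng.choice(records)) for _ in range(n)) / n


if __name__ == "__main__":
    # Toy "database": 30% of the records satisfy some property.
    db = [1] * 30_000 + [0] * 70_000
    print(hoeffding_sample_size(0.05, 0.05))
    print(estimate_mean(db, float, epsilon=0.05, delta=0.05))
```

The required sample size depends only on the tolerance epsilon and the failure probability delta, not on the number of records, which is why this kind of bound is attractive for very large databases.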
