Overview, Timetable, and Slides of Presentations
Motivation
Nowadays, electronic data are ubiquitous: they exist in different formats, in different locations, and in rapidly increasing volumes. Furthermore, data often arrive in the form of a stream that is transmitted via a network. Information integration is the problem of combining data from multiple heterogeneous sources into a unifying format accessible by end-users. It is regarded as a major challenge faced by every modern organization concerned with data collection and analysis, data migration, and data evolution. In fact, in a 2008 article in the Communications of the ACM, Phil Bernstein of Microsoft Research and Laura Haas of IBM Research wrote that "Large enterprises spend a great deal of time and money on information integration ... Frequently cited as the biggest and most expensive challenge that information-technology shops face, information integration is thought to consume about 40% of their budget." Information integration is also important in scientific research, where discovery depends crucially on the integration of scientific data from multiple sources.
The research community has addressed the information integration challenge by investigating in depth certain specific facets of information integration, the most prominent of which are data exchange, data integration, and data streams. Data exchange and data integration both deal with the execution of information integration, but they adopt distinctly different approaches. Data exchange is the problem of transforming data residing in different sources into data structured under a target schema; in particular, data exchange entails the materialization of the data, after they have been extracted from the sources and restructured into the unified format. In contrast, data integration can be described as symbolic or virtual integration: users are given the capability to pose queries and obtain answers via the unified-format interface, while the data remain in the sources and no materialization of the restructured data takes place. The study of data exchange and data integration has been facilitated by the systematic use of schema mappings, which are high-level specifications (typically expressed in a suitable logical formalism) that describe the relationship between two database schemas. As a matter of fact, schema mappings are often described as the essential building blocks in data exchange and data integration, and they have been the object of extensive research in recent years. These investigations span a wide spectrum of topics, from semantics and algorithms to the design and development of systems for data exchange and data integration based on schema mappings.
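For illustration (the schemas here are our own example, not part of the school materials), a schema mapping between a source schema with a relation Emp(name, dept) and a target schema with relations Works(name, dept) and Dept(dept, mgr) could be specified by a source-to-target tuple-generating dependency such as

    ∀n ∀d ( Emp(n, d) → ∃m ( Works(n, d) ∧ Dept(d, m) ) ),

which requires every employee fact in the source to be reflected in the target, with the department's manager existentially quantified; in data exchange, such existentially quantified values are typically witnessed by labeled nulls in the materialized target instance.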
In the basic data stream model, the input consists of one or several streams of data items that can be read only sequentially, one after the other. This scenario is relevant for a large number of applications where massive amounts of data need to be processed. Typically, algorithms have to work with one or a few passes over the data and a memory buffer whose size is significantly smaller than the input size. In the past few years, a new theory has emerged for reasoning about algorithms that work within these constraints. This theory involves the design of efficient algorithms, techniques for proving lower bounds on the resources required to solve specific problems, and the design of general-purpose data-stream management systems.
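To make these constraints concrete, here is a small sketch (again our own illustration, not part of the school materials) of reservoir sampling, a classic one-pass technique that maintains a uniform random sample of a stream while using memory whose size is independent of the stream length:

    import random

    def reservoir_sample(stream, k):
        """Keep a uniform random sample of k items from a stream,
        reading each item exactly once and using O(k) memory."""
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)    # fill the buffer first
            else:
                j = random.randint(0, i)  # uniform over 0..i inclusive
                if j < k:                 # happens with probability k/(i+1)
                    reservoir[j] = item
        return reservoir

    # Example: sample 5 items from a stream far too large to store in full.
    print(reservoir_sample(range(10**6), 5))

After processing the first i items, each of them is in the buffer with probability exactly k/i, even though the algorithm never revisits past items and never knows the stream length in advance.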
Aim
The main aim of DEIS'10 is to expose young researchers from both academia and industry to state-of-the-art developments in information integration and to prepare them for productive research in data exchange, data integration, and data streams.
Structure of DEIS'10
DEIS'10 will take place at Schloss Dagstuhl on November 7–12, 2010. It will consist of tutorials on each of the main topics, presentations of specialized topics by the participants, and evening problem sessions.
- Tutorials: There will be three 90-minute tutorials, one on each of data exchange, data integration, and data streams, presented by the three organizers on Monday of the week of the Advanced School.
- Participant Presentations: There will be twenty-two 45-minute presentations by the participants, each followed by a 15-minute discussion.
- Problem Sessions: There will be two 90-minute problem sessions, on Tuesday and Thursday evening. The purpose of the problem sessions is twofold:
  - the participants will develop solutions to exercises assigned by the organizers during their tutorial presentations;
  - the organizers and the participants will discuss open research problems on the topics of the Advanced School.
- Excursion/Free Time: A hike or some other outing will be organized on Wednesday afternoon.
Timetable and Slides of Presentations
Monday, Nov 8, 2010:
- 08:45 - 09:15 : Introductions, logistics, etc.
- 09:15 - 10:45 : Tutorial 1: Data exchange (Phokion Kolaitis)
- 10:45 - 11:15 : Coffee Break
- 11:15 - 12:15 : Talk 1: The chase procedure and its applications to data exchange (Adrian Onet)
- 12:15 - 01:30 : Lunch Break
- 01:30 - 03:00 : Tutorial 2: Data integration (Maurizio Lenzerini)
- 03:00 - 03:15 : Coffee Break
- 03:15 - 04:15 : Talk 2: Query answering in data integration (Piotr Wieczorek)
- 04:15 - 04:30 : Coffee Break
- 04:30 - 06:00 : Tutorial 3: Data streaming (Nicole Schweikardt)
- 06:00 - 07:30 : Dinner
Tuesday, Nov 9, 2010:
- 08:45 - 09:45 : Talk 3: Data stream management systems and query languages (Sandra Geisler)
- 09:45 - 10:45 : Talk 4: Basic algorithmic techniques for processing data streams (Mariano Zelke)
- 10:45 - 11:15 : Coffee Break
- 11:15 - 12:15 : Talk 5: Algorithms for computing the core of universal solutions (Vadim Savenkov)
- 12:15 - 01:45 : Lunch Break
- 01:45 - 02:45 : Talk 6: The inverse operator on schema mappings and its uses in data exchange (Jorge Pérez)
- 02:45 - 03:45 : Talk 7: Data integration: Consistent query answering (Sławomir Staworko)
- 03:45 - 04:15 : Coffee Break
- 04:15 - 05:15 : Talk 8: Description logics for data integration (Yazmín Angélica Ibáñez-García)
- 05:15 - 06:15 : Talk 9: Data cleaning for data integration (Ekaterini Ioannou)
- 06:15 - 07:30 : Dinner
- 08:15 - 09:45 : Problem Session 1
Wednesday, Nov 10, 2010:
- 08:45 - 09:45 : Talk 10: Peer data management systems (Armin Roth)
- 09:45 - 10:45 : Talk 11: Theory of Peer Data Management (Sebastian Skritek)
- 10:45 - 11:15 : Coffee Break
- 11:15 - 12:15 : Talk 12: Querying and mining data streams (Elena Ikonomovska)
- 12:15 - 01:45 : Lunch Break
- 01:45 - 02:45 : Talk 13: Semantics of query answering in data exchange (André Hernich)
- 02:45 - 06:00 : Excursion
- 06:00 - 07:30 : Dinner
Thursday, Nov 11, 2010:
- 08:45 - 09:45 : Talk 14: XML data integration (Łucja Kot)
- 09:45 - 10:45 : Talk 15: XML data exchange (Amélie Gheerbrant)
- 10:45 - 11:15 : Coffee Break
- 11:15 - 12:15 : Talk 16: Stream-based processing of XML documents (Cristian Riveros)
- 12:15 - 01:45 : Lunch Break
- 01:45 - 02:45 : Talk 17: Distributed processing of data streams and large data sets (Marwan Hassani)
- 02:45 - 03:45 : Talk 18: View-based query processing (Paolo Guagliardo)
- 03:45 - 04:15 : Coffee Break
- 04:15 - 05:15 : Talk 19: Integrity constraints in data exchange (Víctor Didier Gutiérrez Basulto)
- 05:15 - 06:15 : Talk 20: Analyzing, comparing and debugging schema mappings (Emanuel Sallinger)
- 06:15 - 07:30 : Dinner
- 08:15 - 09:45 : Problem Session 2
Friday, Nov 12, 2010:
- 08:45 - 09:45 : Talk 21: Probabilistic data integration and probabilistic data exchange (Livia Predoiu)
- 09:45 - 10:45 : Talk 22: Learning and discovering queries and mappings (Marie Jacob)
- 10:45 - 11:15 : Coffee Break
- 11:15 - 12:15 : General discussion, next steps, etc.
- 12:15 - 01:45 : Lunch