schema mapping - integration operations

Database integration was among the first problems that database schema mapping had to deal with. The goal is to aggregate data from several independent databases into one combined database, or at least to make it look that way by defining VIEWs that SELECT records from the different databases the integration system can connect to.
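A small sketch of such a combining VIEW may help. It uses SQLite's ATTACH DATABASE as a stand-in for real connections to independent databases. All table and column names (site_a, site_b, genes, gene_id, ...) are hypothetical, chosen only to show two sources with different layouts being mapped onto one global layout.

```python
import sqlite3

# Simulate two independent databases as separate in-memory SQLite databases
# attached to one connection. All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS site_a")
conn.execute("ATTACH DATABASE ':memory:' AS site_b")

# The two sources use different column names for the same kind of record.
conn.execute("CREATE TABLE site_a.genes (id TEXT, name TEXT)")
conn.execute("CREATE TABLE site_b.genes (accession TEXT, label TEXT)")
conn.executemany("INSERT INTO site_a.genes VALUES (?, ?)",
                 [("A1", "brca1"), ("A2", "tp53")])
conn.executemany("INSERT INTO site_b.genes VALUES (?, ?)", [("B7", "myc")])

# The global VIEW maps both layouts onto one set of column names and unions
# the records. A TEMP view is used because SQLite does not allow a view in
# the main database to reference objects in attached databases.
conn.execute("""
    CREATE TEMP VIEW all_genes AS
        SELECT id AS gene_id, name AS gene_name, 'site_a' AS source FROM site_a.genes
        UNION ALL
        SELECT accession, label, 'site_b' FROM site_b.genes
""")

rows = conn.execute(
    "SELECT gene_id, gene_name, source FROM all_genes ORDER BY gene_id"
).fetchall()
print(rows)
```

Clients of `all_genes` see one table; the renaming in the view is where the schema mapping lives.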

Among the foremost problems are the differing implementation strategies of the databases involved; let us call this access heterogeneity. It can be as simple as different network modes for getting a connection to the database, slightly different flavours of expressing record selections from tables, and of course slightly different representations of the data coming back from a query. As an extension, there are different transaction protocols for UPDATEs against VIEWs.
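One concrete case of "slightly different flavours" is how a query asks for only the first n rows. The helper `limit_query` and its dialect labels below are hypothetical; the three syntaxes it emits are real: LIMIT (SQLite, PostgreSQL, MySQL), TOP (SQL Server) and the SQL:2008 FETCH FIRST clause.

```python
def limit_query(dialect: str, table: str, n: int) -> str:
    """Render 'first n rows of table' in the syntax of one SQL dialect.

    The dialect labels are hypothetical; the syntaxes are real:
    LIMIT (SQLite, PostgreSQL, MySQL), TOP (SQL Server) and the
    SQL:2008 standard FETCH FIRST clause (e.g. Oracle 12c and later).
    The table name is assumed to come from a trusted schema.
    """
    if dialect in ("sqlite", "postgresql", "mysql"):
        return f"SELECT * FROM {table} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP {n} * FROM {table}"
    if dialect == "sql2008":
        return f"SELECT * FROM {table} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"unknown dialect: {dialect}")


for d in ("sqlite", "sqlserver", "sql2008"):
    print(limit_query(d, "genes", 5))
```

The same logical request yields three different query strings, which is exactly the kind of difference a wrapper has to absorb.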

For the most part these problems are addressed with WRAPPERs that hide the access heterogeneity of the databases involved. Basically, a wrapper translates between the remote protocol and the local protocol. To actions on the global VIEW it looks like a local database behind a single connection, while the integration machinery takes care of reformulating the queries and forwarding them to the real databases, and then of combining the resulting records and translating the data into the format expected globally.
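The wrapper and mediator roles described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference design: the class names `SqliteWrapper`, `TextWrapper` and `Mediator`, the global fields `gene_id`/`gene_name`, and the tab-separated text format are all hypothetical. Each wrapper yields records already translated to the global format, so the mediator only has to combine them.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]  # one record in the global format


class SqliteWrapper:
    """Wraps an SQLite table whose column names differ from the global schema."""

    def __init__(self, conn: sqlite3.Connection, table: str, column_map: Dict[str, str]):
        self.conn = conn
        self.table = table
        self.column_map = column_map  # global field name -> local column name

    def records(self) -> Iterator[Record]:
        cols = ", ".join(self.column_map.values())
        for row in self.conn.execute(f"SELECT {cols} FROM {self.table}"):
            # Rename local columns to global field names.
            yield dict(zip(self.column_map.keys(), row))


class TextWrapper:
    """Wraps tab-separated text lines of the form 'id<TAB>NAME'."""

    def __init__(self, lines: Iterable[str]):
        self.lines = lines

    def records(self) -> Iterator[Record]:
        for line in self.lines:
            gene_id, name = line.rstrip("\n").split("\t")
            # This source writes names in upper case; the global format is lower case.
            yield {"gene_id": gene_id, "gene_name": name.lower()}


class Mediator:
    """Presents all wrapped sources as one source of global records."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self) -> List[Record]:
        return [rec for w in self.wrappers for rec in w.records()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (accession TEXT, label TEXT)")
conn.execute("INSERT INTO genes VALUES ('B7', 'myc')")

mediator = Mediator([
    SqliteWrapper(conn, "genes", {"gene_id": "accession", "gene_name": "label"}),
    TextWrapper(["A1\tBRCA1\n"]),
])
result = mediator.query()
print(result)
```

Keeping all translation inside the wrappers means a new kind of source only needs a new wrapper class; the mediator stays unchanged.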

The integration approach has a good deal of theory behind it. A simple approach would query all records and filter them locally on a condition, but it is much better to send as many of the filter conditions as possible to the remote database. The wrappers, however, make it possible to integrate even over data sources that are not an RDBMS at all, including mere HTML pages that are parsed for data or plain text files stored in a filesystem. Such non-RDBMS data stores are very much in use, for example in bioinformatics, where you will find millions of records in textual formats.
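The trade-off between pushing a filter to the remote side and filtering locally can be made concrete with a sketch. It assumes, hypothetically, that every source advertises a `can_filter` flag and that conditions are single equality tests; a real system would negotiate far richer capabilities. The SQL source evaluates the condition itself, while the flat-file source can only hand over everything.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
Condition = Tuple[str, str]  # (global column, required value): an equality filter


class SqlSource:
    """A source that can evaluate the filter itself (predicate pushdown)."""
    can_filter = True

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def fetch(self, cond: Optional[Condition] = None) -> List[Row]:
        sql, params = "SELECT gene_id, organism FROM genes", ()
        if cond is not None:
            column, value = cond
            # The column name comes from the trusted global schema; the value is bound.
            sql += f" WHERE {column} = ?"
            params = (value,)
        return [dict(zip(("gene_id", "organism"), r)) for r in self.conn.execute(sql, params)]


class FlatFileSource:
    """A text source with no query language: it can only return everything."""
    can_filter = False

    def __init__(self, lines: List[str]):
        self.lines = lines

    def fetch(self, cond: Optional[Condition] = None) -> List[Row]:
        return [dict(zip(("gene_id", "organism"), line.split("\t"))) for line in self.lines]


def mediated_select(sources, cond: Condition) -> Tuple[List[Row], int]:
    """Push the condition down where possible, filter locally otherwise.

    Returns the matching rows and how many rows were transferred."""
    column, value = cond
    result, transferred = [], 0
    for src in sources:
        if src.can_filter:
            rows = src.fetch(cond)
            result.extend(rows)
        else:
            rows = src.fetch()
            result.extend(r for r in rows if r[column] == value)
        transferred += len(rows)
    return result, transferred


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (gene_id TEXT, organism TEXT)")
conn.executemany("INSERT INTO genes VALUES (?, ?)",
                 [("G1", "human"), ("G2", "mouse"), ("G3", "human")])
sources = [SqlSource(conn), FlatFileSource(["F1\thuman", "F2\tyeast"])]

rows, transferred = mediated_select(sources, ("organism", "human"))
print(sorted(r["gene_id"] for r in rows), transferred)
```

Here 4 rows are transferred instead of all 5; with millions of records the saving from pushdown grows accordingly.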

The challenges for schema mapping in this area are at least twofold. The first is the automatic generation of wrappers from a given set of database schemas: we define a global schema for the intended integrated VIEW, and we know which data fields we need from the other databases. We then ask for a wrapper that fetches those fields, as efficiently as possible.
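A toy version of this generation step is sketched below. It assumes that field correspondences between the global schema and the sources are already known and given as a dictionary; the function name `generate_wrapper_queries` and all schema names are hypothetical. "Efficient" here only means fetching the needed columns and nothing else; joining fields that come from different sources on a shared key is deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# global field -> (source database, table, column); all names are hypothetical
Correspondences = Dict[str, Tuple[str, str, str]]


def generate_wrapper_queries(needed: List[str],
                             corr: Correspondences) -> Dict[Tuple[str, str], str]:
    """Derive one SELECT per (source, table) that fetches only the needed
    fields, renamed to their global names."""
    per_table: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for field in needed:
        source, table, column = corr[field]
        per_table[(source, table)].append(f"{column} AS {field}")
    return {
        (source, table): f"SELECT {', '.join(cols)} FROM {table}"
        for (source, table), cols in per_table.items()
    }


correspondences = {
    "gene_id": ("site_a", "genes", "id"),
    "gene_name": ("site_a", "genes", "name"),
    "sequence": ("site_b", "seqs", "residues"),
}
# gene_name is known but not requested, so no query fetches it.
queries = generate_wrapper_queries(["gene_id", "sequence"], correspondences)
for key, sql in queries.items():
    print(key, sql)
```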

The second is rule rewriting. Where the prior approach statically generates code for a given subset of fields, rule rewriting allows complex conditions on the global view, which are rewritten into queries that the wrappers can understand and execute. This is, of course, largely a matter of logic programming. Note that here we do not look at heterogeneity of data or of database schemas, only at the execution models of database access.
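One well-known form of such rewriting is view unfolding in a global-as-view setting: each global predicate is defined by rules over source predicates, and a query against the global predicate is rewritten into one query per rule. The sketch below uses Python in place of a logic programming language such as Datalog or Prolog. The rules, the term conventions and the source column layouts are all hypothetical, and each rule body is restricted to a single atom to keep it short.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, terms)
# Term conventions: upper-case initial = variable, "_" = don't care,
# quoted string such as "'myc'" = constant.

# Hypothetical global-as-view rules: each entry reads
#   gene(Id, Name) :- <source atom>
RULES: Dict[str, List[Tuple[Tuple[str, ...], Atom]]] = {
    "gene": [
        (("Id", "Name"), ("site_a_genes", ("Id", "Name", "_"))),
        (("Id", "Name"), ("site_b_genes", ("_", "Id", "Name"))),
    ],
}

# Hypothetical column layouts of the source tables.
SOURCE_COLUMNS = {
    "site_a_genes": ("id", "name", "organism"),
    "site_b_genes": ("acc", "gene_id", "label"),
}


def unfold(query: Atom) -> List[Atom]:
    """Rewrite a query over the global view into one query per source rule."""
    pred, terms = query
    rewritten = []
    for head, (body_pred, body_terms) in RULES[pred]:
        binding = dict(zip(head, terms))  # rule variable -> query term
        rewritten.append((body_pred, tuple(binding.get(t, t) for t in body_terms)))
    return rewritten


def to_sql(atom: Atom) -> str:
    """Translate a source atom into SQL that a wrapper could execute."""
    pred, terms = atom
    pairs = list(zip(SOURCE_COLUMNS[pred], terms))
    select = [col for col, t in pairs if t[0].isupper()]      # variables are returned
    where = [f"{col} = {t}" for col, t in pairs if t.startswith("'")]  # constants filter
    sql = f"SELECT {', '.join(select) or '1'} FROM {pred}"
    return sql + (f" WHERE {' AND '.join(where)}" if where else "")


query = ("gene", ("X", "'myc'"))  # gene(X, 'myc'): which ids are named myc?
for atom in unfold(query):
    print(to_sql(atom))
```

The constant in the global query ends up as a WHERE condition in each source query, so the filter is pushed to the sources rather than applied after fetching.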


2004-02-04