draheim@informatik.hu-berlin.de


2004-04-21
(C) Guido Draheim
guidod@gmx.de

 
generated by mksite.sh
2005-06-20

schema-mapping - conversion operations

Integration would be easy if the local databases had been designed around the needs of the global schema. But then the local databases would probably not be optimal for their own purposes. At the very least they distribute fields over different tables, and most often each field entry also has its own given type and format. To allow these fields to be compared in the global schema, we need to convert them into a common format on which comparison operators can work. That involves not only scaling but also the mapping of enumerations and of key IDs.

The best-known example of a conversion operation is the representation of numeric values on different scales. One side might record a measurement in millimeters and the other side in inches. Another such scaling comes with different currencies such as euros and dollars: both are given as numeric fields, and one can be mapped onto the other so that they become comparable in a query. The same applies to time and date formats, especially when they are stored as string representations. Some process databases might store just the day number and infer the rest of the date.
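A minimal sketch of such conversion functions in Python, assuming the common target units are millimeters and a single reference currency. The exchange rate, the local date string format `DD.MM.YYYY`, and the way the year is supplied for day-number dates are hypothetical assumptions for illustration; the function names are not taken from any existing system.

```python
from datetime import date, datetime, timedelta

MM_PER_INCH = 25.4  # exact by definition of the international inch


def inches_to_mm(value_in: float) -> float:
    """Scale a length given in inches to the common unit millimeters."""
    return value_in * MM_PER_INCH


def convert_currency(amount: float, rate: float) -> float:
    """Scale an amount by an exchange rate (target units per source unit).

    The rate is an assumed input; a real mapping would look it up in a
    rate table valid for the date of the transaction.
    """
    return amount * rate


def parse_local_date(text: str) -> date:
    """Parse a hypothetical local string format 'DD.MM.YYYY' into a date."""
    return datetime.strptime(text, "%d.%m.%Y").date()


def day_number_to_date(day_number: int, year: int) -> date:
    """Infer a full date from a day-of-year number (1 = January 1st).

    The year is not stored locally and must be supplied from context,
    e.g. the period the local database belongs to (an assumption here).
    """
    return date(year, 1, 1) + timedelta(days=day_number - 1)
```

For example, `day_number_to_date(32, 2004)` yields February 1st, 2004. Once every side is converted into the same unit and the same date type, ordinary comparison operators apply.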

Another topic is the mapping of enumerations. This can become quite a challenging problem if one side is not strictly a subdivision of the other, so that the categories overlap in what they describe. Of course we can hope that this is not the case, so that we only need to map the representations of the field values, which again makes it a matter of a mapping function that converts the value.
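A minimal sketch of such a value mapping in Python, assuming the mapping is given as (local value, global value) pairs. It accepts a local enumeration that subdivides the global one (several local values mapping to one global value) and rejects overlaps, where one local value would need two global values. The status codes in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def build_enum_mapping(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a local-to-global value mapping from (local, global) pairs.

    Raises ValueError if a local value would map to more than one global
    value, i.e. the local enumeration is not a strict subdivision of the
    global one and the categories overlap.
    """
    mapping: Dict[str, str] = {}
    for local, global_value in pairs:
        previous = mapping.get(local)
        if previous is not None and previous != global_value:
            raise ValueError(
                f"{local!r} overlaps {previous!r} and {global_value!r}")
        mapping[local] = global_value
    return mapping


def map_enum(value: str, mapping: Dict[str, str]) -> str:
    """Convert a local enumeration value into its global representation."""
    try:
        return mapping[value]
    except KeyError:
        raise ValueError(f"no global value for local value {value!r}") from None
```

With the pairs `("open", "active")`, `("in_progress", "active")` and `("closed", "done")`, the local value `in_progress` converts to `active`. Trying to map `pending` to both `active` and `done` raises an error instead of silently picking one.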

One of the interesting points in this area is ID remapping. In some cases an ID is assigned by the processing system and used only to relate datasets in different tables, not for anything else. In this case we need to query the tables involved for the dataset that relates to the data being converted, and insert the ID of that related dataset into the converted record.
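A minimal sketch of ID remapping with Python's built-in `sqlite3` module, assuming the related dataset can be identified on the output side by a natural key (here a customer name). The tables `customer` and `orders` and all function names are hypothetical. Local IDs are discarded; each converted order receives the output-side ID of its customer, found by key lookup or created by key generation.

```python
import sqlite3


def make_output_db() -> sqlite3.Connection:
    """Create a hypothetical output schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY,
                               name TEXT UNIQUE NOT NULL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER NOT NULL REFERENCES customer(id),
                             amount REAL NOT NULL);
    """)
    return conn


def lookup_or_create_id(conn: sqlite3.Connection, name: str) -> int:
    """Return the output-side ID of the customer with this natural key.

    Key lookup: query the output table for an existing row.
    Key generation: if there is none, insert one and let the output
    database assign a fresh ID.
    """
    row = conn.execute(
        "SELECT id FROM customer WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
    return cur.lastrowid


def convert_orders(conn, local_customers, local_orders):
    """Copy local orders into the output schema, replacing local IDs.

    local_customers maps local customer IDs to names (the natural key);
    local_orders is a list of (local_customer_id, amount) tuples.
    """
    for local_cid, amount in local_orders:
        out_cid = lookup_or_create_id(conn, local_customers[local_cid])
        conn.execute(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            (out_cid, amount))
```

The local ID only matters for finding the natural key; the converted rows reference whatever ID the output database holds or generates for that key.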

Interesting work in this domain has been done in the disatdis project, and there are continuations of it that are not yet finished. In some ways they seem to be connected to knowledge discovery functions, but one should not look at that too closely when starting out on a generalized schema mapping system. Higher-order algorithms are out of scope; we just provide the operations that can eventually be used to execute them. Here, those would be the queries on output tables.

One question of particular interest to me is how data fields should be represented in an intermediate table. That is, two tables are related through a field, but the two sides use different representations for it. The output schema does not need the value itself, so no value type is prescribed for it. Still, the values must be comparable in a "where" clause, so we need a common format that both sides can be converted to and for which a comparison operator exists. And of course we do not want to always fall back to the most general format.
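One way to avoid always using the most general format is least common format detection: classify each value into the narrowest format that holds it and pick the widest of those across both sides. The sketch below assumes a hypothetical three-step ordering `int`, `decimal`, `text`; real systems would need more formats (dates, for instance).

```python
from decimal import Decimal, InvalidOperation

# Hypothetical ordering of candidate formats, narrowest first.
FORMATS = ["int", "decimal", "text"]


def _is_int(text: str) -> bool:
    try:
        int(text)
        return True
    except ValueError:
        return False


def _is_decimal(text: str) -> bool:
    try:
        return Decimal(text).is_finite()  # reject NaN and Infinity
    except InvalidOperation:
        return False


def detect_format(value) -> str:
    """Classify a raw field value into the narrowest format that holds it."""
    if isinstance(value, bool):
        return "text"  # treat booleans as opaque labels, not numbers
    if isinstance(value, int):
        return "int"
    if isinstance(value, (float, Decimal)):
        return "decimal"
    text = str(value).strip()
    if _is_int(text):
        return "int"
    if _is_decimal(text):
        return "decimal"
    return "text"


def least_common_format(values_a, values_b) -> str:
    """Pick the narrowest format in which values of both sides fit."""
    ranks = [FORMATS.index(detect_format(v)) for v in [*values_a, *values_b]]
    return FORMATS[max(ranks, default=0)]


def canonicalize(value, fmt: str):
    """Convert a value into the chosen common, comparable representation."""
    text = str(value).strip()
    if fmt == "int":
        return int(text)
    if fmt == "decimal":
        return Decimal(text)
    return text
```

If one side stores `1.5` as a float and the other stores the string `" 1.50 "`, the detected common format is `decimal`, and both canonicalize to equal `Decimal` values. Only a value like `"A-17"` forces the comparison into `text`.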

Operations

  • foreign functions - scaling / mapping / etc
  • mapping table queries - enumerations (materialized view)
  • output table queries - key lookup / key generation
  • canonicalization for formatting - least common format detection
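The first two operations can be combined inside the output database. The sketch below uses Python's `sqlite3` module: `Connection.create_function` registers a Python function so SQL can call it (a foreign function), and a mapping table translates enumeration values via a join. SQLite has no materialized views, so `CREATE TABLE ... AS SELECT` stores a snapshot instead. All table and column names, and the `scale` function, are hypothetical.

```python
import sqlite3


def scale(value, factor):
    """Foreign function: multiply by a conversion factor; NULL stays NULL."""
    if value is None or factor is None:
        return None
    return value * factor


def build_global_part(conn: sqlite3.Connection) -> None:
    """Materialize converted rows of a hypothetical local_part table."""
    # Register the Python function so SQL queries can call it by name.
    conn.create_function("scale", 2, scale)
    conn.executescript("""
        -- Mapping table: local status codes to global enumeration values.
        CREATE TABLE status_map (local_code TEXT PRIMARY KEY,
                                 global_status TEXT NOT NULL);
        INSERT INTO status_map VALUES ('open', 'active'),
                                      ('in_progress', 'active'),
                                      ('closed', 'done');
        -- Snapshot standing in for a materialized view. The inner join
        -- drops local codes without a mapping entry; a LEFT JOIN would
        -- keep them with a NULL status instead.
        CREATE TABLE global_part AS
            SELECT p.part_no,
                   scale(p.length_in, 25.4) AS length_mm,
                   m.global_status AS status
            FROM local_part AS p
            JOIN status_map AS m ON m.local_code = p.status;
    """)
```

Usage, with a hypothetical local table:

```python
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE local_part (part_no TEXT, length_in REAL, status TEXT);
    INSERT INTO local_part VALUES ('A1', 2.0, 'open'), ('B2', 10.0, 'closed');
""")
build_global_part(conn)
rows = conn.execute("SELECT * FROM global_part ORDER BY part_no").fetchall()
```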
2004-02-04