2004-04-21
schema mapping - refraction operations

Beyond the atomic validity of the data, we find data cleansing problems such as the recognition of duplicate records. We also find data that is combined into fields, creating a one-to-many relation of field values, and sometimes these overlap on the other side, giving many-to-many mappings of field values. Essentially we need to split up and merge data, and check its validity in the target, possibly rejecting entries.

The examples in this area are manifold. Perhaps one table gives a measure as a value and a scaling factor while the other side uses a single field. More important are string fields, which we often see holding combined values in one table; most commonly the address field is either collapsed into one field or distributed, with street and post code in different fields. Cutting up a combined field requires some way of parsing it and then interpreting the resulting pieces (a kind of conversion in a second step).

While we are here, we also see data being split or combined depending on a value: consider a table carrying old data along with a field indicating "replaced by". This has been seen in versioned databases for quite some time, and it needs to be taken into account when aggregating data into a global view or into an intermediate table. In a way the latter brings us close to data cleansing operations, with the detection of multiple records that need to be collapsed by some operator. And surely we want to check the records that were skipped, or have lineage on the output explaining why one record was chosen and others were dropped. We also want lineage for fields that were parsed out of other fields, whether that information is used only internally or also given to the output view.

Operations
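The splitting of a combined address field and the collapsing of "replaced by" records described above can be sketched in Python. This is only a minimal illustration under assumptions that are not in the original text: the address layout `<street>, <5-digit post code> <city>`, the field names `id` and `replaced_by`, and the function names `split_address` and `collapse_versions` are all hypothetical.

```python
import re

# Assumed layout of the combined field: "<street>, <5-digit post code> <city>".
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>[^,]+?)\s*,\s*(?P<post_code>\d{5})\s+(?P<city>.+?)\s*$"
)


def split_address(combined):
    """Parse a combined address field into separate target fields.

    Returns None when the value does not fit the assumed layout, so the
    caller can reject the entry.
    """
    match = ADDRESS_RE.match(combined)
    if match is None:
        return None
    return match.groupdict()


def collapse_versions(records):
    """Collapse versioned records onto the record that finally replaced them.

    Each record is a dict with an "id" and a "replaced_by" field
    (None for current records). Returns (kept, lineage), where lineage
    maps each kept id to the ids of the dropped records it supersedes.
    A dangling "replaced_by" reference raises KeyError.
    """
    by_id = {rec["id"]: rec for rec in records}
    lineage = {rec["id"]: [] for rec in records if rec["replaced_by"] is None}
    for rec in records:
        current, seen = rec, set()
        # Follow the "replaced by" chain to its final record.
        while current["replaced_by"] is not None:
            if current["id"] in seen:
                raise ValueError(f"cycle in replaced_by chain at id {current['id']}")
            seen.add(current["id"])
            current = by_id[current["replaced_by"]]
        if current is not rec:
            lineage[current["id"]].append(rec["id"])
    kept = [by_id[rec_id] for rec_id in lineage]
    return kept, lineage


if __name__ == "__main__":
    print(split_address("Unter den Linden 6, 10099 Berlin"))
    rows = [
        {"id": 1, "replaced_by": 2},
        {"id": 2, "replaced_by": 3},
        {"id": 3, "replaced_by": None},
    ]
    print(collapse_versions(rows))
```

Returning the lineage next to the kept records is one possible way to keep track of why a record was chosen and which ones were dropped; a real mapping tool might store this differently.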