
2004-04-21
(C) Guido Draheim
guidod@gmx.de

 

schema mapping - summary

Looking over the operations involved, we can see the parts that an application framework would need to provide in order to support the generation of schema-mapping applications. We need at least the loading of SQL snippets (with additional extensions and annotations) and foreign function interfaces (for both the generator and the generated applications). We should not forget a set of tree operations, the handling of intermediate tables and cursors, and support for secondary parts such as logging services. Some interactivity would be welcome, even if it would not reach the level of Potter's Wheel.

So far we have seen the following operations (in order of first mention):

  • wrapper generation --> sql parse and write
  • rule rewriting --> sql ast and datalog twin
  • transfer encoding / decoding --> foreign functions / odbc
  • scaling / mapping --> foreign functions / generated (ffi@script)
  • mapping table queries --> intermediate tables and ffi@gen
  • output table queries --> sql write and lang generation
  • canonical formatting --> head match / sql ast / ffi@script
  • structure discovery --> sql write and parse
  • meta variable expansion --> sql write / intermediate tables
  • meta variable substitution --> sql ast
  • query subdivision --> sql ast / sql write / ffi@script
  • regular expression split --> lang generation
  • vocabulary lookup --> ffi@script / sql query write
  • data cleansing --> ffi@script / intermediate tables
  • log file management --> ffi@script / intermediate tables (kept)
  • table ordering --> intermediate tables / sql ast
  • record aggregation --> lang generation / intermediate tables

Infrastructure

A framework to support these operations would need a given set of facilities. The first is, of course, the handling of SQL snippets and their derivatives. We want to parse them, inspect an intermediate AST, and generate SQL from it to be sent down to the databases. We also need to embed SQL snippets into the generated application. As we know, it is hard to bring in optimizations that work efficiently even on large datasets such as those we see in bioinformatics. It is not good to cross join everything and only then reduce the result to the value set we want in the end.
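As a rough illustration of loading an annotated SQL snippet and writing it back out, here is a minimal Python sketch. The annotation syntax (`-- @key: value`), the `Snippet` class, and the table and column names are all assumptions made for this example. It does not build a real SQL AST; it only separates annotations from the body and fills placeholders.

```python
import re
from dataclasses import dataclass, field

# Hypothetical annotation syntax: "-- @key: value" lines ahead of the SQL body.
ANNOTATION = re.compile(r"^--\s*@(\w+):\s*(.*)$")


@dataclass
class Snippet:
    annotations: dict = field(default_factory=dict)
    body: str = ""


def parse_snippet(text: str) -> Snippet:
    """Split a snippet into its annotations and a single-line SQL body."""
    snippet = Snippet()
    body_lines = []
    for line in text.strip().splitlines():
        stripped = line.strip()
        match = ANNOTATION.match(stripped)
        if match:
            snippet.annotations[match.group(1)] = match.group(2).strip()
        elif stripped:
            body_lines.append(stripped)
    snippet.body = " ".join(body_lines)
    return snippet


def write_snippet(snippet: Snippet, params: dict) -> str:
    """Generate the SQL text by filling {placeholders}.

    Plain string substitution is used for illustration only; it is not
    safe for untrusted input.
    """
    return snippet.body.format(**params)


# Hypothetical snippet; table and column names are made up.
SOURCE = """
-- @name: gene_by_organism
-- @source: wrapper_a
SELECT g.id, g.symbol
FROM {gene_table} AS g
WHERE g.organism = '{organism}'
"""

snippet = parse_snippet(SOURCE)
sql = write_snippet(snippet, {"gene_table": "genes", "organism": "human"})
print(snippet.annotations)
print(sql)
```

A real framework would replace the string body with a parsed tree so that rules can inspect and rewrite it before the SQL is written.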

A second part concerns foreign functions, both while inspecting the AST and in the generated application. In some places they can degenerate to mere init scripts holding the initial settings for internal algorithms, and I expect a large part to take that form. This also serves maintainability: an equality test should be carried out in the server, or with the normal SQL equality test, or with the script language equality test, and in none of these cases as a call to "equal(...)". In a way, we want generator rules that are triggered by name (that is, roll out "equal") and produce the intended representation of the function. The function is called foreign only because we want the compiler to be agnostic about its characteristics, apart from the information that the foreign function interface provides through introspection.
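The idea of generator rules triggered by name can be sketched as a small registry. This is only an illustration under assumptions: the `rollout` decorator, the `expand` function, and the target names "sql" and "script" are hypothetical and not part of any existing framework.

```python
from typing import Callable, Dict

# Hypothetical registry: function name -> target language -> rollout rule.
ROLLOUTS: Dict[str, Dict[str, Callable[..., str]]] = {}


def rollout(name: str, target: str):
    """Register a rule that rolls out the function `name` for one target."""
    def register(rule: Callable[..., str]) -> Callable[..., str]:
        ROLLOUTS.setdefault(name, {})[target] = rule
        return rule
    return register


@rollout("equal", "sql")
def equal_sql(left: str, right: str) -> str:
    # In SQL the test becomes the native equality operator.
    return f"({left} = {right})"


@rollout("equal", "script")
def equal_script(left: str, right: str) -> str:
    # In the script language it becomes that language's operator.
    return f"({left} == {right})"


def expand(name: str, target: str, *args: str) -> str:
    """Roll out a function by name, or fall back to a plain foreign call."""
    try:
        rule = ROLLOUTS[name][target]
    except KeyError:
        # No rule registered: emit an ordinary call to the foreign function.
        return f"{name}({', '.join(args)})"
    return rule(*args)


print(expand("equal", "sql", "a.id", "b.id"))
print(expand("equal", "script", "x", "y"))
print(expand("scale", "sql", "a.value"))
```

With this layout the generator never emits a call to "equal(...)" when a rule exists, while functions without a rule stay opaque foreign calls.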

The creation of intermediate variable lists and intermediate relation tables appears to be important. In some places they are a prerequisite for an operation and therefore define an ordering on the execution of operations. The correct time of creation and destruction matters as well, as does caching when multiple partial operations need the same table.
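One way to derive such an ordering is a topological sort over the tables each operation reads and creates. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later). The operation names and their table dependencies are invented for the example, and it assumes every table that is read is created by exactly one operation.

```python
from graphlib import TopologicalSorter

# Hypothetical operations with the intermediate tables they read and create.
OPERATIONS = {
    "structure_discovery": {"reads": [], "creates": ["structure"]},
    "meta_variable_expansion": {"reads": ["structure"], "creates": ["variables"]},
    "mapping_table_query": {"reads": ["variables"], "creates": ["mapping"]},
    "output_table_query": {"reads": ["mapping", "structure"], "creates": []},
}


def plan(operations):
    """Return an execution order and, per table, the last operation reading it."""
    # Which operation produces each intermediate table.
    producer = {
        table: op for op, spec in operations.items() for table in spec["creates"]
    }
    # Each operation depends on the producers of the tables it reads.
    graph = {
        op: {producer[table] for table in spec["reads"]}
        for op, spec in operations.items()
    }
    order = list(TopologicalSorter(graph).static_order())
    # A table can be destroyed once its last reader has finished.
    last_reader = {}
    for op in order:
        for table in operations[op]["reads"]:
            last_reader[table] = op
    return order, last_reader


order, last_reader = plan(OPERATIONS)
print(order)
print(last_reader)
```

The `last_reader` map gives the earliest point at which a table may be dropped; a table that should be kept, such as a log table, would simply be excluded from that cleanup.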

Last but not least, we need to look into user interaction and user intervention. Since we deal with a lot of heterogeneity in the system, we can hardly assert that all relevant input data can be mapped into the output schema. Here we need not only log handling for post-mortem work, but preferably log handlers that can be initialized to try to recover some parts without overcomplicating the main path. Here we have to deal a lot with init scripts and their influence on system execution. I expect that a graphical user interface would gain a lot in efficiency.
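A log handler that tries to recover a record, while the main path stays simple, might look like the following sketch. The record layout, the `map_record` mapping, and the `strip_unit` handler are hypothetical and only serve to show the control flow.

```python
def map_record(record):
    """Main path: hypothetical mapping that expects a plain number in meters."""
    return {"id": record["id"], "length_m": float(record["length"])}


def strip_unit(record, error):
    """Hypothetical recovery handler: retry values that carry a 'cm' suffix."""
    value = str(record.get("length", "")).strip()
    if value.endswith("cm"):
        try:
            return {"id": record["id"], "length_m": float(value[:-2]) / 100.0}
        except (KeyError, ValueError):
            return None
    return None


def run(records, handlers):
    """Map all records; failed ones go to the handlers, then to the reject log."""
    output, rejected = [], []
    for record in records:
        try:
            output.append(map_record(record))
        except (KeyError, ValueError) as error:
            for handler in handlers:
                recovered = handler(record, error)
                if recovered is not None:
                    output.append(recovered)
                    break
            else:
                # No handler could recover the record: keep it for inspection.
                rejected.append((record, repr(error)))
    return output, rejected


records = [
    {"id": 1, "length": "2.5"},
    {"id": 2, "length": "150cm"},
    {"id": 3, "length": "n/a"},
]
output, rejected = run(records, [strip_unit])
print(output)
print(rejected)
```

The list of handlers is the part that an init script could configure, and the reject log is what a user would review when intervening.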

Series One

Implementing any of these is a lot of work and can hardly be done in a few months. That is not necessary, however, if we build a framework that allows the actual algorithms to be plugged into place. To test its implementation we need an initial piece of functionality that we want to see working in series one of the compiler environment. After some discussion, schematic heterogeneity seems to be a good choice because it covers large parts of the system, even though it does not touch foreign functions to a large degree.

For schematic heterogeneity we need inspection of global SQL scripts and SQL database wrappers, and we need to materialize structure tables and roll out the AST. Much of this has already been described in implementations of SchemaSQL, which minimizes errors in design choices. The task would instead be to represent the various steps of some of the algorithms as pluggable parts of the framework, so that the generated applications can later be augmented with in-depth parts and third-party optimizations, even on-site ones that come from user intervention for efficiency.

In effect, this requires a specification language that guides the actual processing and also shows where to attach extensions and attributes for SQL snippets and foreign functions.

2004-02-05