2004-04-21
(C) Guido Draheim
guidod@gmx.de

 

java - javacc/jjtree problems

Generations of computer scientists have been taught syntax descriptions in Backus-Naur form, and most descriptions of problem-oriented languages in the literature come with an appendix in extended Backus-Naur form, widely known by its shorthand EBNF. So most people know it and have an idea how to write down their own extensions to a given language specification: just add a keyword in the appropriate place, referencing other meta designations.

Of course the EBNF variant is just one way to express a language specification that can be used to write a parser. There is a complete theory of context-free grammars and of ways to construct a machine parser from a given specification. The yacc tool is one of the oldest and most widely used of these tools: it transforms a BNF-style grammar into a machine parser, with the executions after each rule given in C notation.

The yacc tool generates parsers of the LR family (LALR, to be precise), and the generated state machine that recognizes the input language can become very complex. The javacc people, however, did not bend to the traditional format of expressing syntax specifications. Instead they wanted something more tightly coupled with the java language itself. In this combination, the basic javacc parser language is just an LL parser, but one can do quite a few tricks with java variables and modes, making it really an LL(k) parser language.
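To make the LL(k) idea a bit more concrete, here is a minimal hand-written sketch in plain java. It is not javacc output and not taken from any SQL spec; the class `LookaheadSketch`, its method names and the three-way `item` rule are made up for illustration. All three alternatives start with a name, so a single token of lookahead cannot pick one, and the parser peeks at the second token before consuming anything. In a javacc specification the same kind of decision is requested with a `LOOKAHEAD(2)` annotation in front of the choice.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written LL parser sketch with two tokens of lookahead (hypothetical grammar):
//   item ::= NAME "." NAME    qualified column, e.g.  t . col
//          | NAME "(" ")"     function call,    e.g.  now ( )
//          | NAME             plain column,     e.g.  col
class LookaheadSketch {
    private final List<String> tokens;
    private int pos = 0;

    LookaheadSketch(String input) {
        // Assumption: tokens are separated by whitespace, so no real lexer is needed.
        this.tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    // Look at the k-th upcoming token without consuming it.
    private String peek(int k) {
        int i = pos + k - 1;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    // Consume one token, which must be either a NAME or the given literal.
    private String expect(String what) {
        String t = peek(1);
        boolean ok = what.equals("NAME") ? t.matches("[A-Za-z_]\\w*") : t.equals(what);
        if (!ok) {
            throw new IllegalStateException("expected " + what + " but found " + t);
        }
        pos++;
        return t;
    }

    // The choice is made on the SECOND token: this is the LL(2) decision.
    String item() {
        String second = peek(2);
        if (second.equals(".")) {
            expect("NAME"); expect("."); expect("NAME");
            return "qualified";
        } else if (second.equals("(")) {
            expect("NAME"); expect("("); expect(")");
            return "call";
        } else {
            expect("NAME");
            return "plain";
        }
    }

    static String classify(String input) {
        LookaheadSketch p = new LookaheadSketch(input);
        String kind = p.item();
        p.expect("<EOF>");   // nothing may follow the item
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(classify("t . col"));
        System.out.println(classify("now ( )"));
        System.out.println(classify("col"));
    }
}
```

Running `main` classifies the three inputs as `qualified`, `call` and `plain`. With only `peek(1)` available, the grammar would have to be rewritten (left-factored) before an LL(1) parser could handle it.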

This LL(k) variant of expressing a grammar is capable of producing a machine that can parse practically all of the existing problem-oriented languages around. The only problem lies in the tricks and modes that may need to be pushed into the javacc specification. That in turn makes it hard for third parties to maintain, since they know EBNF but not the peculiar format of javacc. It is even hard to add javacc executions in the right places so that they build the output syntax tree from the input.

At least for the latter we are given some help. jjtree is an add-on for javacc that adds java executions in the right places to generate an AST (abstract syntax tree) of the input. The resulting jjtree specification looks a lot closer to traditional syntax specification grammars, with named heads, alternative sub-heads and lex tokens. There is no need to know how to access the syntax variables and the like.
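What "java executions in the right places" amounts to can be sketched by hand: every rule method creates a node on entry and hangs the results of its sub-rules below it. The following is only an illustration under my own assumptions. `AstNode` is a hypothetical stand-in for the node classes that jjtree generates, and the two-rule `SELECT` grammar is far smaller than any real SQL spec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the node classes that jjtree would generate.
class AstNode {
    final String name;                       // rule name, e.g. "Select"
    final String text;                       // token text for leaves, otherwise null
    final List<AstNode> children = new ArrayList<>();

    AstNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    // Compact rendering: Name[text](child child ...)
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(name);
        if (text != null) sb.append('[').append(text).append(']');
        if (!children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(children.get(i));
            }
            sb.append(')');
        }
        return sb.toString();
    }
}

// Recursive-descent parser in which each rule builds its own tree node.
class SelectParserSketch {
    private final List<String> tokens;
    private int pos = 0;

    SelectParserSketch(String input) {
        // Assumption: whitespace-separated tokens, keywords in upper case.
        this.tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String literal) {
        if (!peek().equals(literal)) {
            throw new IllegalStateException("expected " + literal + " but found " + peek());
        }
        pos++;
    }

    private String name() {
        String t = peek();
        if (!t.matches("[A-Za-z_]\\w*")) {
            throw new IllegalStateException("expected a name but found " + t);
        }
        pos++;
        return t;
    }

    // select ::= "SELECT" columnList "FROM" NAME
    AstNode select() {
        AstNode node = new AstNode("Select", null);
        expect("SELECT");
        node.children.add(columnList());
        expect("FROM");
        node.children.add(new AstNode("Table", name()));
        expect("<EOF>");
        return node;
    }

    // columnList ::= NAME ( "," NAME )*
    private AstNode columnList() {
        AstNode node = new AstNode("ColumnList", null);
        node.children.add(new AstNode("Column", name()));
        while (peek().equals(",")) {
            pos++;                           // consume the comma
            node.children.add(new AstNode("Column", name()));
        }
        return node;
    }

    static AstNode parse(String input) {
        return new SelectParserSketch(input).select();
    }

    public static void main(String[] args) {
        System.out.println(parse("SELECT a , b FROM t"));
    }
}
```

For the input `SELECT a , b FROM t` this prints `Select(ColumnList(Column[a] Column[b]) Table[t])`. The point of jjtree is that the node bookkeeping shown here does not have to be written by hand for every rule.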

At this point I had high hopes for javacc. I needed to parse SQL input text, and SQL is quite a thick language with lots and lots of constructions. Additionally I had heard that Oracle had donated a javacc specification of its SQL dialect, so I was really hoping to find a decent parser already made, in a state where one could just start modifying it to fit one's needs. Perhaps someone had even made a variant for postgresql or some other SQL variant.

However, my hopes were not fulfilled. It turned out that there was not much to be found: the three javacc references were all derived from the Oracle variant, which in turn was not written for Oracle SQL scripts in general but for an embedded forms subset, a.k.a. formsSQL. The derived variants were either cut down or merely extended with some special keywords. There was no SQL javacc spec around that had been extended to cover other SQL dialects or to recognize more SQL syntax constructs, be they from Oracle or from somewhere else.

And most annoying of all, there was no jjtree specification. That may well be the reason why there are not many javacc SQL specifications. As noted above, the average programmer will not have a clue how to modify a javacc specification that is spiced up with special parser executions. So people avoid using the javacc SQL specification.

Instead I found people writing their own SQL specification in ANTLR or, going in the other direction, porting a yacc parser to run in java and generate a java state machine. Unlike the Unix/C world, the java community is pretty much fragmented in how to write down a grammar spec. You even see books that give a traditional EBNF in the appendix alongside the actual spec of the parser they used.

In the end I picked up the Oracle SQL javacc spec from somewhere, cleaned out all the superfluous stuff and added the three-liner needed to make it a jjtree specification. By that point I had read up enough about javacc to be confident that I knew what to do to get a correct result. More interesting, however, is the fact that I was able to add and reorder rules so as not only to parse my little postgresql script but also to have jjtree generate a tree that looks good.

It looks good because in some places I killed nested rules that would just be superfluous when walking the resulting AST to do actual work. I checked that with an execution head that dumps the tree in xml format. The xml format in turn can be read by many tools that transform trees, including abstract syntax trees.
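Both steps, flattening superfluous nesting and dumping the tree as xml, can be sketched in a few lines of plain java. This is not the code from SqlToXml.java. The node class `XmlNode`, the attribute name `text` and the pruning rule (drop any text-less node that has exactly one child) are assumptions of mine. In the real spec the nesting was removed in the grammar itself, for which jjtree offers `#void` and conditional node descriptors; the sketch does a comparable cleanup as a pass over a finished tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node, standing in for whatever node class the parser produces.
class XmlNode {
    final String name;                       // rule name, used as the xml element name
    final String text;                       // token text, or null for pure rule nodes
    final List<XmlNode> children = new ArrayList<>();

    XmlNode(String name, String text, XmlNode... kids) {
        this.name = name;
        this.text = text;
        for (XmlNode k : kids) children.add(k);
    }
}

class AstXmlSketch {
    // Replace every text-less node that has exactly one child by that child,
    // so chains like Expr -> Term -> Name collapse to just Name.
    static XmlNode prune(XmlNode n) {
        for (int i = 0; i < n.children.size(); i++) {
            n.children.set(i, prune(n.children.get(i)));
        }
        if (n.text == null && n.children.size() == 1) {
            return n.children.get(0);
        }
        return n;
    }

    // Escape the characters that are special inside an xml attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Write the node as an element, two spaces of indent per tree level.
    static void toXml(XmlNode n, String indent, StringBuilder out) {
        out.append(indent).append('<').append(n.name);
        if (n.text != null) {
            out.append(" text=\"").append(escape(n.text)).append('"');
        }
        if (n.children.isEmpty()) {
            out.append("/>\n");
            return;
        }
        out.append(">\n");
        for (XmlNode c : n.children) {
            toXml(c, indent + "  ", out);
        }
        out.append(indent).append("</").append(n.name).append(">\n");
    }

    static String dump(XmlNode root) {
        StringBuilder out = new StringBuilder();
        toXml(prune(root), "", out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Tree for the condition  a = 1  as a deeply nested grammar might build it.
        XmlNode tree = new XmlNode("Compare", "=",
            new XmlNode("Expr", null, new XmlNode("Term", null, new XmlNode("Name", "a"))),
            new XmlNode("Expr", null, new XmlNode("Term", null, new XmlNode("Number", "1"))));
        System.out.print(dump(tree));
    }
}
```

After pruning, the `Expr` and `Term` wrappers are gone and the dump contains only a `Compare` element with a `Name` and a `Number` child, which is much easier to walk or to feed into an xml transformation tool.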

In the end, I benefit from javacc generating a lightweight parser for the SQL language while still being able to modify the parser specification easily, so I can assert that I know where the parsed data will pop up in the abstract syntax tree created in main memory, or in the xml tree made from it.

Anyway, here it is:

  • SqlScript.jjt - SQL jjtree parser
  • SqlToXml.java - sql2xml AST program