A workshop held in conjunction with EDBT/ICDT 2013, March 22, 2013, Genoa, Italy


    The evaluation data set HUGE is available here! If you use this data set in your own research, please cite our paper on the results of the competition.
    Some pictures from the workshop are available.
    The initial results of the workshop are available.
    The program of the workshop is available.
    Accepted as a half-day workshop at EDBT/ICDT on March 22, 2013 in Genoa, Italy.


  • This competition addresses an important problem for database research and related fields: approximate string matching. Applications include duplicate detection, information extraction, and error-tolerant keyword search.
    Participants of this workshop will compete for the most efficient implementation of scalable approximate string matching techniques. The competition comprises two tracks: similarity string search and similarity string join. The purpose is to get a clearer picture of the state of the art in string matching by comparing algorithms on the same hardware and the same (large) data sets. The competition will proceed in four phases:
    1. Organizers will provide experimentation data sets representative of the later evaluation data sets, an executable specification of the runtime software stack (EC2 image), and a specification of the runtime hardware environment of the final evaluation;
    2. Participants must provide implementations of their algorithms tailored to the experimentation data and the specified runtime environment;
    3. Organizers will benchmark all submissions on previously unseen evaluation data;
    4. At the workshop, participants will present their approaches and organizers will present the results of the benchmarks.
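
To make the two tracks concrete, the following is a minimal sketch of similarity string search and similarity string join. It assumes edit (Levenshtein) distance with a threshold k as the similarity measure, which the description above does not specify. The function names are hypothetical, and the brute-force loops are only a baseline for illustration, not a competitive or scalable implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity_search(query: str, strings: list[str], k: int) -> list[str]:
    """Search track (assumed form): all strings within distance k of query."""
    return [s for s in strings
            # Length filter: strings whose lengths differ by more than k
            # cannot be within edit distance k.
            if abs(len(s) - len(query)) <= k and edit_distance(query, s) <= k]


def similarity_join(strings: list[str], k: int) -> list[tuple[str, str]]:
    """Join track (assumed form): all unordered pairs within distance k."""
    pairs = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            s, t = strings[i], strings[j]
            if abs(len(s) - len(t)) <= k and edit_distance(s, t) <= k:
                pairs.append((s, t))
    return pairs
```

The join compares every pair of strings, so its cost grows quadratically with the size of the data set; avoiding that cost on large inputs is the kind of problem the submitted implementations are meant to address.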


  • Ulf Leser: leser(at)informatik.hu-berlin.de
  • Sebastian Wandelt: wandelt(at)informatik.hu-berlin.de

