Humboldt-Universität zu Berlin, Institut für Informatik

Referential compression of biological sequences

On this page you can find an implementation of a referential compression algorithm for biological sequences. The idea is to split a given reference genome into blocks. In general, two genomes of the same species are very similar to each other, and the blocks are chosen so that long matches for the to-be-compressed blocks can often be found in the reference blocks by local search.
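To make the idea concrete, here is a minimal sketch of block-wise referential compression. It is an illustration only and is not taken from the source code offered on this page: the function names (longest_match, compress, decompress), the fixed block size, the search window of one neighbouring block on each side, and the min_match threshold are all assumptions chosen for the example.

```python
from typing import List, Tuple, Union

# An entry is either a (reference position, length) match or one raw character.
Entry = Union[Tuple[int, int], str]


def longest_match(ref: str, lo: int, hi: int, target: str, t: int) -> Tuple[int, int]:
    """Find the longest match of target[t:] that starts somewhere in ref[lo:hi]."""
    best_pos, best_len = -1, 0
    for r in range(lo, hi):
        n = 0
        while r + n < len(ref) and t + n < len(target) and ref[r + n] == target[t + n]:
            n += 1
        if n > best_len:
            best_pos, best_len = r, n
    return best_pos, best_len


def compress(ref: str, target: str, block_size: int = 8, min_match: int = 4) -> List[Entry]:
    """Encode target as matches against ref, searching only nearby reference blocks."""
    out: List[Entry] = []
    t = 0
    while t < len(target):
        # Local search (assumed window): the reference block at the same
        # offset as the current position, plus one block on either side.
        block_start = (t // block_size) * block_size
        lo = max(0, block_start - block_size)
        hi = min(len(ref), block_start + 2 * block_size)
        pos, length = longest_match(ref, lo, hi, target, t)
        if length >= min_match:
            out.append((pos, length))  # long enough: store a reference to ref
            t += length
        else:
            out.append(target[t])      # too short: store the raw character
            t += 1
    return out


def decompress(ref: str, entries: List[Entry]) -> str:
    """Rebuild the original sequence from the reference and the entry list."""
    parts = []
    for e in entries:
        if isinstance(e, tuple):
            pos, length = e
            parts.append(ref[pos:pos + length])
        else:
            parts.append(e)
    return "".join(parts)
```

Because two genomes of the same species are very similar, most of the input is expected to collapse into a few long (position, length) entries, and only the differing bases are stored as raw characters. A real implementation would use an index over the reference blocks rather than the brute-force scan shown here.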

Source code

Compilation instructions

The code was successfully compiled on Fedora 17 (using gcc 4.7.0-5). It includes a build/project file for Code::Blocks. You need the following libraries installed:


Sebastian Wandelt
Created 08/27/2012