Experimentation data sets and Input/Output
-
We supply experimentation data sets for testing and optimizing your code. You can assume that the experimentation data sets are roughly representative of the competition data sets in terms of relative symbol frequencies, distribution of string lengths, etc.
In order to cover different alphabets, we use two datasets:
- Human genome read data:
The evaluation dataset contains on the order of tens of millions of reads from different human genomes. The public experimentation dataset contains 750,000 reads, which is roughly 5% of the competition dataset.
- Geographical names:
The evaluation dataset contains on the order of several million names of cities from all over the world, after phonetic rewriting. The public experimentation dataset contains 400,000 names, which is roughly 5% of the competition dataset.
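Since the experimentation data is described as representative in symbol frequencies and string lengths, it can help to profile those two properties before tuning your code. The following is only a minimal sketch: it assumes one string per line, which is an assumption and not the official input format, and `profile_strings` is a hypothetical helper name.

```python
from collections import Counter
import io


def profile_strings(lines):
    """Return (relative symbol frequencies, string-length histogram).

    Assumes one string per line; this layout is an assumption,
    not the official input format.
    """
    symbols = Counter()
    lengths = Counter()
    for line in lines:
        s = line.rstrip("\n")
        if not s:
            continue  # skip blank lines
        symbols.update(s)        # count every character of the string
        lengths[len(s)] += 1     # count strings of this length
    total = sum(symbols.values())
    rel_freq = {ch: n / total for ch, n in symbols.items()} if total else {}
    return rel_freq, dict(lengths)


# Tiny in-memory sample standing in for a data file.
sample = io.StringIO("ACGT\nAACC\nGGT\n")
freq, hist = profile_strings(sample)
```

To profile a real file, pass an open file object in place of `sample`; the function reads line by line, so it does not need to hold the whole dataset in memory.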
General Input
For both datasets, the input, output, and further constraints are defined here.