This page was built in support of our paper *"SFA: A Symbolic Fourier Approximation and Index for Similarity Search in Time Series Data"* by Patrick Schäfer and Mikael Högqvist.

This page contains further experiments, which we couldn't show in the paper using varying time series lengths.

Scalable Time Series Data Analytics using SFA, BOSS and BOSS VS

SFA has a very tight lower bound to DFT. This is shown in this experiment for 256 symbols of the symbolic representations.

Coefficients | 8 | 16 |
---|---|---|

Length 256 | ||

Length 1024 | ||

Length 2048 |

Here is the raw data.

We applied a one-tailed Wilcoxon signed-rank test using the statistics tool R to test if any method has a higher tlb. The null-hypothesis is that the tlb of PAA is higher than that of DFT.

length | 2048 | 2048 | 1024 | 1024 | 256 | 256 |

symbols | 256 | 256 | 256 | 256 | 256 | 256 |

coef. | 16 | 8 | 16 | 8 | 16 | 8 |

N | 28 | 28 | 28 | 28 | 29 | 29 |

W+ | 379.5 | 385 | 390.5 | 386.5 | 415.5 | 404 |

p-value | 3.063e-5 | 1.789e-5 | 1.029e-5 | 1.54e-5 | 9.743e-6 | 2.486e-6 |

With a max p-value of 3.063e-5 the null-hypothesis can be rejected on all tested configurations. This means, the alternative hypothesis that the tlb of DFT if higher than the tlb of PAA is statistically significant.

If the time series are z-normalised and N PAA-coefficients are calculated, it is sufficient to store only N-1 and reconstruct the N-th coefficient. Thus, we tested **PAA2** with **N+1** coefficients while DFT uses only N. The null-Hypothesis is that the tlb of PAA2 is higher than the tlb of DFT.

Length | 2048 | 2048 | 1024 | 1024 | 256 | 256 |

Symbols | 256 | 256 | 256 | 256 | 256 | 256 |

Coefficients | 16(17) | 8(9) | 16(17) | 8(9) | 16(17) | 8(9) |

N | 28 | 28 | 28 | 28 | 29 | 29 |

W_{+} | 351 | 309 | 376 | 338 | 404.5 | 407 |

p-value | 0.0003913 | 0.002043 | 7.57e-06 | 0.001096 | 2.75e-05 | 2.181e-05 |

With a max p-value of 0.002043 the null-hypothesis can be rejected on all tested configurations. This means, the alternative hypothesis tlb of DFT is higher than tlb of PAA is statistically significant.

Here is the raw data for the Wilcoxon signed-rank test.

We compare SFA and iSAX in terms of pruning power, which is a measure for the performance of 1-nearest-neighbour queries. The pruning power P is defined as the fraction of the database, that must be examined in reduced space before the 1-nearest-neighbour to a query is found.

It can be seen as the number of false alarms that have to be inspected until the true nearest neighbour is found (lower is better). To measure P, we first determine the true 1-nearest-neighbour TNN to a query Q, sort the time series in reduced space by increasing distance to Q and count the number of time series that have a smaller distance than TNN.

For completness, we show further results for lengths 256, 512 and 1024 on this page for the TS-tree and the kd-tree, which show the same trend, that the SFA trie is competitive or superior to other indexing techniques.

time series length | page accesses | time | distance calculations |
---|---|---|---|

1024 | |||

512 | |||

256 |

In the paper we could only show the number of page accesses and wall clock time. For completness, we show the number of distance calculations on this page. It shows that the SFA trie does not only have less page accesses but also requires less distance calculations for exact similarity search.

time series length | page accesses | time | distance calculations |
---|---|---|---|

1024 |

Here is the raw data for the experiments.

- Thank you to the providers of the time series datasets.
- Thank you to Eamonn J. Keogh for collecting all time series datasets and providing this page on iSAX.