SFA

# SFA - Symbolic Fourier Approximation

This page was built in support of our paper "SFA: A Symbolic Fourier Approximation and Index for Similarity Search in Time Series Data" by Patrick Schäfer and Mikael Högqvist.

This page contains further experiments, which we couldn't show in the paper using varying time series lengths.

## Java Implemenation of SFA

Scalable Time Series Data Analytics using SFA, BOSS and BOSS VS

## Tightness of Lower Bounds for Dimensionality Reduction Techniques

SFA has a very tight lower bound to DFT. This is shown in this experiment for 256 symbols of the symbolic representations.

Here is the raw data.

## Wilcoxon Signed-Rank Test based on Tightness of Lower Bounds

We applied a one-tailed Wilcoxon signed-rank test using the statistics tool R to test if any method has a higher tlb. The null-hypothesis is that the tlb of PAA is higher than that of DFT.

 length 2048 2048 1024 1024 256 256 symbols 256 256 256 256 256 256 coef. 16 8 16 8 16 8 N 28 28 28 28 29 29 W+ 379.5 385 390.5 386.5 415.5 404 p-value 3.063e-05 1.789e-05 1.029e-05 1.54e-05 9.743e-06 2.486e-06

With a max p-value of 3.063e-5 the null-hypothesis can be rejected on all tested configurations. This means, the alternative hypothesis that the tlb of DFT if higher than the tlb of PAA is statistically significant.

If the time series are z-normalised and N PAA-coefficients are calculated, it is sufficient to store only N-1 and reconstruct the N-th coefficient. Thus, we tested PAA2 with N+1 coefficients while DFT uses only N. The null-Hypothesis is that the tlb of PAA2 is higher than the tlb of DFT.

 Length 2048 2048 1024 1024 256 256 Symbols 256 256 256 256 256 256 Coefficients 16(17) 8(9) 16(17) 8(9) 16(17) 8(9) N 28 28 28 28 29 29 W_{+} 351 309 376 338 404.5 407 p-value 0.0003913 0.002043 7.57e-06 0.001096 2.75e-05 2.181e-05

With a max p-value of 0.002043 the null-hypothesis can be rejected on all tested configurations. This means, the alternative hypothesis tlb of DFT is higher than tlb of PAA is statistically significant.

Here is the raw data for the Wilcoxon signed-rank test.

## Pruning Power Experiments

We compare SFA and iSAX in terms of pruning power, which is a measure for the performance of 1-nearest-neighbour queries. The pruning power P is defined as the fraction of the database, that must be examined in reduced space before the 1-nearest-neighbour to a query is found.

It can be seen as the number of false alarms that have to be inspected until the true nearest neighbour is found (lower is better). To measure P, we first determine the true 1-nearest-neighbour TNN to a query Q, sort the time series in reduced space by increasing distance to Q and count the number of time series that have a smaller distance than TNN.

## Exact Similarity Search - Performance

For completness, we show further results for lengths 256, 512 and 1024 on this page for the TS-tree and the kd-tree, which show the same trend, that the SFA trie is competitive or superior to other indexing techniques.

## Exact Similarity Search - iSAX index vs. SFA trie

In the paper we could only show the number of page accesses and wall clock time. For completness, we show the number of distance calculations on this page. It shows that the SFA trie does not only have less page accesses but also requires less distance calculations for exact similarity search.

time series length page accesses time distance calculations
1024   Here is the raw data for the experiments.

## Acknowledgements:

• Thank you to the providers of the time series datasets.
• Thank you to Eamonn J. Keogh for collecting all time series datasets and providing this page on iSAX.