Let's say I have a collection of tensors, each representing a time series with 64 points and 4 features, so each tensor has shape [64, 4]. I am trying to classify these series. To do that, I first pass the tensors into a Transformer encoder (2 attention heads, 2 encoder layers) that outputs a tensor of the same shape. This output is flattened and passed to a dense layer for classification. Is there any advantage to passing the time series through the encoder and classifying the encoded output, compared to passing the original tensors directly to the dense layer? A minimal PyTorch sketch of the two setups I am comparing is below.
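For concreteness, here is roughly what the two models look like in PyTorch. The class names, the number of output classes, and the feedforward width are placeholders I chose for illustration; only the [64, 4] shape and the 2 heads / 2 layers come from my actual setup:

```python
import torch
import torch.nn as nn

class EncoderThenDense(nn.Module):
    """Transformer encoder over the series, then flatten + dense head."""
    def __init__(self, seq_len=64, n_features=4, n_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=n_features,      # 4 features per time step
            nhead=2,                 # 2 attention heads
            dim_feedforward=128,     # placeholder width
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(seq_len * n_features, n_classes)

    def forward(self, x):            # x: [batch, 64, 4]
        z = self.encoder(x)          # same shape: [batch, 64, 4]
        return self.head(z.flatten(1))

class DenseOnly(nn.Module):
    """Baseline: flatten the raw series and classify directly."""
    def __init__(self, seq_len=64, n_features=4, n_classes=2):
        super().__init__()
        self.head = nn.Linear(seq_len * n_features, n_classes)

    def forward(self, x):            # x: [batch, 64, 4]
        return self.head(x.flatten(1))

# Sanity check on a dummy batch
x = torch.randn(8, 64, 4)
print(EncoderThenDense()(x).shape, DenseOnly()(x).shape)  # both [8, 2]
```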
I tried this experimentally and saw no significant increase in accuracy when using the Transformer encoder. However, my data was quite simple and there was not enough of it to draw any conclusions. Also, an expert I know insists that the model with the Transformer-processed input should work better.
One thing I did observe was a steeper decrease in training loss when classifying the encoded tensors.
I also referred to this resource on this matter: https://www.linkedin.com/pulse/time-series-classification-model-based-transformer-gokmen/