Let's say I have a collection of tensors, each representing a time series with 64 points and 4 features, so each tensor has shape [64, 4]. I am trying to classify these series. To do that, I first pass the tensors into a Transformer encoder (2 attention heads, 2 encoder layers) that outputs a tensor of the same shape. This output is flattened and passed to a dense layer for classification. Is there any advantage to passing the time series through the encoder and classifying the encoded output, compared to passing the original tensors directly to the dense layer? A minimal PyTorch sketch of the two setups I am comparing is below.
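For concreteness, here is roughly what the two models look like in PyTorch. The class names, the number of output classes, and the feedforward width are placeholders I chose for illustration; only the [64, 4] shape and the 2 heads / 2 layers come from my actual setup:

```python
import torch
import torch.nn as nn

class EncoderThenDense(nn.Module):
    """Transformer encoder over the series, then flatten + dense head."""
    def __init__(self, seq_len=64, n_features=4, n_classes=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=n_features,      # 4 features per time step
            nhead=2,                 # 2 attention heads
            dim_feedforward=128,     # placeholder width
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(seq_len * n_features, n_classes)

    def forward(self, x):            # x: [batch, 64, 4]
        z = self.encoder(x)          # same shape: [batch, 64, 4]
        return self.head(z.flatten(1))

class DenseOnly(nn.Module):
    """Baseline: flatten the raw series and classify directly."""
    def __init__(self, seq_len=64, n_features=4, n_classes=2):
        super().__init__()
        self.head = nn.Linear(seq_len * n_features, n_classes)

    def forward(self, x):            # x: [batch, 64, 4]
        return self.head(x.flatten(1))

# Sanity check on a dummy batch
x = torch.randn(8, 64, 4)
print(EncoderThenDense()(x).shape, DenseOnly()(x).shape)  # both [8, 2]
```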
I tried this experimentally and saw no significant increase in accuracy when using the Transformer encoder. However, my data was quite simple and there was not enough of it to draw any conclusions. Also, an expert I know insists that the model with the Transformer-processed input should work better.
One thing I did observe was a steeper decrease in training loss when classifying the encoded tensors.
I also referred to this resource on this matter: https://www.linkedin.com/pulse/time-series-classification-model-based-transformer-gokmen/