This is a follow-up to the feature-dependencies problem described in the referenced question.
$G_{x,y,z}$ is a rank-3 tensor where:
- $x$: number of samples,
- $y$: number of features,
- $z$: number of embedding dimensions.
Suppose I train a simple attention-layer architecture where the input and the target output are identical, both equal to $G_{x,y,z}$. My hypothesis is that the trained attention layer's weight is the adjacency matrix I am looking for in the referenced question.
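To make the setup concrete, here is a minimal sketch of what I mean, using `keras.layers.Attention` with $G$ fed as both query and value and also used as the training target; the sizes `x`, `y`, `z` are just placeholders:

```python
import numpy as np
import keras

# Placeholder sizes: x samples, y features, z embedding dimensions.
x, y, z = 8, 5, 16
G = np.random.rand(x, y, z).astype("float32")

# Self-reconstruction setup: G is both the input and the training target.
inputs = keras.Input(shape=(y, z))
attn = keras.layers.Attention(use_scale=True)
outputs = attn([inputs, inputs])  # query = value
model = keras.Model(inputs, outputs)

model.compile(optimizer="adam", loss="mse")
model.fit(G, G, epochs=10, verbose=0)
```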
The shape convention of the attention layer also seems to support the hypothesis: it works with `(batch_size, Tq, Tv)`, at least in Keras 3. By analogy:
- `batch_size` is $x$
- `Tq` is $y$
- `Tv` is $z$
The adjacency matrix should have the shape $y \times y$.
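Continuing the sketch above, the attention scores can be pulled out to check their shape, and the layer's trainable weights can be listed to see which of them, if any, is $y \times y$:

```python
# Attention scores have shape (batch_size, Tq, Tv); here query = value = G.
_, scores = attn([G, G], return_attention_scores=True)
print(scores.shape)

# List the layer's trainable weights to see which one, if any, is y x y.
for w in attn.weights:
    print(w.name, w.shape)
```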
So, is my hypothesis correct?
If so, which weight has the shape $y \times y$?