Since your Native Sparse Attention (NSA) paper states, in the text around formula (7), that:
By aggregating sequential blocks of keys or values into block-level representations, we obtain compressed keys and values that capture the information of the entire block... where $l$ is the block length, $d$ is the sliding stride between adjacent blocks, and $\varphi$ is a learnable MLP with intra-block position encoding to map keys in a block to a single compressed key. $\tilde K^{cmp}_t \in \mathbb{R}^{d_k \times \left\lfloor \frac{t-l}{d} \right\rfloor}$ is a tensor composed of compressed keys... Compressed representations capture coarser-grained, higher-level semantic information and reduce the computational burden of attention.
Therefore $\tilde K^{cmp}_t$ is a tensor of shape $\left(d_k, \left\lfloor \frac{t-l}{d} \right\rfloor\right)$ formed by concatenating all the $\varphi$-compressed key vectors along the second dimension (columns). And since the number of compressed keys is given by $\left\lfloor \frac{t-l}{d} \right\rfloor$, as the sequence length $t$ increases, the number of blocks, and hence the number of columns in $\tilde K^{cmp}_t$, also increases.
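To make this bookkeeping concrete, here is a minimal PyTorch sketch (not the paper's implementation): `BlockCompressor` is a hypothetical stand-in for the learnable $\varphi$ (a small MLP over a flattened block), and `compress_keys` stacks one compressed key per length-$l$ block taken every $d$ positions, so the column count of $\tilde K^{cmp}_t$ grows with $t$.

```python
import torch
import torch.nn as nn

class BlockCompressor(nn.Module):
    """Hypothetical stand-in for the paper's learnable phi: maps a block of
    l keys (flattened, so intra-block position is implicit in the ordering)
    to a single compressed key of dimension d_k."""
    def __init__(self, d_k: int, l: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(l * d_k, d_k),
            nn.GELU(),
            nn.Linear(d_k, d_k),
        )

    def forward(self, block: torch.Tensor) -> torch.Tensor:
        # block: (l, d_k) -> compressed key: (d_k,)
        return self.mlp(block.reshape(-1))

def compress_keys(keys: torch.Tensor, phi: BlockCompressor, l: int, d: int) -> torch.Tensor:
    """Build the compressed-key tensor from keys k_{1:t}: one compressed key
    per length-l block, blocks starting every d positions, using the paper's
    stated column count floor((t - l) / d)."""
    t, d_k = keys.shape
    num_blocks = (t - l) // d
    cols = [phi(keys[i * d : i * d + l]) for i in range(num_blocks)]
    return torch.stack(cols, dim=1)  # shape: (d_k, num_blocks)

# Illustrative parameters (not taken from the paper's configuration).
d_k, l, d = 64, 32, 16
phi = BlockCompressor(d_k, l)
for t in (128, 256, 512):
    K_cmp = compress_keys(torch.randn(t, d_k), phi, l, d)
    print(t, K_cmp.shape)  # the number of columns grows as t grows
```

Running the loop prints shapes like `(64, 6)`, `(64, 14)`, `(64, 30)`, which is exactly the behavior described above: the second dimension of $\tilde K^{cmp}_t$ scales with the sequence length.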