Consider the unitary matrix $A \in \mathbb{R}^{n^2\times n^2}$ which has only the first $n$ rows explicitly defined, with the remaining rows having some flexibility. $A$ can be written in block form as $ A= \left( {\begin{array}{cccc} B\\ C\\ \end{array} } \right) $ , where $B\in\mathbb{R}^{n\times n^2}$ and $C\in\mathbb{R}^{(n^2-n)\times n^2}$. The elements of $B$ are given by,
$ B_{ij}= \begin{cases} 1 & , \;\; i<n-1, \space j=i(n+1)+1 \\ 1 & , \;\; i=n-1, \space j=n^2-n \\ 0 & , \;\; \text{else} \end{cases} $
There are two restrictions on $C$, that are: 1) $C$ is the unitary complement to $B$ so $A$ is always unitary, and 2) the nonzero elements of $C$ cannot share a column with the nonzero elements of $B$.
Since this is a bit of an unusual matrix, here is an example for $n=4$. Note that $A$ is not unique because even though $B$ is fixed, there are many choices for $C$.
$ A = \left( {\begin{array}{cccccccccccccccc} 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\ \hline 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ \end{array} } \right) $
Where $B$ and $C$ are separated by the horizontal line.
My question is, given the definition of $B$ is there a choice for $C$ such that $A$ can be decomposed into $\mathcal{O} (poly(log(n^2)))$ one and two qubit gates?
My attempt at a solution was to force $A$ to be a permutation matrix as in the $n=4$ example above. This would allow me to decompose my matrix using only $SWAP$ gates. However, it appears that this is not efficient in general following this SE thread.
 
    
