I think this should not be much easier than classical logic synthesis.  The complexity of (not necessarily reversible) logic synthesis is known to be $\Sigma_2^p$-complete, i.e., likely even harder than NP-complete, assuming the Polynomial Hierarchy doesn't collapse.  That said, there are many excellent heuristics, which drive much of VLSI and ASIC design nowadays.
In particular, such a permutation can be thought of as a truth-table with $2^n$ rows, each with $n$ input bits and $n$ output bits. We can ignore the first $n-1$ output bits and consider just circuit optimization on the last bit, given the $2^n\times (n+1)$ truth-table. We can use our favorite logic synthesis technique such as Karnaugh maps or the Quine-McCluskey algorithm, etc., to optimize over the right set of gates.
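To make the reduction concrete, here is a small sketch (the function name `truth_table_last_bit` and the bit-ordering convention are my own choices, not anything standard): given a permutation of $\{0,\dots,2^n-1\}$ representing the reversible function, it builds the $2^n\times(n+1)$ truth table that keeps the $n$ input bits and only the last output bit, which could then be handed to any synthesis routine.

```python
import random

def truth_table_last_bit(perm, n):
    """Given a permutation `perm` of {0, ..., 2^n - 1}, viewed as a
    reversible function on n bits, return the 2^n x (n+1) truth table
    consisting of the n input bits and only the last output bit."""
    table = []
    for x in range(2 ** n):
        # Input bits, most significant first.
        in_bits = tuple((x >> (n - 1 - i)) & 1 for i in range(n))
        # Keep only the last (least significant) output bit.
        last_out = perm[x] & 1
        table.append((in_bits, last_out))
    return table

n = 3
perm = list(range(2 ** n))
random.shuffle(perm)  # a random reversible function on n bits
table = truth_table_last_bit(perm, n)
for in_bits, out in table:
    print(in_bits, "->", out)
```

The resulting single-output table can be fed to Karnaugh-map or Quine-McCluskey minimization as usual; the reversibility of the original function plays no role at that point.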
It may be slightly easier, because we know that the last output bit is $0$ half of the time and $1$ half of the time (while Karnaugh maps and the QM algorithm are indifferent to this). But I suspect it's still likely $\Sigma_2^p$-hard to find the best reversible circuit, even under the promise that the output bit of the Boolean formula is $0$ half of the time.
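The balance promise follows directly from bijectivity: a permutation of $\{0,\dots,2^n-1\}$ hits every value exactly once, so exactly half of its outputs are odd. A quick sanity check (a throwaway sketch, nothing more):

```python
import random

n = 4
perm = list(range(2 ** n))
random.shuffle(perm)  # an arbitrary reversible function on n bits

# The last output bit, as a Boolean function of the n inputs, is
# balanced: the bijection hits each of 0..2^n-1 once, and exactly
# half of those values are odd.
last_bits = [perm[x] & 1 for x in range(2 ** n)]
print(last_bits.count(0), last_bits.count(1))
```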
If reversible logic synthesis were computationally much easier than irreversible logic synthesis, then it might make sense to implement even classical circuits reversibly.  And indeed, Googling "logic synthesis reversible circuits" generates a lot of hits.