As mentioned in the comment above there is a relevant section of Nielsen and Chuang that discusses circuits which perform this sort of action (p191-193, 10th anniversary edition). The desired transformation can be done using a classical reversible circuit with the $\{X, C_nX\}$ gate set.
I'm going to show how to implement such permutation circuits in pytket using the ToffoliBox construction. This implementation uses multiplexor operations to reduce the number of multi-qubit gates required.
Regarding the complexity of these perumtations, I expect the number of gates required to scale exponentially in the worst case but I'm unsure how to prove this. The complexity of permutation circuits has also been dicussed here.
Here is an example of how to use ToffoliBox taken from the pytket user manual.
Let's perform the following permutation of basis states
\begin{gather}
    |001\rangle \longmapsto |111\rangle \\
    |111\rangle \longmapsto |001\rangle \\
    |100\rangle \longmapsto |000\rangle \\
    |000\rangle \longmapsto |100\rangle
\end{gather}
We can construct this permutation as a dictionary of key-value pairs.
from pytket.circuit import ToffoliBox
Specify the desired permutation of the basis states
mapping = {
    (0, 0, 1): (1, 1, 1),
    (1, 1, 1): (0, 0, 1),
    (1, 0, 0): (0, 0, 0),
    (0, 0, 0): (1, 0, 0),
}
Define box to perform the permutation
perm_box = ToffoliBox(permutation=mapping)
For correctness if a basis state appears as a key in the dictionary then it must also appear as a value.
Let's now test this out by preparing the three qubit $W$ state and see if our ToffoliBox shuffles our basis states as expected.
\begin{equation}
|W\rangle = \frac{1}{\sqrt{3}} \big(|001\rangle + |010\rangle + |100\rangle \big)
\end{equation}
from pytket.circuit import Circuit, StatePreparationBox
import numpy as np
w_state = 1 / np.sqrt(3) * np.array([0, 1, 1, 0, 1, 0, 0, 0])
w_state_box = StatePreparationBox(w_state)
state_circ = Circuit(3)
state_circ.add_gate(w_state_box, [0, 1, 2])
This is not the most efficent way to prepare $|W\rangle$ but will serve for illustration purposes.
# Verify state preperation
np.round(state_circ.get_statevector().real, 3) # 1/sqrt(3) approx 0.577
As expected we get
array([-0.   ,  0.577,  0.577,  0.   ,  0.577,  0.   ,  0.   ,  0.   ])
Now lets add our ToffoliBox which we defined in the first code block to our state preparation circuit and check our statevector again

state_circ.add_gate(perm_box, [0, 1, 2])
np.round(state_circ.get_statevector().real, 3)
After the permutation box is applied we now get...
array([0.577, 0.   , 0.577, 0.   , 0.   , 0.   , 0.   , 0.577])
corresponding to the modified state $|W'\rangle$
\begin{equation}
|W'\rangle = \frac{1}{\sqrt{3}} \big(|000\rangle + |010\rangle + |111\rangle \big)
\end{equation}
We see that our ToffoliBox has performed the desired permutation.