I think that using Fermi's golden rule here is going a bit too far. If you look at a standard derivation of the rule, you will see that, roughly speaking, it computes a transition amplitude $\langle f| e^{-iHt} |i\rangle$ (equivalently, the time evolution of the initial state $|i\rangle$), takes the magnitude squared, $\left|\langle f| e^{-iHt} |i\rangle\right|^2$, and differentiates this with respect to time to get the transition rate. But already in taking the magnitude squared, you throw away the phase information in the full quantum state and keep only the probabilities of measurement results in a single basis. This effectively gives you a statistical mixture.
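To make that difference concrete, here is a quick comparison in my own shorthand (assuming the basis order $|\text{not decayed, cat alive}\rangle$, $|\text{decayed, cat dead}\rangle$). The equal superposition and the 50/50 statistical mixture have the density matrices

$$\rho_\text{superposition} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad \rho_\text{mixture} = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Both have the same diagonal, so both give probability $1/2$ for each outcome in this basis. Only the off-diagonal entries tell them apart, and those are exactly what you lose once you keep nothing but the squared magnitudes.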
In order to get the Schrödinger's cat scenario, you just need to make sure your Hamiltonian and time are chosen such that $e^{-iHt} |\text{not decayed, cat alive}\rangle = \frac{|\text{not decayed, cat alive}\rangle + |\text{decayed, cat dead}\rangle}{\sqrt{2}}$. This is the case, for example, if $U = e^{-iHt}$ acts as a Hadamard gate or a 90-degree rotation gate on the two states $|\text{not decayed, cat alive}\rangle$ and $|\text{decayed, cat dead}\rangle$. Or you could even model the situation as two time evolutions: first a $U_1$ that puts the particle in $\frac{|\text{not decayed}\rangle + |\text{decayed}\rangle}{\sqrt{2}}$, i.e. the whole system in $\frac{|\text{not decayed}\rangle + |\text{decayed}\rangle}{\sqrt{2}} \otimes |\text{alive}\rangle$, and then a $U_2$ that entangles the particle with the cat, i.e. a CNOT gate "kill the cat if there was a decay" (this also necessarily resurrects the cat if it was already dead, which could be a good starting point to discuss unitarity!).
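The two-step version can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the basis order is |particle, cat> with 0 = not decayed / alive and 1 = decayed / dead, $U_1$ is taken to be a Hadamard gate on the particle (any gate producing the equal superposition would do), and the helper names `matvec` and `kron` are made up for this sketch.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

s = 1 / math.sqrt(2)
HADAMARD = [[s, s], [s, -s]]
IDENTITY = [[1, 0], [0, 1]]

# Assumption: U1 acts as a Hadamard on the particle and leaves the cat alone.
U1 = kron(HADAMARD, IDENTITY)

# U2 = CNOT with the particle as control and the cat as target.
# Basis order: |00>, |01>, |10>, |11> with the particle as the first label.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

initial = [1, 0, 0, 0]             # |not decayed, alive>
after_u1 = matvec(U1, initial)     # (|00> + |10>)/sqrt(2), still a product state
final = matvec(CNOT, after_u1)     # (|00> + |11>)/sqrt(2), the cat state

# Unitarity at work: the same gate maps |decayed, dead> to |decayed, alive>.
resurrected = matvec(CNOT, [0, 0, 0, 1])

print(after_u1)
print(final)
print(resurrected)
```

The last line is the "resurrection" mentioned above: because the CNOT is its own inverse, it cannot kill a living cat without also reviving a dead one.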
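You can make sure your $e^{-iHt}$ is the rotation gate $\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$ if you choose the Hamiltonian $H = -\sigma_y$ (the Pauli matrix) and the duration $t = \pi/4$. (Acting on the initial state, this particular matrix gives the superposition with a relative minus sign; $H = +\sigma_y$ gives the plus sign instead.) Or if you want $e^{-iHt}$ to be a Hadamard gate, you can choose $H = \mathrm{HG} - 1$, a Hadamard gate minus the identity matrix, and $t = \pi/2$ (thanks Mathematica!). The identity shift only fixes the global phase: plain $H = \mathrm{HG}$ would give $-i\,\mathrm{HG}$.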
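If you don't have Mathematica at hand, these choices can be checked numerically. Below is a minimal sketch in plain Python that builds $e^{-iHt}$ from a truncated Taylor series, which is good enough for small 2x2 matrices like these. The helper names (`matmul`, `expm`, `evolve`, `close`) are made up for this example. For the Hadamard case the sketch uses HG minus the identity, which reproduces the gate exactly; a different identity shift (2 is used as an example) only changes the global phase.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(m, terms=60):
    """Matrix exponential via a truncated Taylor series (small matrices only)."""
    n = len(m)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes m^k / k!
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def evolve(hamiltonian, t):
    """Return U = exp(-i H t)."""
    return expm([[-1j * t * x for x in row] for row in hamiltonian])

def close(a, b, tol=1e-9):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

s = 1 / math.sqrt(2)
SIGMA_Y = [[0, -1j], [1j, 0]]
HG = [[s, s], [s, -s]]

def shifted_hadamard(shift):
    """HG minus `shift` times the identity."""
    return [[HG[i][j] - shift * (i == j) for j in range(2)] for i in range(2)]

# H = -sigma_y, t = pi/4: the rotation gate (1/sqrt(2)) [[1, 1], [-1, 1]]
rotation = evolve([[-x for x in row] for row in SIGMA_Y], math.pi / 4)

# H = HG - identity, t = pi/2: exactly the Hadamard gate
hadamard_exact = evolve(shifted_hadamard(1), math.pi / 2)

# H = HG - 2 * identity, t = pi/2: i times the Hadamard gate (global phase only)
hadamard_phase = evolve(shifted_hadamard(2), math.pi / 2)

print(close(rotation, [[s, s], [-s, s]]))
print(close(hadamard_exact, HG))
print(close(hadamard_phase, [[1j * x for x in row] for row in HG]))
```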