Gate superposition
Superposition is a complex linear combination of pure states with a physical interpretation of coefficients. One aspect of this interpretation concerns the probability of measurement outcomes on a superposition state. Gates are not pure states and generally cannot be measured. Therefore, strictly speaking, there is no such thing as a superposition of quantum gates.
Quantum computer in superposition
In order to answer the part of the question regarding the operation of a quantum computer which is itself in a superposition of two different programs, we need to adopt a model of the quantum computer. Assume that our computer includes a classical control register that encodes the parameters of a unitary gate to be performed and a quantum data register whose state lives in the Hilbert space $\mathcal{H}_d$. If at time $t_0$ the control register is in state $r$ and the data register is in state $|\psi_0\rangle$ then at a later time $t_1$ the data register is in state
$$
|\psi_1\rangle = U(r) |\psi_0\rangle.
$$
We also assume that application of quantum gates to the data register does not change the control register.
In this model, a quantum computation is performed by initializing the data register, loading values $r_0, r_1, \dots, r_n$ in order into the control register to effect the desired quantum gates and finally performing a computational basis measurement on the data register. Our model ignores the mechanism which sets the control register.
Now, let us do what the question suggested and consider the above quantum computer to be a quantum system. Thus, the state of our control register now lives in the Hilbert space $\mathcal{H}_c$ and the state space of our computer is $\mathcal{H}_c \otimes \mathcal{H}_d$.
For simplicity, let us first consider the situation where at time $t_0$ the control register is in a classical state $|r\rangle$ and the data register is in state $|\psi_0\rangle$. By the correspondence principle, at time $t_1$ the computer will be in state
$$
(I \otimes U(r))|r\rangle |\psi_0\rangle = |r\rangle |\psi_1\rangle.\tag1
$$
Note that the states $|r\rangle$ form a basis of $\mathcal{H}_c$. In the case where the control register is binary, these can be identified with the computational basis states in $\mathcal{H}_c$. Since these states form a basis and the evolution is linear, equation $(1)$ fully specifies the action of the quantum computer. In the ordered basis $|r\rangle|i\rangle$, this operation can be written as the block-diagonal matrix
$$
C = \begin{pmatrix}
U(0) & 0 & 0 & \dots & 0 \\
0 & U(1) & 0 & \dots & 0 \\
0 & 0 & U(2) & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \dots & U(N-1)
\end{pmatrix}\tag2
$$
or more succinctly as the direct sum
$$
C = U(0) \oplus U(1) \oplus \dots \oplus U(N-1)\tag{2'}
$$
where $U(k)$ for $k=0, \dots, N-1$ are the gates that our quantum computer can natively perform (this is often called the "gate set").
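As a small illustration of $(2')$, the same operator can be written with projectors on the control register. The two-gate choice $U(0) = I$, $U(1) = X$ below is only an assumed example and is not taken from the question:
$$
C = \sum_{r=0}^{N-1} |r\rangle\langle r| \otimes U(r), \qquad
|0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes X =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}.
$$
For this particular choice, $C$ is the familiar CNOT gate with the control register acting as the control qubit.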
Suppose now that at time $t_0$ the control register is in state
$$
|s\rangle = \alpha |r\rangle + \beta |r'\rangle
$$
with $r \neq r'$ and $|\alpha|^2 + |\beta|^2 = 1$, and that the data register is in state $|\psi_0\rangle$. By the superposition principle, at a later time $t_1$ the computer will be in state
$$
\begin{align}
C |s\rangle|\psi_0\rangle &= C (\alpha |r\rangle + \beta |r'\rangle)|\psi_0\rangle \\
&= \alpha \,C |r\rangle|\psi_0\rangle + \beta \,C |r'\rangle|\psi_0\rangle \\
&= \alpha \,(I \otimes U(r))|r\rangle|\psi_0\rangle + \beta \,(I \otimes U(r'))|r'\rangle|\psi_0\rangle \\
&= \alpha \,|r\rangle|\psi_1\rangle + \beta\,|r'\rangle|\psi_1'\rangle
\end{align}
$$
where $|\psi_1'\rangle = U(r')|\psi_0\rangle$.
This operation can be viewed as a quantum analogue of the classical switch statement that selects a unitary to be applied on the data register based on the contents of the control register. It is sometimes called the quantum multiplexor (see e.g. here).
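The calculation above can be checked numerically with a minimal sketch. It assumes a one-qubit control register, a one-qubit data register, and the illustrative gate set $U(0) = X$, $U(1) = H$; these gates, the amplitudes, and all helper names are hypothetical choices, and only the Python standard library is used to keep the example dependency-free.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def kron_vec(a, b):
    """Tensor product of two state vectors, first factor most significant."""
    return [x * y for x in a for y in b]

def direct_sum(blocks):
    """Block-diagonal matrix U(0) (+) U(1) (+) ... built from square blocks."""
    n = sum(len(b) for b in blocks)
    out = [[0j] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, val in enumerate(row):
                out[offset + i][offset + j] = val
        offset += len(b)
    return out

s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]    # assumed U(0)
H = [[s, s], [s, -s]]   # assumed U(1)

C = direct_sum([X, H])  # the multiplexer on control (x) data

alpha, beta = 0.6, 0.8j  # amplitudes of |0> and |1> in the control register
control = [alpha, beta]
psi0 = [1, 0]            # data register starts in |0>

# Left-hand side: C applied to |s>|psi_0>.
out = matvec(C, kron_vec(control, psi0))

# Right-hand side: alpha |0>|psi_1> + beta |1>|psi_1'>.
expected = ([alpha * a for a in matvec(X, psi0)]
            + [beta * b for b in matvec(H, psi0)])

print(max(abs(a - b) for a, b in zip(out, expected)))
```

The printed value is the largest componentwise difference between the two sides; it should vanish up to floating-point rounding.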
However, while the direct sum of the unitary gates and the entangled state of the control and data registers explain what happens in a quantum computer that is in a superposition of two different programs, the direct sum of two gates lacks many properties familiar from state superposition. In particular, the direct sum of unitary gates has no amplitudes. Also, the superposition of the full quantum computer turns into a mixture when the control register is traced out, and so, as pointed out in the comments, the data register itself is not in a superposition. Thus, the model above does not define a notion one could call "gate superposition". This is expected since, unlike states, unitary evolutions do not superpose. It is also informative to see how other approaches to defining such a notion fail.
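To make the remark about tracing out the control register concrete, here is a short worked equation. It assumes $r \neq r'$, so that $\langle r | r' \rangle = 0$, and writes $|\Phi\rangle = \alpha |r\rangle|\psi_1\rangle + \beta |r'\rangle|\psi_1'\rangle$ for the final state of the computer (the symbol $|\Phi\rangle$ is introduced here only for illustration):
$$
\rho_d = \mathrm{tr}_c\left(|\Phi\rangle\langle\Phi|\right) = |\alpha|^2\, |\psi_1\rangle\langle\psi_1| + |\beta|^2\, |\psi_1'\rangle\langle\psi_1'|.
$$
The cross terms are proportional to $\langle r | r' \rangle$ and vanish, so $\rho_d$ depends on the amplitudes only through $|\alpha|^2$ and $|\beta|^2$. It is pure only if one amplitude is zero or if $|\psi_1\rangle$ and $|\psi_1'\rangle$ coincide up to a phase.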
Naive linear algebra
Naively forming a "superposition" of two unitary gates $U$ and $V$ by taking
$$
M = \alpha U + \beta V
$$
for $\alpha, \beta \in \mathbb{C}$ such that $|\alpha|^2 + |\beta|^2 = 1$ does not generally yield a unitary $M$. This is a consequence of the fact that the set of unitary operators is not a complex vector space. Just as one can normalize a non-zero ket to unit norm, one can also extract the unitary part of a non-singular matrix using the polar decomposition
$$
M = WP
$$
where $P$ is positive definite and $W$ is unitary. However, $W$ lacks the physical meaning associated with superposition.
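A minimal numerical sketch of this section, assuming the illustrative choice $U = I$, $V = X$ with amplitudes $0.6$ and $0.8$ (none of which come from the question). The unitary polar factor is computed with the Newton iteration $W \leftarrow \frac{1}{2}(W + (W^{-1})^\dagger)$, which is a choice made here for a dependency-free $2 \times 2$ example and not something prescribed by the text; all helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def combine(alpha, u, beta, v):
    """Entrywise alpha*U + beta*V."""
    return [[alpha * u[i][j] + beta * v[i][j] for j in range(len(u[0]))]
            for i in range(len(u))]

def is_unitary(m, tol=1e-9):
    """Check M^dagger M = I up to a tolerance."""
    p = matmul(dagger(m), m)
    n = len(m)
    return all(abs(p[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

def inv2(m):
    """Inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def unitary_polar_factor(m, steps=50):
    """Newton iteration for W in M = W P (2x2, non-singular M only)."""
    w = m
    for _ in range(steps):
        w_inv_dag = dagger(inv2(w))
        w = [[(w[i][j] + w_inv_dag[i][j]) / 2 for j in range(2)]
             for i in range(2)]
    return w

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

M = combine(0.6, I2, 0.8, X)   # alpha*U + beta*V with real amplitudes
W = unitary_polar_factor(M)    # unitary part of M
P = matmul(dagger(W), M)       # positive part, so that M = W P

print(is_unitary(M))                            # False
print(is_unitary(W))                            # True
print(is_unitary(combine(0.6, I2, 0.8j, X)))    # True: some combinations are unitary
```

For this particular $M$ the unitary factor $W$ comes out numerically equal to $X$, that is, to $V$ itself, so the amplitudes $0.6$ and $0.8$ leave no trace in $W$. This is one way to see why the polar decomposition does not behave like a superposition.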
State-channel duality
One could also seek a way to define gate "superposition" by looking at the superposition of the corresponding states under state-channel duality. Let $|i\rangle$ for $i=1, \dots, d$ denote an orthonormal basis of a $d$-dimensional Hilbert space $\mathcal{H}$ and let $|\psi\rangle = \frac{1}{\sqrt{d}}\sum_{i=1}^d |i\rangle|i\rangle$ be a maximally entangled state. Denote by $U$ and $V$ two unitary gates on $\mathcal{H}$. The corresponding states are
$$
\begin{align}
\rho_U &= (I \otimes \mathcal{U})(|\psi\rangle\langle\psi|) = \frac{1}{d}\sum_{i,j=1}^d |i\rangle\langle j| \otimes U|i\rangle\langle j|U^\dagger \\
\rho_V &= (I \otimes \mathcal{V})(|\psi\rangle\langle\psi|) = \frac{1}{d}\sum_{i,j=1}^d |i\rangle\langle j| \otimes V|i\rangle\langle j|V^\dagger.
\end{align}
$$
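Note that $\rho_U$ and $\rho_V$ are pure, with state vectors $(I \otimes U)|\psi\rangle$ and $(I \otimes V)|\psi\rangle$. The superposition of these state vectors with amplitudes $\alpha$ and $\beta$ is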
$$
\begin{align}
|\psi_M\rangle &= \alpha \left(\frac{1}{\sqrt{d}}\sum_{i=1}^d |i\rangle \otimes U|i\rangle\right) + \beta \left(\frac{1}{\sqrt{d}}\sum_{i=1}^d |i\rangle \otimes V|i\rangle \right) \\
&= \frac{1}{\sqrt{d}}\sum_{i=1}^d |i\rangle \otimes (\alpha U + \beta V)|i\rangle \\
&= \frac{1}{\sqrt{d}}\sum_{i=1}^d |i\rangle \otimes M|i\rangle \\
&= (I \otimes M) |\psi\rangle
\end{align}
$$
where as before $M = \alpha U + \beta V$. Note that $|\psi_M\rangle$ is generally not normalized. In the extreme case $M = 0$ (for example, $V = -U$ with $\alpha = \beta$), it is the zero vector and cannot be normalized at all. However, even when $|\psi_M\rangle$ can be normalized, it is not necessarily a maximally entangled state. Consequently, it does not generally map back to a unitary operation via state-channel duality.
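Both statements can be made quantitative with a short calculation from the definitions above. The numerical values in the example are an assumed illustration, not part of the question:
$$
\begin{align}
\langle\psi_M|\psi_M\rangle &= \frac{1}{d}\,\mathrm{tr}(M^\dagger M) = 1 + \frac{2}{d}\,\mathrm{Re}\left(\alpha^* \beta\, \mathrm{tr}(U^\dagger V)\right), \\
\mathrm{tr}_1\left(|\psi_M\rangle\langle\psi_M|\right) &= \frac{1}{d}\, M M^\dagger.
\end{align}
$$
The normalized state is maximally entangled exactly when $M M^\dagger$ is proportional to the identity, that is, when $M$ is proportional to a unitary. For instance, with $d = 2$, $U = I$, $V = X$, $\alpha = 0.6$ and $\beta = 0.8$ we have $\mathrm{tr}(U^\dagger V) = 0$, so $|\psi_M\rangle$ is already normalized, yet
$$
\frac{1}{2} M M^\dagger = \frac{1}{2}\begin{pmatrix} 1 & 0.96 \\ 0.96 & 1 \end{pmatrix}
$$
has eigenvalues $0.98$ and $0.02$ and so is far from the maximally mixed state $I/2$.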