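The usual derivation of the energy stored in a capacitor is as follows: $$dU=V\,dq=\frac qC\,dq$$ $$U=\int_0^Q\frac qC\,dq=\frac12\frac {Q^2}C\equiv\frac12QV\tag1$$ where $q$ is the charge already on the capacitor, $Q$ is the final charge, and $V$ in $(1)$ is the final potential difference $Q/C$. Explicitly, $$V=-\int_-^+\vec E\cdot d\vec l\tag2$$ where $\vec E$ is the net electric field between the plates (that is, the field with contributions from both plates) and the integral runs from the negative plate to the positive plate.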
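As a quick numerical check of $(1)$, the sketch below adds up $dU=(q/C)\,dq$ in small steps while the charge grows from $0$ to $Q$. The values of $Q$ and $C$ are hypothetical, chosen only for illustration.

```python
def charging_energy(Q, C, n_steps=100_000):
    """Integrate dU = V dq with V = q/C from q = 0 to q = Q (midpoint rule)."""
    dq = Q / n_steps
    U = 0.0
    for k in range(n_steps):
        q = (k + 0.5) * dq   # charge already on the capacitor at the middle of this step
        U += (q / C) * dq    # work to move dq across the current potential difference q/C
    return U

Q, C = 3.0, 2.0              # hypothetical values in arbitrary consistent units
V = Q / C                    # final potential difference
U = charging_energy(Q, C)
print(U, 0.5 * Q**2 / C, 0.5 * Q * V)
```

Because the potential difference grows linearly with $q$, the average potential difference seen by a charge element is $V/2$, which is where the mathematical factor of $1/2$ comes from.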
From a mathematical point of view, it is clear where the factor of $1/2$ comes from.
From a physical point of view, we can justify this factor for the parallel-plate capacitor as follows.
First, we divide the capacitor into two subsystems: the positive plate and the negative plate. It is important to realize that we are only interested in the interaction energy of these two subsystems, usually called the "energy stored in the capacitor", which is given by $(1)$. Hence we disregard how the plates themselves were assembled (which does not matter anyway, since electrostatics is conservative) and the energy needed to assemble each plate (which is infinite anyway, since each plate carries infinite charge in the infinite-plate approximation used here). With this in mind, we build the capacitor as follows: take two oppositely charged plates placed so close together that the net electric field is zero everywhere. Now hold the negative plate fixed and pull the positive plate away to a distance $l$. This procedure is chosen specifically to simplify the calculation below.
Consider a charge element $dq$ on the positive plate. The energy needed to carry it from separation $0$ out to separation $l$ is $$\Delta\phi\,dq\tag3$$ where $\Delta\phi=\phi(l)-\phi(0)$ and $\phi$ is the potential due to the negative plate alone. Since we are only interested in the interaction energy, the positive plate's own field is left out, and the net energy needed to pull the plates apart is simply $$\int\Delta\phi\,dq=Q\,\Delta\phi\tag4$$ because $\Delta\phi$ is the same for every charge element and $Q=\int dq$.
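Because every element crosses the same $\Delta\phi$, the sum in $(4)$ does not depend on how the plate's charge is split into elements. A minimal sketch of that step, using a hypothetical value of $\Delta\phi$ and a hypothetical random partition of $Q$ into unequal pieces:

```python
import random

def interaction_energy(charge_elements, delta_phi):
    """Sum delta_phi * dq over the charge elements of the positive plate, as in (3)-(4).

    delta_phi is phi(l) - phi(0) for the negative plate's potential alone;
    it is the same for every element.
    """
    return sum(delta_phi * dq for dq in charge_elements)

Q = 2.0                      # hypothetical total charge on the positive plate
delta_phi = 0.75             # hypothetical value of the potential difference
rng = random.Random(0)       # fixed seed so the partition is reproducible
weights = [rng.random() for _ in range(1000)]
total = sum(weights)
elements = [Q * w / total for w in weights]   # 1000 unequal elements summing to Q

U = interaction_energy(elements, delta_phi)
print(U, Q * delta_phi)
```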
Calculation of $\Delta\phi$:
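The electric field due to the negative plate (taking the $x$-axis to point from the negative plate toward the positive plate, with $\sigma$ the magnitude of the surface charge density and $\epsilon_0=1$) is $$E_n=-\frac\sigma2\tag5$$
The corresponding potential difference is $$\Delta\phi=-\int_0^l E_n\,dx=\frac\sigma2 l\tag6$$ Thus the energy becomes $$U=\frac\sigma2 Ql=\frac {Q^2}2\frac lA=\frac12\frac{Q^2}C=\frac12QV\tag7$$ where $\sigma=Q/A$, the parallel-plate capacitance $C=A/l$ is used, and $V$ is defined as in $(2)$.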
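The sketch below checks $(5)$ to $(7)$ numerically in units with $\epsilon_0=1$: it integrates the negative plate's field over the separation to get $\Delta\phi$, multiplies by $Q$ as in $(4)$, and compares the result with $(1)$. The values of $\sigma$, $A$ and $l$ are hypothetical. As an extra cross-check that is not part of the argument above, it also evaluates the field energy $\tfrac12E^2$ times the volume $Al$, with the net field $E=\sigma$ between the plates.

```python
def delta_phi_negative_plate(sigma, l, n_steps=10_000):
    """Delta phi = -integral_0^l E_n dx for the negative plate alone, eq. (6), epsilon_0 = 1."""
    E_n = -sigma / 2          # field of the negative plate alone, eq. (5); uniform in x
    dx = l / n_steps
    return -sum(E_n * dx for _ in range(n_steps))

# Hypothetical values in units with epsilon_0 = 1
sigma, A, l = 0.5, 4.0, 0.3
Q = sigma * A                 # total charge on the positive plate
C = A / l                     # parallel-plate capacitance with epsilon_0 = 1
V = Q / C                     # equals sigma * l, the line integral (2) of the net field sigma

U_pull = Q * delta_phi_negative_plate(sigma, l)   # eqs. (4) and (6)
U_charge = 0.5 * Q**2 / C                         # eq. (1)
U_field = 0.5 * sigma**2 * A * l                  # energy density E**2/2 with E = sigma, times volume A*l
print(U_pull, U_charge, 0.5 * Q * V, U_field)
```

All four numbers agree: pulling the plate through the full separation against only half of the net field, $\sigma/2$ instead of $\sigma$, is what produces the same factor of $1/2$ as the charging integral.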
How can this kind of "see-through" reasoning, phrased in terms of charge distributions and electric fields rather than the definition of capacitance, be generalized to any type of capacitor?