Think about the equal-time commutation relation between the field $\phi(\vec x)$ and its conjugate momentum $\pi(\vec x)$: you want to write something like
$$
[\phi(\vec x),\pi(\vec y)]=i\hbar \delta^3(\vec x-\vec y).
$$
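The right-hand side is a delta function, which is not a function at all. For this to make sense, both sides have to be understood as distributions, meaning the relation only acquires a meaning once it is integrated against test functions in $\vec x$ and $\vec y$. That said, this is not exactly the setup of your question, because operator-valued distributions are usually defined on $\mathbb{R}^{3,1}$ rather than on a single time slice.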
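For instance, pair both sides with real test functions $f(\vec x)$ and $g(\vec y)$ on the time slice. The smeared notation $\phi(f)$, $\pi(g)$ below is introduced here just for illustration. The result is a relation between genuine operators whose right-hand side is an ordinary finite number:

$$
\phi(f)=\int d^3x\, f(\vec x)\,\phi(\vec x),\qquad \pi(g)=\int d^3y\, g(\vec y)\,\pi(\vec y),
$$
$$
[\phi(f),\pi(g)]=\int d^3x\, d^3y\, f(\vec x)\,g(\vec y)\,[\phi(\vec x),\pi(\vec y)]=i\hbar\int d^3x\, f(\vec x)\,g(\vec x).
$$

The delta function has been integrated away, which is precisely the sense in which it is a distribution.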
Another motivation is that thinking in terms of distributions lets you make sense of the states you actually create by acting with fields on the vacuum. For example, to see that
$$
\int d^dx f(x) \phi(x)|0\rangle
$$
is a finite-norm state, you need both the test function $f(x)$ and the interpretation of $\phi(x)$ as a distribution. Without smearing with $f(x)$ you get things like $\langle0|\phi(x)^2|0\rangle=\infty$. With $f(x)$, the squared norm becomes $\int d^dx\,d^dy\,\overline{f(x)}\,f(y)\,\langle0|\phi(x)\phi(y)|0\rangle$, and you need to make sense of the singularity of the two-point function on the light cone inside this integral, which is exactly where distributions save you.
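To see the difference numerically, here is a minimal sketch for one assumed example. Take a free scalar field of mass $m$ in $3+1$ dimensions. There the squared norm of $\int d^4x\,f(x)\,\phi(x)|0\rangle$ works out to $\int\frac{d^3p}{(2\pi)^3\,2\omega_p}\,|\tilde f(\omega_p,\vec p)|^2$, with $\omega_p=\sqrt{\vec p^{\,2}+m^2}$ and $\tilde f$ the spacetime Fourier transform of $f$. The following are all illustrative choices, not anything canonical:

- the Gaussian test function, with time width `tau` and space width `sigma`;
- the momentum cutoff;
- the helper name `norm_squared`.

Overall constants are dropped.

```python
import math


def norm_squared(cutoff, mass=1.0, tau=0.0, sigma=0.0, steps=20000):
    """Midpoint-rule estimate of the radial integral
        ∫_0^cutoff dp  p^2 / (2 ω_p) * |f~(ω_p, p)|^2,   ω_p = sqrt(p^2 + m^2),
    for a normalized Gaussian test function with time width `tau` and space
    width `sigma`, whose Fourier transform obeys |f~|^2 = exp(-tau^2 ω^2 - sigma^2 p^2).
    tau = sigma = 0 is the delta-function limit (no smearing, |f~| = 1).
    The angular factor 4π/(2π)^3 is dropped: only the cutoff dependence matters here."""
    h = cutoff / steps
    total = 0.0
    for k in range(steps):
        p = (k + 0.5) * h                       # midpoint of the k-th subinterval
        omega = math.sqrt(p * p + mass * mass)  # on-shell energy ω_p
        weight = math.exp(-(tau * omega) ** 2 - (sigma * p) ** 2)  # |f~|^2
        total += p * p / (2.0 * omega) * weight * h
    return total


for cutoff in (10.0, 100.0, 1000.0):
    unsmeared = norm_squared(cutoff)          # f = delta function
    smeared = norm_squared(cutoff, tau=1.0)   # Gaussian smearing in time only
    print(f"cutoff={cutoff:7.1f}  unsmeared={unsmeared:14.4f}  smeared={smeared:.10f}")
```

In the delta-function limit the value grows like $\Lambda^2/4$ with the cutoff $\Lambda$, which is the momentum-space version of $\langle0|\phi(x)^2|0\rangle=\infty$. With `tau > 0` alone, the value stops changing once $\Lambda$ exceeds a few multiples of $1/\tau$. This is in line with the claim below that smearing in time is enough, although it is checked here only for this free field.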
Finally, people aren't making this stuff up: there are many QFTs that have been rigorously constructed (as in, mathematicians agree they exist), and their fields are operator-valued distributions (by which I mean that they satisfy the Wightman axioms).
A more hand-waving explanation is that the main point of smearing with a test function is generally to smear in time (in fact, smearing the fields in time alone is enough to get operators). This turns things like $e^{-iEt}$, which do not decay at large $E$, into $\int dt\, f(t)\, e^{-iEt}$, which decays very quickly at large $E$ if $f$ is smooth. Since sums over $E$ appear everywhere, smearing makes these sums converge.
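A minimal sketch of this mechanism, under toy assumptions:

- The smearing function is a normalized Gaussian of width `sigma`, for which $\int dt\,f(t)\,e^{-iEt}=e^{-\sigma^2E^2/2}$ exactly.
- A hypothetical spectrum $E_n=n$ with degeneracy $n^2$ stands in for "the sums over $E$".

The helpers `smeared_phase` and `partial_sums` are illustrative names, not from any library.

```python
import cmath
import math


def smeared_phase(E, sigma, steps=4000):
    """Midpoint-rule estimate of  ∫ dt f(t) e^{-iEt}  for the normalized Gaussian
    f(t) = exp(-t^2 / (2 sigma^2)) / (sqrt(2π) sigma), truncated to |t| <= 10 sigma.
    The exact answer is exp(-sigma^2 E^2 / 2)."""
    t_max = 10.0 * sigma                     # Gaussian tails beyond 10 sigma are negligible
    h = 2.0 * t_max / steps
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    total = 0j
    for k in range(steps):
        t = -t_max + (k + 0.5) * h           # midpoint of the k-th subinterval
        f = norm * math.exp(-t * t / (2.0 * sigma * sigma))
        total += f * cmath.exp(-1j * E * t) * h
    return total


def partial_sums(N, sigma=1.0, t=0.3):
    """Hypothetical toy spectrum E_n = n with degeneracy n^2.
    Returns the partial sums over n = 1..N of
      n^2 |e^{-i E_n t}|              (unsmeared, at an arbitrary fixed time t), and
      n^2 |∫ dt f(t) e^{-i E_n t}|    (smeared with the Gaussian above)."""
    unsmeared = sum(n * n * abs(cmath.exp(-1j * n * t)) for n in range(1, N + 1))
    smeared = sum(n * n * abs(smeared_phase(n, sigma)) for n in range(1, N + 1))
    return unsmeared, smeared


for N in (5, 10, 20, 40):
    u, s = partial_sums(N)
    print(f"N={N:3d}  unsmeared={u:10.1f}  smeared={s:.8f}")
```

The two kinds of partial sum behave very differently:

- **Unsmeared.** These grow like $N^3/3$, since $|e^{-iE_nt}|=1$ for every $n$.
- **Smeared.** These settle almost immediately on $\sum_n n^2e^{-n^2\sigma^2/2}$.

The Gaussian is chosen only because its transform has a closed form to check against. Any smooth, compactly supported $f$ has a transform that decays faster than any power of $E$, which is enough to tame polynomially growing degeneracies like this one.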