One of the divergence problems in quantum field theory stems from the use of products of distributions, which are not well defined (by Schwartz's impossibility theorem). The simplest example of this is the Hamiltonian density of a free scalar field. Consider this:
The Hamiltonian of a free scalar field is
\begin{eqnarray}
H &=& \frac{1}{2} \int d^3x (\pi^2 + (\nabla \phi)^2 + m^2 \phi^2)\\
&=& \int \frac{d^3p}{(2\pi)^3} \frac{\omega_{\vec{p}}}{2} [a_{\vec{p}} a^\dagger_{\vec{p}} + a^\dagger_{\vec{p}} a_{\vec{p}} ]\\
&=& \int \frac{d^3p}{(2\pi)^3} \frac{\omega_{\vec{p}}}{2} [a^\dagger_{\vec{p}} a_{\vec{p}} + (2\pi)^3 \delta^{(3)}(0)]
\end{eqnarray}
This isn't terribly rigorous for a number of reasons (we are also using generalized eigenstates as a basis, which complicates things further), but the important point is the product being performed: for the operator-valued distribution $\hat{a}$, we are trying to compute $\hat{a}(x) \hat{a}^\dagger(x)$, and as we saw earlier, this quantity isn't well defined. Since their commutator is a delta function, it's easy to show that these cannot be non-singular distributions; we are essentially trying to compute the product of two delta functions, which, if we try to force it by passing to function sequences, can be shown to diverge:
\begin{eqnarray}
\int dx [\delta(x)]^2 &=& \lim_{\varepsilon \to 0} \int dx ({\frac {1}{\sqrt {2\pi \varepsilon }}}\mathrm {e} ^{-{\frac {x^{2}}{2\varepsilon }}})^2\\
&=& \lim_{\varepsilon \to 0} \int dx {\frac {1}{{2\pi \varepsilon }}}\mathrm {e} ^{-{\frac {x^{2}}{\varepsilon }}}\\
&=& \lim_{\varepsilon \to 0} (2\sqrt{\pi\varepsilon})^{-1}
\end{eqnarray}
which is indeed divergent. The quantity is well enough defined at two different points (from the identity $[a_{\vec{p}}, a^\dagger_{\vec{q}}] = (2\pi)^3 \delta^{(3)}(\vec{p} - \vec{q})$), when the singular supports are disjoint (the singularities sit at two different points), but it ceases to be well defined if we try to evaluate them at the same point.
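As a quick numeric sketch of this (my own illustration, not from any of the papers discussed), we can evaluate the integral of the squared Gaussian mollification for shrinking $\varepsilon$ and compare it against the closed form $(2\sqrt{\pi\varepsilon})^{-1}$:

```python
# Numeric check that the mollified "delta squared" integral blows up
# like 1/(2*sqrt(pi*eps)) as eps -> 0.
import numpy as np

def delta_eps(x, eps):
    # Gaussian mollification of the delta function
    return np.exp(-x**2 / (2 * eps)) / np.sqrt(2 * np.pi * eps)

def integral_of_square(eps, half_width=10.0, n=400001):
    # simple Riemann sum of delta_eps(x)^2 over [-half_width, half_width]
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    return np.sum(delta_eps(x, eps)**2) * dx

for eps in [1e-1, 1e-2, 1e-3]:
    print(eps, integral_of_square(eps), 1 / (2 * np.sqrt(np.pi * eps)))
```

The numeric and analytic columns agree, and both grow without bound as $\varepsilon$ shrinks.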
There are many physicist tricks to get around this, but the "proper" way to define the product of two distributions is to use a wider space containing the distributions, made of so-called generalized functions. The most common ones are Colombeau algebras, which are differential algebras into which the distributions embed (and which have been used for QFT a few times, including by Colombeau himself).
The trick with Colombeau algebras is the following: rather than considering a distribution as a linear functional on a function space, we consider equivalence classes of suitably mollified distributions (that is, distributions convolved with an appropriate bump function). For a mollifier $\eta$ and a distribution $f$, we consider the family of functions
$$f_\varepsilon = \frac{1}{\varepsilon} \eta (-\frac{x}{\varepsilon}) * f$$
The equivalence class is defined using the set of moderate families $\mathcal{E}_M$ (those such that, for some $n \geq 0$, $f_\varepsilon = \mathcal{O}(\varepsilon^{-n})$ as $\varepsilon \to 0$) and the set of negligible families $\mathcal{N}$ (those such that, for all $q > 0$, $f_\varepsilon = \mathcal{O}(\varepsilon^q)$). The basic idea is that negligible functions are "infinitesimal" while moderate functions are at worst polynomially divergent. The Colombeau algebra is then the quotient of the moderate functions by the negligible ones
$$\mathcal{G} = \mathcal{E}_M / \mathcal{N}$$
In other words, a distribution $f$ is represented in the algebra by the class $f_\varepsilon + \mathcal{N}$.
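To make the moderate/negligible distinction concrete (my own numeric sketch, using the Gaussian mollified delta from above), we can check that the sup norm of $\delta_\varepsilon$ grows only like a power of $1/\varepsilon$, which is exactly what "moderate" requires:

```python
# Check that the Gaussian representative of the delta function is "moderate":
# its sup norm grows like eps^{-1/2}, i.e. only polynomially in 1/eps.
import numpy as np

def delta_eps(x, eps):
    return np.exp(-x**2 / (2 * eps)) / np.sqrt(2 * np.pi * eps)

x = np.linspace(-1.0, 1.0, 100001)
eps_values = [1e-2, 1e-3, 1e-4, 1e-5]
sups = [delta_eps(x, eps).max() for eps in eps_values]

# growth exponent between consecutive eps values (eps shrinks by 10 each step)
slopes = [np.log(s2 / s1) / np.log(10) for s1, s2 in zip(sups, sups[1:])]
print(slopes)  # each close to 0.5, so sup|delta_eps| = O(eps^{-1/2})
```

A negligible family, by contrast, would have to shrink faster than every power of $\varepsilon$, which $\delta_\varepsilon$ clearly does not.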
The Colombeau algebra is strictly larger than the space of distributions, since, as we've seen, the product of two distributions will not necessarily be a distribution. It can be defined as a space of functions on the so-called generalized numbers $\tilde{\mathbb{R}}$, which is where the "real infinities" come in: we can evaluate a generalized function at a specific point $x$ (unlike distributions, generalized functions are genuine functions on the space $\mathbb{R}^n$). The generalized numbers only form a ring (some elements do not admit an inverse), and, as one might expect, they are not Archimedean (there are infinite and infinitesimal values).
Once you have those elements, you can redo the QFT analysis: convolve the field operators with some mollifier and consider their equivalence classes. The Colombeau–Gsponer paper does this and, after a fairly long calculation, finds that the Hamiltonian is a sum of four operator-valued generalized functions: two negligible functions, a moderate function (which is just the energy part of the Hamiltonian), and a vacuum energy term of the form
\begin{eqnarray}
E &=& \frac{1}{2} \int d^3p_1 d^3p_2 \hat \rho(\varepsilon p_1) \hat \rho^*(\varepsilon p_2) \exp{[i(E_{p_1} - E_{p_2}) (t - t\xi)]} \\
&& \frac{(E_{p_1} E_{p_2} + p_1 \cdot p_2 + m^2)(E_{p_1} + E_{p_2})}{4E_{p_1} E_{p_2}} |\frac{1}{\varepsilon^3} \chi (\frac{p_1 - p_2}{\varepsilon})|^2
\end{eqnarray}
While you can compute that integral, the important point is that it contains a term of order $\mathcal{O}(\varepsilon^{-3})$, i.e., it is divergent. But this term is just a generalized number; it does not depend on any position in space. You can then redefine the Hamiltonian as simply
$$\hat{H}_R = \hat{H} - E$$
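For orientation (this is my paraphrase of the outcome, consistent with the mode expansion at the top, not the paper's exact expression): subtracting $E$ leaves, up to negligible terms, the familiar normal-ordered Hamiltonian

```latex
\begin{eqnarray}
\hat{H}_R &=& \int \frac{d^3p}{(2\pi)^3} \, \omega_{\vec{p}} \, a^\dagger_{\vec{p}} a_{\vec{p}} \;+\; (\text{negligible terms})
\end{eqnarray}
```

with the divergence now isolated in a position-independent generalized number rather than hidden in an ill-defined operator product.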
The Colombeau algebra isn't the only algebra for distributions. There are other, quite possibly better-fitting algebras for the task, including algebras of distributions built over the hyperreal numbers. It's quite easy to build a delta function in most non-Archimedean fields, such as the hyperreals or the asymptotic numbers, simply by picking, for instance, the Gaussian
$$\delta_\varepsilon(x) = \frac{1}{\sqrt{2\pi \varepsilon}} \mathrm{e}^{-\frac{x^{2}}{2\varepsilon}}$$
with $\varepsilon$ some infinitesimal hyperreal number, in which case any distribution transferred to the hyperreals (via Łoś's theorem) becomes an ordinary function over the hyperreals via
$$f_\varepsilon = f * \delta_\varepsilon$$
The basic idea is the same as in the Colombeau method, although the hyperreals and related structures have the benefit of forming a field rather than a ring. More details on the topic can be found, for instance, in Todorov's papers.
There really is nothing magical about using "infinite" numbers for renormalization. Both generalized numbers and hyperreals can be defined as sequences of ordinary numbers; for instance, an infinitesimal hyperreal can be written as
$$\varepsilon = (1, 1/2, 1/3, ...)$$
while an infinite number can be written as
$$\omega = (1, 2, 3, ...)$$
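As a toy illustration of this sequence picture (my own sketch; a real construction would quotient by an ultrafilter, which is omitted here), the termwise arithmetic can be played with directly:

```python
# Toy model of hyperreal-style numbers as sequences with termwise arithmetic.
# A genuine ultrapower construction also needs an ultrafilter to compare
# sequences; this sketch only demonstrates the termwise operations.
class SeqNumber:
    def __init__(self, terms):
        self.terms = list(terms)

    def __mul__(self, other):
        return SeqNumber(a * b for a, b in zip(self.terms, other.terms))

    def __sub__(self, other):
        return SeqNumber(a - b for a, b in zip(self.terms, other.terms))

n_terms = 6
eps = SeqNumber(1 / (n + 1) for n in range(n_terms))   # infinitesimal (1, 1/2, 1/3, ...)
omega = SeqNumber(n + 1 for n in range(n_terms))       # infinite (1, 2, 3, ...)

# termwise, eps * omega is (1, 1, 1, ...): eps behaves as the inverse of omega
print((eps * omega).terms)
```

The infinitesimal and the infinite number are exact inverses of each other, term by term, which is why arithmetic with them stays well behaved.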
In the end, as far as the fundamental idea goes, this is the same as renormalization by the usual means: you treat the divergent quantity as a divergent sequence of finite objects and remove a finite quantity at each point of the sequence.
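A minimal sketch of that last point, with made-up numbers: model the divergent quantity as a sequence carrying a finite "physical" part plus a growing piece, and subtract the growing piece termwise.

```python
# Toy renormalization: subtract a divergent counterterm sequence from a
# divergent sequence, termwise, leaving a finite remainder.
physical_energy = 2.0            # made-up finite value for illustration

def bare_term(n):
    return physical_energy + n   # diverges as n grows

def counterterm(n):
    return float(n)              # matches the divergent part at each point

renormalized = [bare_term(n) - counterterm(n) for n in range(10)]
print(renormalized)              # constant sequence of the finite physical value
```

Each term of the bare sequence is finite, the sequence as a whole diverges, and the termwise subtraction leaves the finite physical value, exactly as in the $\hat{H}_R = \hat{H} - E$ redefinition above.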