I am familiar with bare couplings diverging in perturbation theory. This question is meant to be more conceptual and is related to this answer which gives a foundational view of renormalization. The answer stresses that renormalization is a result of introducing interactions and has nothing to do with divergences. I also recently asked a related question that can be found here.
My understanding of this view of renormalization is explained in the following example. Consider the free Klein-Gordon theory: $$\mathcal{L} = \frac12(\partial_\mu\phi)^2 - \frac12 c^2\phi^2.$$ We identify the parameter $c$ with the physical mass $m$ of the particle. When we introduce an interaction we get a new Lagrangian: $$\mathcal{L} = \frac12(\partial_\mu\phi)^2 - \frac12 d^2\phi^2 + \lambda\phi^4$$ where the coefficient of the quadratic term has changed because of the interaction, $c \neq d$, and the parameter $d$ is no longer the physical mass. Since we still identify the quadratic term as contributing to the mass, we demand that $m = f(d)$ for some function $f$ (which also depends on $\lambda$) whose form can be found, for example, by evaluating Feynman diagrams. Given the measured value of $m$, this condition determines $d$. Going from the old parameters to the new ones (i.e. going from $c$ to $d$) is how the linked answer defines renormalization. In other words, renormalization here means redefining our Lagrangian parameters because we included the interaction $\phi^4$. The fact that $d$ turns out to be divergent is a consequence of $\phi$ being a distribution, but has nothing to do with renormalization.
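To make the condition $m = f(d)$ a little more concrete, here is a sketch of how I understand it is usually imposed. The self-energy $\Sigma$ is my own notation and not part of the linked answer: I am assuming the convention that $-i\Sigma(p^2)$ is the sum of one-particle-irreducible two-point diagrams, so that the full propagator is $i/(p^2 - d^2 - \Sigma(p^2))$. Requiring the pole of this propagator to sit at the physical mass gives
$$m^2 = d^2 + \Sigma(p^2 = m^2;\, d, \lambda),$$
which defines $f$ implicitly. If this is right, then for $\lambda = 0$ we have $\Sigma = 0$ and recover $d = c = m$, while for $\lambda \neq 0$ the relation has to be solved for $d$ order by order in $\lambda$.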
Assuming this is a correct view of renormalization, what confuses me is the following. Suppose we take $\lambda$ to initially be the physical coupling constant. As we know, $\lambda$ will diverge in perturbation theory and so $\lambda$ itself needs to be renormalized. But conceptually why is this the case if no further interaction was introduced?