EDIT: The main question of this post is: why do we apply the Legendre transformation with the partial derivative $\partial_\mu$ after foliating spacetime, rather than with the covariant derivative $\nabla_\mu$ and no foliation?
I thought some limitation in the definition of the Legendre transformation might make this necessary, but I couldn't find one. Unless I have misunderstood something, there should be no problem using the time component of the covariant derivative, $\nabla_0$, instead of $\partial_0$ after foliation.
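For concreteness, here is a short chain-rule check, assuming a torsion-free (Levi-Civita) connection on a fixed background metric and using a vector field $A_\nu$ as the example. Since $\nabla_\mu A_\nu$ differs from $\partial_\mu A_\nu$ only by a term with no time derivatives of $A$, the momenta defined with $\nabla_0$ and with $\partial_0$ coincide, and the two Hamiltonian densities differ only by a function of the canonical variables:

$$\begin{aligned}
\nabla_\mu A_\nu &= \partial_\mu A_\nu - \Gamma^\lambda{}_{\mu\nu}A_\lambda,
\qquad
\frac{\partial(\nabla_\mu A_\nu)}{\partial(\partial_0 A_\rho)} = \delta^0_\mu\,\delta^\rho_\nu,\\
\pi^\rho \equiv \frac{\partial\mathcal{L}}{\partial(\partial_0 A_\rho)} &= \frac{\partial\mathcal{L}}{\partial(\nabla_\mu A_\nu)}\,\delta^0_\mu\,\delta^\rho_\nu = \frac{\partial\mathcal{L}}{\partial(\nabla_0 A_\rho)},\\
\pi^\rho\,\partial_0 A_\rho - \pi^\rho\,\nabla_0 A_\rho &= \pi^\rho\,\Gamma^\lambda{}_{0\rho}A_\lambda .
\end{aligned}$$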
I also wondered whether the Hamiltonian formalism requires working on (flat) tangent bundles, but I wasn't able to find any source that confirms or rules this out.
However, there is a problem with this line of reasoning: De Donder-Weyl theory, which uses the full covariant derivative rather than only $\nabla_0$ or $\partial_0$:
$$p^{\mu\nu} = \frac{\partial \mathcal{L}}{\partial\left(\nabla_\mu A_\nu\right)},\tag{1}$$
$$p^{\mu} = \frac{\partial \mathcal{L}}{\partial\left(\nabla_\mu \phi\right)}.\tag{2}$$
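As a minimal worked illustration of (2), assuming a minimally coupled scalar field with potential $V(\phi)$, signature $(-,+,+,+)$, and polymomenta taken as densities (conventions for the $\sqrt{-g}$ factor differ between authors), the De Donder-Weyl Legendre transformation goes through covariantly with no foliation. Note that for a scalar $\nabla_\mu\phi = \partial_\mu\phi$, so this case alone cannot distinguish the two derivatives:

$$\begin{aligned}
\mathcal{L} &= \sqrt{-g}\left(-\tfrac12\, g^{\mu\nu}\nabla_\mu\phi\,\nabla_\nu\phi - V(\phi)\right),\\
p^\mu &= \frac{\partial\mathcal{L}}{\partial(\nabla_\mu\phi)} = -\sqrt{-g}\,g^{\mu\nu}\nabla_\nu\phi
\quad\Longleftrightarrow\quad
\nabla_\mu\phi = -\frac{1}{\sqrt{-g}}\,g_{\mu\nu}\,p^\nu,\\
\mathcal{H}_{DW} &= p^\mu\nabla_\mu\phi - \mathcal{L} = -\frac{1}{2\sqrt{-g}}\,g_{\mu\nu}\,p^\mu p^\nu + \sqrt{-g}\,V(\phi).
\end{aligned}$$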
So here we don't foliate anything, right? Avoiding a foliation is the whole idea of De Donder-Weyl theory, so there is no point in introducing one. If I understand it correctly, we keep using the covariant derivative here, together with the polymomenta, right? Then either (a) there is no mathematical restriction on $\nabla_\mu$ itself that prevents the Legendre transformation, or (b) the De Donder-Weyl Legendre transformation cannot be carried out on curved spacetime.
And if (a) is the case, so that there is no problem applying the Legendre transformation with the covariant derivative as the operator, why foliate at all? Is it just because otherwise we have $$\nabla_\sigma g_{\mu\nu} = 0\tag{3}$$ identically, which would give a singular Hamiltonian in quantum gravity?
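For reference, a short check of what (3) does to the metric "velocities", assuming the Levi-Civita connection. After lowering the index on the Christoffel symbols, the connection terms in $\nabla_0 g_{\mu\nu}$ reproduce $\partial_0 g_{\mu\nu}$ exactly, so the time derivatives of the metric sit inside $\Gamma$ and do not survive in $\nabla_0 g_{\mu\nu}$ as an independent variable:

$$\begin{aligned}
\Gamma_{\nu 0\mu} \equiv g_{\nu\lambda}\,\Gamma^\lambda{}_{0\mu} &= \tfrac12\left(\partial_0 g_{\nu\mu} + \partial_\mu g_{\nu 0} - \partial_\nu g_{0\mu}\right),\\
\Gamma_{\nu 0\mu} + \Gamma_{\mu 0\nu} &= \partial_0 g_{\mu\nu},\\
\nabla_0 g_{\mu\nu} = \partial_0 g_{\mu\nu} - \Gamma^\lambda{}_{0\mu}\,g_{\lambda\nu} - \Gamma^\lambda{}_{0\nu}\,g_{\mu\lambda} &= \partial_0 g_{\mu\nu} - \Gamma_{\nu 0\mu} - \Gamma_{\mu 0\nu} = 0.
\end{aligned}$$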