
The Hamiltonian and Lagrangian are related by a Legendre transform: $$ H(\mathbf{q}, \mathbf{p}, t) = \sum_i \dot q_i p_i - \mathcal{L}(\mathbf{q}, \mathbf{\dot q}, t). $$ For this to be a Legendre transform, $H$ must be convex in each $p_i$ and $\mathcal{L}$ must be convex in each $\dot q_i$.
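For a concrete instance of the convex case, here is a minimal SymPy sketch (symbols and the potential $V$ are my own illustrative choices) carrying out the Legendre transform for $\mathcal{L} = \tfrac{1}{2}m\dot q^2 - V(q)$ and confirming convexity in $p$:

```python
import sympy as sp

# Hypothetical illustration: Legendre transform of L = m*v**2/2 - V(q),
# checking that the resulting H is convex in p.
m, q, v, p = sp.symbols('m q v p', positive=True)
V = sp.Function('V')

L = m*v**2/2 - V(q)
p_of_v = sp.diff(L, v)                        # conjugate momentum p = m*v
v_of_p = sp.solve(sp.Eq(p, p_of_v), v)[0]     # invert: v = p/m
H = sp.simplify(p*v_of_p - L.subs(v, v_of_p)) # H = p**2/(2m) + V(q)

assert sp.simplify(H - (p**2/(2*m) + V(q))) == 0
assert sp.simplify(sp.diff(H, p, 2) - 1/m) == 0   # H'' in p is 1/m > 0
```

Here the inversion of $p(\dot q)$ is unique precisely because $\mathcal{L}$ is strictly convex in $\dot q$; the question is what happens when it isn't.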

Of course this is the case for simple examples such as a particle in a potential well, or a relativistic particle moving inertially. However, it isn't obvious to me that it will always be the case for an arbitrary multi-component system using some complicated set of generalised coordinates.

Is this always the case? If so, is there some physical argument from which it can be shown? Or alternatively, are there cases where these convexity constraints don't hold, and if so what happens then?

N. Virgo

3 Answers


I) At the classical level, there is no convexity condition. If an action functional $S$ yields a stationary action principle, so will the negative action $-S$. (Under a sign change, a convex function turns into a concave function and vice versa.) Or one could imagine a theory which is convex in one sector and concave in another sector.

II) On the Lagrangian side $L(q,v,t)$, it is easy to find a counterexample showing that one cannot demand convexity in the position variables $q^i$, or the time variable $t$, for that matter. (For the former, think, e.g., of a Mexican hat potential.) So, as OP writes, the convexity can, at most, concern the velocity variables $v^i$ in the Lagrangian or the momentum variables $p_i$ in the Hamiltonian $H(q,p,t)$.
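A quick numerical illustration of the Mexican-hat point (the specific potential $V(q) = (q^2-1)^2$ is my choice for illustration): its second derivative changes sign, so no Lagrangian built from it can be convex in $q$.

```python
# Sketch: the Mexican-hat potential V(q) = (q**2 - 1)**2 is not convex in q,
# so neither is L = v**2/2 - V(q); convexity can at most concern v.
def V(q):
    return (q**2 - 1)**2

def second_derivative(f, q, h=1e-4):
    # central finite-difference estimate of f''(q)
    return (f(q + h) - 2*f(q) + f(q - h)) / h**2

assert second_derivative(V, 0.0) < 0   # concave at the central hump
assert second_derivative(V, 1.0) > 0   # convex near the well minima
```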

III) In the Hamiltonian formulation, it is possible to perform the canonical transformation

$$(q^i,p^j)~\longrightarrow~(Q^i,P^j)~=~(-p^i,q^j),$$

which mixes position and momentum variables. From a Hamiltonian perspective, it is unnatural to impose convexity on half the canonical variables but not the other half.
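That this swap is a legitimate canonical transformation can be checked directly: it preserves the canonical Poisson bracket (a minimal SymPy sketch for one degree of freedom):

```python
import sympy as sp

# Sketch: the swap (q, p) -> (Q, P) = (-p, q) preserves the canonical
# Poisson bracket {Q, P} = 1, so it is a canonical transformation.
q, p = sp.symbols('q p')

def poisson(f, g):
    # Poisson bracket {f, g} for a single (q, p) pair
    return sp.diff(f, q)*sp.diff(g, p) - sp.diff(f, p)*sp.diff(g, q)

Q, P = -p, q
assert poisson(Q, P) == 1   # canonical bracket preserved
```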

IV) The Lagrangian (density) may be modified with total divergence terms that don't change the Euler-Lagrange equations. These total divergence terms could, in principle, violate convexity.

V) The Legendre transformation could be singular. In fact, this is the starting point of constraint dynamics. This happens e.g. for the Maxwell Lagrangian density $${\cal L}~=~-\frac{1}{4}F_{\mu\nu}F^{\mu\nu}.$$ See e.g. this Phys.SE post.
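A toy stand-in for such a singular Legendre transformation (not the Maxwell case itself, just a two-velocity Lagrangian chosen for illustration): the velocity Hessian is singular, so the momenta are not independent and a primary constraint appears.

```python
import sympy as sp

# Toy sketch of a singular Legendre transform:
# L = (v1 - v2)**2 / 2 has a singular Hessian in the velocities,
# so v cannot be solved for uniquely in terms of the momenta.
v1, v2 = sp.symbols('v1 v2')
L = (v1 - v2)**2 / 2
vs = [v1, v2]
hess = sp.Matrix(2, 2, lambda i, j: sp.diff(L, vs[i], vs[j]))

assert hess.det() == 0                        # Legendre map is singular
assert sp.diff(L, v1) + sp.diff(L, v2) == 0   # primary constraint: p1 + p2 = 0
```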

VI) Quantum mechanically, we must demand that the Hamiltonian operator is self-adjoint and bounded from below, i.e. the theory should be unitary.

Perturbatively, this means that the free/quadratic kinetic term should be a positive (semi)definite form (and therefore a convex function). Zero-modes should be gauge-fixed. Interaction terms are usually treated perturbatively.

In conclusion, convexity does not seem to be a first principle per se but, rather, a consequence of the type of QFTs that we typically are able to make sense of. It might be that it is possible to give a non-perturbative definition of a non-convex (but unitary) theory.

Qmechanic

There are indeed nonconvex Lagrangians, and they are a problem for the Legendre transformation because they make it multiple-valued (indeed, even a convex but not strictly convex Lagrangian throws up this problem). If one cannot get rid of this multiple-valuedness by splitting the problem into convex and concave "sectors" (a piecewise analysis where the domain of interest is restricted, as in part I of Qmechanic's answer) or by imposing constraints, then it's game over for the Hamiltonian approach. Indeed, the multiple-valuedness of the Legendre transformation implies nonuniqueness of solutions of the Euler-Lagrange equation.

I'd like to talk about a famous convex, but not strictly convex, example from my own field and what people do about the problems it raises. It provides an interesting illustration of how difficulties with convexity arise (namely, as discussed above, they are tantamount to multiple-valuedness of the Legendre transformation), and there are at least two common solutions to this particular problem, whose appropriateness differs between fields of physics. Your solution depends on what you want to achieve with your Hamiltonian.

This is the calculation of geodesics in a (semi) Riemannian manifold so that:

$$\mathcal{L} = \sqrt{g(X)(\dot{X},\,\dot{X})} = \sqrt{g_{ij}(X)\,\dot{x}^i\,\dot{x}^j}\tag{1}$$

This is also the same problem as the calculation of rays from Fermat's least time principle if you include optical density (refractive index) information in the metric tensor. Thus, ray optics in an isotropic medium is the geometry of a conformally flat manifold (since $g$'s matrix in Cartesian coordinates is the squared refractive index times the identity); anisotropic mediums give more general geometry.

This example is historically important, not only for General Relativity but also because ray optics - this very problem - was the field that piqued Hamilton's interest in these matters.


The Bad and The Ugly

The Lagrangian is convex but not strictly so. Consider the linear path:

$$\sigma(t) = t\,\dot{X}_0\tag{2}$$

in the tangent space, i.e. where one moves by scaling the tangent vector $\dot{X}_0$. In response, the Lagrangian in $(1)$ also scales linearly, so the linear path lies exactly within the graph / on the edge of the epigraph of the Lagrangian. Thus, the conjugate momentum $\partial_{\dot{X}}\,\mathcal{L}$, being the one-form:

$$P(\text{_}) = \frac{g(\dot{X},\,\text{_})}{\sqrt{g(\dot{X},\,\dot{X})}}\tag{3}$$

is independent of $t$ as our point moves according to $(2)$. Thus, $P$ is a highly many-to-one function on the tangent space at any point: every tangent vector of the form $t\,\dot{X}_0$ for $t>0$ has the same value in $(3)$. The Legendre transformation fails to pick out a unique $\dot{X}$ for each $P$. Not surprisingly, therefore, if one does the Legendre transformation, one gets:

$$\mathcal{H} = P(\dot{X}) - \mathcal{L}=\frac{g(\dot{X},\,\dot{X})}{\sqrt{g(\dot{X},\,\dot{X})}}-\mathcal{L}=0\quad\quad \text{OMG!!}\tag{4}$$

The Legendre transformation, in this case, is clearly many-to-one (try saying that aloud with a straight face).
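This behaviour is easy to check numerically. A small sketch of Eqs. $(3)$-$(4)$, using a randomly generated positive-definite metric (an assumption made purely for illustration):

```python
import numpy as np

# Numeric sketch of Eqs. (3)-(4): for L = sqrt(g(Xdot, Xdot)), the momentum
# one-form P is unchanged when Xdot is rescaled, and H = P(Xdot) - L = 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
g = A @ A.T + 3*np.eye(3)            # an assumed positive-definite metric
xdot = np.array([1.0, -2.0, 0.5])

def momentum(v):
    return g @ v / np.sqrt(v @ g @ v)

P = momentum(xdot)
assert np.allclose(P, momentum(7.3 * xdot))   # many-to-one along the ray
H = P @ xdot - np.sqrt(xdot @ g @ xdot)       # Legendre transform
assert abs(H) < 1e-12                         # the Hamiltonian vanishes
```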

Let's look at this another way. Even the solution of the Euler-Lagrange equation for this Lagrangian - quite aside from wishing for a Hamiltonian counterpart - is fraught in this problem (though it can be done with some care). The Hessian matrix of $\mathcal{L}$ in the velocities, i.e. the Jacobian of the mapping $\dot{X} \mapsto P = \partial_{\dot{X}} \mathcal{L}$, is:

$$h_{ij} = \partial_{\dot{x}^i} \partial_{\dot{x}^j} \mathcal{L} = \left(g(\dot{X},\,\dot{X})\,g_{ij} - g_{i k}\,\dot{x}^k\,g_{j \ell}\,\dot{x}^\ell\right)\,\mathcal{L}^{-3}.\tag{5}$$

For our purposes, $(5)$ is more transparent if we put it in matrix notation (here $G$ is the matrix of the metric tensor):

$$H = \frac{G}{(\dot{X}^T\,G\,\dot{X})^\frac{1}{2}}\,\left(\mathrm{id} - \frac{\dot{X}\,\dot{X}^T\,G}{\dot{X}^T\,G\,\dot{X}}\right).\tag{6}$$

The rightmost term in the brackets, $\dot{X}\,\dot{X}^T\,G/(\dot{X}^T\,G\,\dot{X})$, is recognized to be the projector onto the direction of $\dot{X}$, so the Hessian matrix is singular on every tangent space to the configuration space, with kernel given by the line $\{t\,\dot{X}:\,t\in\mathbb{R}\}$. A scale multiple of the Hessian matrix is the coefficient of $\ddot{X}$ in the Euler-Lagrange equation, showing that the Euler-Lagrange equation has a whole family of solutions.

Lastly, we can look at the action integral itself and what happens to it when we rescale the path parameter $\tau$. Suppose the action is computed over the interval $\tau\in[0,\,1]$, and we introduce a transformation $\tau=\zeta({\sigma})$, where $\zeta$ is any smooth monotonic function with $\zeta(0)=0;\,\zeta(1) = 1$, and write $Y({\sigma}) = X(\zeta({\sigma}))$, $\dot{Y}({\sigma}) = \mathrm{d}_\sigma X(\zeta({\sigma}))$; then:

$$\mathcal{S} = \int_{\tau=0}^1\,\sqrt{g(\dot{X}(\tau),\,\dot{X}(\tau))}\,\mathrm{d}\tau = \int_{\sigma=0}^1\,\sqrt{g\left(\frac{\dot{Y}(\sigma)}{\frac{\mathrm{d}\zeta}{\mathrm{d}\sigma}},\,\frac{\dot{Y}(\sigma)}{\frac{\mathrm{d}\zeta}{\mathrm{d}\sigma}}\right)}\,\frac{\mathrm{d}\zeta}{\mathrm{d}\sigma}\,\mathrm{d}\sigma \\= \int_{\tau=0}^1\,\sqrt{g(\dot{Y}(\tau),\,\dot{Y}(\tau))}\,\mathrm{d}\tau.\tag{7}$$

Thus if $X(\tau)$ is an extremal path, then so is $X(\zeta(\tau))$ for any smooth, monotonic function with $\zeta(0)=0;\,\zeta(1)=1$. Intuitively, if we drive from A to B along the shortest (or longest) road, we can drive it with any speed versus time graph we choose, but we've still driven the extremal road.

The geodesic flow in the tangent space has multiple flowlines, indeed a whole sheet of flowlines, between any two points in configuration space; the preimages of the projection onto geodesics through configuration space at any given point are rays of tangent vectors, with the preimage over each point in the geodesic containing tangent vectors that are all scale multiples of one another.
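The singularity of the Hessian in Eqs. $(5)$-$(6)$ can likewise be verified numerically (again with an assumed random positive-definite metric, chosen only for illustration):

```python
import numpy as np

# Numeric sketch of Eqs. (5)-(6): the velocity Hessian of L = sqrt(g(v, v))
# annihilates v itself, so its kernel is the ray {t*v}.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
G = A @ A.T + 3*np.eye(3)            # assumed positive-definite metric
v = np.array([0.3, 1.0, -0.7])
speed2 = v @ G @ v
H = (G / np.sqrt(speed2)) @ (np.eye(3) - np.outer(v, G @ v) / speed2)

assert np.allclose(H @ v, 0.0)           # the Hessian is singular along v
assert np.linalg.matrix_rank(H) == 2     # kernel is exactly one-dimensional
```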


The Good

Let's look at the solution to this problem in semi-Riemannian geometry. Here, we cheat a bit and extremize the action integral:

$$\mathcal{S} = \int_0^1 g(\dot{X},\,\dot{X})\,\mathrm{d}\tau\tag{9}$$

i.e. we simply forget about the square root! (I suspect this crazy idea was originally tried through sheer desperation.) Now we look to the Cauchy-Schwarz inequality for a relationship between the "real" and "cheat" actions:

$$\int_0^1\,\sqrt{g(\dot{X},\,\dot{X})}\,\cdot 1\,\mathrm{d}\tau \leq \sqrt{\int_0^1\,g(\dot{X},\,\dot{X})\,\mathrm{d}\tau}\sqrt{\int_0^1\,1\,\mathrm{d}\tau} = \sqrt{\int_0^1\,g(\dot{X},\,\dot{X})\,\mathrm{d}\tau}\tag{10}$$

with equality if and only if $\sqrt{g(\dot{X},\,\dot{X})}$ is constant. We have already seen that if $X(\tau)$ minimizes the leftmost integral in $(10)$, then so does $X(\zeta(\tau))$ for any reparameterization with $\zeta(0)=0;\,\zeta(1)=1$. So, for the minimizing $X$, we find the function $\zeta(\tau)$ that makes $\sqrt{g(\dot{Y},\,\dot{Y})}$ constant and equal to the average speed of $X$. Cauchy-Schwarz saturates in this case, so the minimum of $\int_0^1\,\sqrt{g(\dot{X},\,\dot{X})}\,\mathrm{d}\tau$ is precisely the same as the minimum of $\int_0^1\,g(\dot{X},\,\dot{X})\,\mathrm{d}\tau$, given that the integrand is positive.

On the other hand, if we are seeking to maximize the original action, which is the case for geodesics in a Lorentzian manifold, then we simply go ahead and maximize $(9)$. By "coincidence", we find that the maximization happens when $g(\dot{X},\,\dot{X})$ is constant, so that the upper bound in $(10)$ is saturated; thus we have also found one of the solutions that maximize $\int_0^1\,\sqrt{g(\dot{X},\,\dot{X})}\,\mathrm{d}\tau$.

Having found the one solution that extremizes the left-hand side of $(10)$, we can characterize all the others through a transformation $\tau=\zeta(\sigma)$ with $\zeta(0)=0;\,\zeta(1)=1$. Or, in General Relativity, we ignore all the other solutions because we postulate that the physical one is the one where an observer's proper time $\tau$ advances uniformly; the four-velocity is thus constant, acceleration is Minkowski-orthogonal to velocity, and $\tau$ is thus affine. So we actually get more than the shape of the geodesic path with this approach; we also get an affine path parameterization.
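The saturation of $(10)$ at constant speed is easy to see numerically. A sketch, with a made-up varying speed profile along a path on $[0,\,1]$:

```python
import numpy as np

# Numeric sketch of Eq. (10): the "real" action is bounded by the square
# root of the "cheat" action, with equality exactly at constant speed.
tau = np.linspace(0.0, 1.0, 20001)
speed2_var = (1.0 + 0.5*np.sin(2*np.pi*tau))**2   # varying speed squared
speed2_const = np.ones_like(tau)                  # constant-speed version

def real_action(speed2):    # integral of sqrt(g(Xdot, Xdot)) over [0, 1]
    return np.mean(np.sqrt(speed2))

def cheat_action(speed2):   # integral of g(Xdot, Xdot) over [0, 1]
    return np.mean(speed2)

assert real_action(speed2_var) < np.sqrt(cheat_action(speed2_var))
assert np.isclose(real_action(speed2_const),
                  np.sqrt(cheat_action(speed2_const)))
```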

So we now easily get our Hamiltonian formulation: if we put $\mathcal{L}=\frac{1}{2}\,g(\dot{X},\,\dot{X})$, then $P=\dot{X}^\flat$, i.e. $p_k = g_{kj}\,\dot{x}^j$, is simply the covector found by lowering the index of $\dot{X}$, and so we have:

$$\mathcal{L} = \mathcal{H} = \frac{1}{2} g(\dot{X},\,\dot{X}) = \frac{1}{2}\,g_{ij}\dot{x}^i\,\dot{x}^j = \frac{1}{2} g^{\sharp\kern+1.4pt\sharp}(P,\,P)=\frac{1}{2}\,g^{ij}\,p_i\,p_j\tag{11}$$

We now see another reason why this crazy solution is beloved by physicists: the Lagrangian and Hamiltonian in $(11)$ are those of the corresponding formulations of Newtonian mechanics for a free particle. This is, therefore, a most pleasing, natural analogy when we are thinking about a particle that is "coasting" in an inertial frame. The Euler-Lagrange equation for $(11)$ is readily shown to be $\ddot{x}^k + \Gamma^k_{ij}\,\dot{x}^i\,\dot{x}^j=0$; the analogy with a free particle makes it very satisfying to witness that Newton's second law is $-g^{kj}\, \partial_{x^j} V = F^k = m\left(\ddot{x}^k + \Gamma^k_{ij}\,\dot{x}^i\,\dot{x}^j\right)$ when one puts a potential $V(x)$ into the mix. It's a thoroughly beautiful physical analogy. Hamilton's equations for the geodesic are:

$$\dot{x}^k = g^{kj}\,p_j;\quad \dot{p}_k = -\frac{1}{2}\,\left(\partial_{x^k}\,g^{ij}\right)\,p_i\,p_j\tag{12}$$
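As a quick numerical sanity check of $(12)$, here is a sketch using a hypothetical test metric (the Euclidean plane in polar coordinates, $g = \mathrm{diag}(1, r^2)$, my choice for illustration), integrating Hamilton's equations and confirming that $\mathcal{H}$ (half the squared speed) is conserved:

```python
import numpy as np

# Numeric sketch of Eq. (12) for the assumed metric g = diag(1, r**2):
# xdot^k = g^{kj} p_j ;  pdot_k = -(1/2) (d_k g^{ij}) p_i p_j
def flow(s):
    r, th, pr, pth = s
    return np.array([pr, pth / r**2, pth**2 / r**3, 0.0])

def rk4_step(s, h):
    k1 = flow(s); k2 = flow(s + h/2*k1)
    k3 = flow(s + h/2*k2); k4 = flow(s + h*k3)
    return s + h/6*(k1 + 2*k2 + 2*k3 + k4)

def hamiltonian(s):
    r, th, pr, pth = s
    return 0.5*(pr**2 + pth**2 / r**2)

state = np.array([1.0, 0.0, 0.1, 1.0])   # r, theta, p_r, p_theta
H0 = hamiltonian(state)
for _ in range(2000):
    state = rk4_step(state, 1e-3)
assert abs(hamiltonian(state) - H0) < 1e-9   # H conserved along the flow
```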

Much of the time, this solution is perfectly acceptable in optics, too. Naturally, it will handle all the calculations of rays in smoothly inhomogeneous mediums. In optics, the affine parameter corresponding to proper time in GR is the optical path length or the total phase delay along the path.

What seems like a cheat at first leads to a solution that is very elegant, smooth and easy and, for General Relativity and indeed most geometry, thoroughly complete.

However, this elegant solution has an awkward property in optics when we hit abrupt interfaces between dielectric mediums, an essential situation to analyze when we're talking about lenses and mirrors, for example. The Hamiltonian approach necessitates a Lagrangian that is at least a $C^2$ function of $\dot{X}$, an assumption that breaks down at such an abrupt interface. OK, so we use the Hamiltonian approach away from the interface, and we work out what transformation the interface imparts to the ray state $(X,\,P)$. But it turns out that if we do this, then Snell's law shows us that:

> The transverse components of the optical momentums are continuous across the interface, whereas the normal component necessarily is not.

That is, the transformation of the optical state $(X,\,P)$ wrought by the ray's passage across the abrupt interface is not a symplectomorphism. The same is true of mirrors with this approach: $X$ is continuous across the interface, whereas $P\mapsto -P$, so the determinant of this linear transformation is $-1$ in three dimensions. The easiest way to understand all this is to note that the Hamiltonian in $(11)$ equals half the squared speed of the point, measured in $(\text{optical phase per unit time})^2$; we can arbitrarily set it to $1/2$ - we can choose any constant as long as we are consistent (scaled and shifted affine parameters are still affine). If we adopt this convention, and if we use coordinates which are locally Cartesian at the interface, with the $x-y$ plane parallel to and the $z$ direction normal to the interface, then the optical momentums can be shown to be $p_k = n\,\gamma_k$, where $n$ is the refractive index at the point where the ray meets the interface and $\gamma_k$ are the direction cosines that the ray's direction makes with the orthonormal axes. From here, one can readily prove the above assertion about Snell's law.

This situation brings us to the commoner method of handling singular Legendre transformations - the use of constraints to get rid of "The Bad and The Ugly" redundancy that we discussed above. The use of $\mathcal{L} =\frac{1}{2} g(\dot{X},\,\dot{X})$ can be thought of as belonging to this idea if we think of it as finding the geodesic together with the constraint that our path parameter should be affine, so that $\mathcal{L}=\mathrm{const}$.

In optics, when there are lenses and mirrors involved, the common solution is to constrain the speed along the path so that one of the coordinates, say $x^3$, is itself the path parameter, i.e. $\dot{x}^3=1$. The most obvious illustration of this idea is where the optical system has an optical axis; we measure the $z$-coordinate along this axis, and the $z$-coordinate is then the path parameter. This culls $x^3$ and $p_3$ from the Lagrangian and Hamiltonian, and the phase space is now four- rather than six-dimensional.

More generally, we use generalized coordinates such that $\partial_1$ and $\partial_2$ are orthogonal to $\partial_3$ and surfaces of constant $x^3$ are aligned with the dielectric interfaces. Clearly, we can do this: we can use "voltage" coordinates (my name, not used in the literature), where the lens surfaces are equipotential surfaces, i.e. surfaces of constant $x^3$, in an electrostatics problem, and the directions of increasing $x^1$ and $x^2$ lie in the equipotential surfaces. The third coordinate is then the voltage at any point. If we do this, then the transverse components of the optical momentum are still continuous across each interface. Since the $x^1$ and $x^2$ coordinates are also continuous, the dielectric interface now imparts a symplectomorphism - indeed the identity operator - on the optical state in this four-dimensional approach.

In Cartesian coordinates, with $z$ as the path parameter, this approach looks like:

$$\begin{array}{ll}\mathcal{L} = n\,\sqrt{1+\dot{x}^2+\dot{y}^2} & \mathcal{H}=-\sqrt{n^2-p_x^2-p_y^2}\\ p_x = n\,\frac{\dot{x}}{\sqrt{1+\dot{x}^2+\dot{y}^2}} & p_y = n\,\frac{\dot{y}}{\sqrt{1+\dot{x}^2+\dot{y}^2}}\end{array}\tag{13}$$

and the graphs of $\mathcal{L}$ and $\mathcal{H}$ are perfectly well-behaved, convex / concave hyperboloids. The path parameter is not affine, though, so you can't easily use this approach to calculate where the phase fronts are.
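As a check on $(13)$ (with assumed sample values for the refractive index and ray slopes, chosen only for illustration), the reduced Lagrangian and Hamiltonian are indeed Legendre duals:

```python
import numpy as np

# Numeric sketch of Eq. (13): with z as the path parameter, the reduced
# optical Lagrangian and Hamiltonian are Legendre duals of each other.
n, xd, yd = 1.5, 0.3, -0.4            # assumed index and ray slopes
s = np.sqrt(1 + xd**2 + yd**2)
L = n * s
px, py = n*xd/s, n*yd/s               # conjugate momenta dL/dxdot, dL/dydot
H = px*xd + py*yd - L                 # Legendre transform

assert np.isclose(H, -np.sqrt(n**2 - px**2 - py**2))
```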

Sometimes, in optics, we use both approaches. If you want to calculate where the wavefronts of a field diverging from a source are, then you clearly need the affine path parameter to know where each surface of constant phase crosses the geodesics, so you use the six-dimensional, $\mathcal{L} = g(\dot{X},\,\dot{X})$ approach. To do ray transfer analysis, or if you need to avail yourself of the powerful optical invariant or étendue notions (both invariant differential forms on symplectic optical phase space), then you need all the interfaces in the system to impart symplectomorphisms on the optical state, and you will use the four-dimensional approach.

Selene Routley

If you want a classic, simple, physically significant and famous example of a toy model in fluid mechanics, you can consider heavy particles in a cell flow:

\begin{eqnarray} \ddot x&=& -\frac{\dot x- U \sin(y)}{\tau}\nonumber\\ \ddot y&=& -\frac{\dot y- U \sin(x)}{\tau}\nonumber\\ \end{eqnarray}

where $\tau$ is the Stokes time of the drag force on the particles. The sines represent the carrier flow, which is made of cells. In some situations, in order to study the asymptotic behaviour, it turns out to be of interest to neglect the drag term proportional to the velocity, and you arrive at:

\begin{eqnarray} \ddot x&=& \frac{ U \sin(y)}{\tau}\nonumber\\ \ddot y&=& \frac{U \sin(x)}{\tau}\nonumber\\ \end{eqnarray}

You can write this system in phase space:

\begin{eqnarray} \dot x&=& u\nonumber\\ \dot u&=& \frac{U \sin(y)}{\tau}\nonumber\\ \dot y&=& v\nonumber\\ \dot v&=& \frac{ U \sin(x)}{\tau}\nonumber\\ \end{eqnarray}

This system comes from the Hamiltonian $$ H(x,y,u,v)=uv+U\,\frac{\cos x+ \cos y}{\tau} $$

which is neither convex nor bounded. The Lagrangian immediately follows.
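A quick symbolic check of this (a SymPy sketch; reading $(x,\,v)$ and $(y,\,u)$ as the conjugate pairs, which is my interpretation of the system above):

```python
import sympy as sp

# Sketch: H = u*v + U*(cos(x) + cos(y))/tau generates the phase-space
# system above, with (x, v) and (y, u) taken as conjugate pairs.
x, y, u, v, U, tau = sp.symbols('x y u v U tau')
H = u*v + U*(sp.cos(x) + sp.cos(y))/tau

assert sp.simplify(sp.diff(H, v) - u) == 0                     # xdot = dH/dv
assert sp.simplify(-sp.diff(H, x) - U*sp.sin(x)/tau) == 0      # vdot = -dH/dx
assert sp.simplify(sp.diff(H, u) - v) == 0                     # ydot = dH/du
assert sp.simplify(-sp.diff(H, y) - U*sp.sin(y)/tau) == 0      # udot = -dH/dy
```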

What happens then? For the foregoing systems, you have numerical evidence of chaos, asymptotically stable and unstable points, and coherent superdiffusion... The dynamical explanation of the latter is still an open problem for cell flows.

Boy S