
Let us assume $ p_0 $ is a probability distribution over $ \mathbb{R}^d $. Let $ x_t $ be a diffusion process defined as: \begin{equation} x_t = x + \sigma_t z, \end{equation} where $ x \sim p_0 $, $ z \sim \mathcal{N}(0,I) $, and $ \sigma_t \in \mathbb{R}_+ $ is strictly increasing, i.e., $ \sigma_t < \sigma_{t+\varepsilon} $ for any $ \varepsilon > 0 $. Thus, the marginal distribution is $ p_t = p_0 * \mathcal{N}(0, \sigma_t^2 I) $, where $ * $ denotes the convolution operator.
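For concreteness, here is a minimal simulation sketch of this noising process; the two-component Gaussian mixture for $ p_0 $ and the schedule $ \sigma_t = t $ are illustrative choices of mine, not part of the setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 10_000

# Toy p_0: a two-component Gaussian mixture (purely for illustration).
means = np.where(rng.random((n, 1)) < 0.5, -3.0, 3.0)
x = rng.normal(loc=means, scale=0.5, size=(n, d))

# Any strictly increasing schedule works; take sigma_t = t here.
t = 1.5
z = rng.normal(size=(n, d))
x_t = x + t * z   # samples from p_t = p_0 * N(0, sigma_t^2 I)
```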

Now, assume that we have access to an ODE solver for the diffusion process: \begin{equation} \frac{dx}{dt} = v_t(x), \end{equation}

with an associated flow function that satisfies: \begin{equation} f_t(x_t) = x_t - \int_0^t v_s(x_s) \, ds. \end{equation}
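As an aside, here is a minimal sketch of how such a flow map could be evaluated numerically, assuming access to a callable `v(s, x)` (a hypothetical stand-in for the vector field); it integrates the ODE backward from time $ t $ to time $ 0 $ with explicit Euler steps:

```python
import numpy as np

def flow_f(x_t, t, v, n_steps=1000):
    """Approximate f_t(x_t) = x_t - int_0^t v_s(x_s) ds by integrating
    dx/ds = v(s, x) backward from time t to time 0 with Euler steps."""
    x = np.asarray(x_t, dtype=float).copy()
    ds = t / n_steps
    for k in range(n_steps, 0, -1):
        x -= ds * v(k * ds, x)   # one backward Euler step along the trajectory
    return x
```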

Next, define the following random variable:

$$\tilde{x}_t = f_t(x + \sigma_t z) + \sigma_t z.$$

What can be said about: (i) the distribution of $ \tilde{x}_t $? (ii) the coupling between $ f_t(x + \sigma_t z) $ and $ z $?

My intuition is that the flow from the ODE solver, $ f_t(x + \sigma_t z) $, enables better alignment (and reduced transport cost) between the pair $ (f_t(x + \sigma_t z), z) $ compared to $ (x, z) $, but I have not been able to formalize this.

tzb

1 Answer


Assume that $\sigma_t$ is sufficiently large compared to the "typical size" of the samples of $p_0$. Then we can make several heuristic observations:

  • $\mathbf{x} + \sigma_t \mathbf{z} \approx \sigma_t \mathbf{z}$ holds, since $\mathbf{x}$ is negligible compared to $\sigma_t \mathbf{z}$ with high probability.

  • On the other hand, the flow $f_t$ has the property that if $\mathbf{x}_t \sim p_t$, then $f_t(\mathbf{x}_t) \sim p_0$. Since $\mathbf{x} + \sigma_t \mathbf{z} \sim p_t$, we have $f_t(\mathbf{x} + \sigma_t \mathbf{z}) \sim p_0$.

From these two observations, we can expect that

  1. $f_t(\mathbf{x} + \sigma_t \mathbf{z})$ is a sample of $p_0$ which is almost independent of $\mathbf{x}$ and is an almost deterministic function of $\mathbf{z}$. Consequently, $$ p(f_t(\mathbf{x} + \sigma_t \mathbf{z}), \mathbf{z}) \approx p(f_t(\sigma_t \mathbf{z}), \mathbf{z}). $$

  2. By the assumption, $f_t(\mathbf{x} + \sigma_t \mathbf{z})$ is also negligible compared to $\sigma_t \mathbf{z}$. Consequently, $\tilde{\mathbf{x}}_t \approx \sigma_t \mathbf{z}$. (Both heuristics are illustrated numerically below.)
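The following sketch illustrates both points in the analytically solvable case $p_0 = \mathcal{N}(0,1)$, borrowing the flow $f_t(\mathbf{x}) = \mathbf{x}/\sqrt{1+\sigma_t^2}$ derived in the Special Case below; the code and the grid of $\sigma$ values are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
z = rng.normal(size=n)

for sigma in [0.1, 1.0, 10.0, 100.0]:
    f_val = (x + sigma * z) / np.sqrt(1 + sigma ** 2)   # f_t(x + sigma_t z)
    x_tilde = f_val + sigma * z
    corr = np.corrcoef(f_val, z)[0, 1]                  # how deterministic in z?
    rel = np.mean((x_tilde - sigma * z) ** 2) / sigma ** 2
    print(f"sigma={sigma:>6}: std(f)={f_val.std():.3f}, "
          f"corr(f, z)={corr:.3f}, E|x~ - sigma z|^2 / sigma^2 = {rel:.4f}")
```

As $\sigma_t$ grows, $f_t(\mathbf{x} + \sigma_t \mathbf{z})$ keeps unit variance (a sample of $p_0$) while becoming almost perfectly correlated with $\mathbf{z}$, and $\tilde{\mathbf{x}}_t$ collapses onto $\sigma_t \mathbf{z}$.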


As for the transportation cost, write $\mathbf{x}_t = \mathbf{x} + \sigma_t \mathbf{z}$ for simplicity and introduce the function

$$ I(t) = \mathbb{E}\bigl[ \| f_t(\mathbf{x}_t) - \mathbf{z}\|^2 \bigr], $$

which measures the transportation cost with respect to the squared-distance cost function. Note that

$$ I(0) = \mathbb{E}\bigl[ \| \mathbf{x} - \mathbf{z}\|^2 \bigr], $$

hence OP's question can be rephrased as asking whether $ I(0) > I(t)$.
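As a quick Monte Carlo sanity check of this baseline (a sketch of mine, with a toy bimodal choice of $p_0$): since $\mathbf{x}$ and $\mathbf{z}$ are independent and $\mathbb{E}[\mathbf{z}] = 0$, we should find $I(0) = \mathbb{E}[\|\mathbf{x}\|^2] + d$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 2, 200_000

# Toy bimodal p_0: a standard normal shifted by +/- 3 in every coordinate.
x = rng.normal(size=(n, d)) + rng.choice([-3.0, 3.0], size=(n, 1))
z = rng.normal(size=(n, d))

I0 = np.mean(np.sum((x - z) ** 2, axis=1))
print(I0)                                   # Monte Carlo estimate of I(0)
print(np.mean(np.sum(x ** 2, axis=1)) + d)  # E||x||^2 + d, should match
```

For comparing $I(0)$ and $I(t)$, the following observation comes in handy: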

Lemma. In addition to OP's setting, assume that $v_t(\mathbf{x}) = \mathbb{E}[\dot{\mathbf{x}}_t \mid \mathbf{x}_t = \mathbf{x}]$, which is the usual choice in the flow matching literature. Then $v_t$ generates the probability path $(p_t)$, and \begin{align*} v_t(\mathbf{x}) &= \dot{\sigma}_t \mathbb{E}[\mathbf{z} \mid \mathbf{x}_t = \mathbf{x}] = -\dot{\sigma}_t \sigma_t \nabla_\mathbf{x} \log p_t (\mathbf{x}). \end{align*}

Using this, we can show that:

Claim. $I'(t) = -2\dot{\sigma}_t d + o(1)$ as $t \to 0^+$. In particular, if $\sigma_t$ is continuously differentiable with $\dot{\sigma}_0 > 0$, then $I(t)$ is decreasing for small $t$.

I am not sure if this trend will continue as $t$ grows, but my hunch is that the cost will remain smaller. This is indeed true in the following special case:

Special Case. Assume $d = 1$ and $x \sim \mathcal{N}(0, 1)$. Then we can check that

$$ v_t(\mathbf{x}) = \frac{\dot{\sigma}_t \sigma_t}{1+\sigma_t^2} \mathbf{x} \qquad\text{and}\qquad f_t(\mathbf{x}) = \frac{1}{\sqrt{1+\sigma_t^2}} \mathbf{x}. $$

Using this, we can find an analytic formula for $I(t)$:

$$ I(t) = 2 \left(1 - \frac{\sigma_t}{\sqrt{1+\smash[b]{\sigma_t^2}}} \right) $$

This is clearly decreasing in $t$, since $\sigma_t$ is increasing. Also, it is easy to check that $I'(0) = -2\dot{\sigma}_0$, confirming the claim with $d = 1$.
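A short Monte Carlo verification of this formula (my sketch, assuming the concrete schedule $\sigma_t = t$, so that $\dot{\sigma}_0 = 1$):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.normal(size=n)   # p_0 = N(0, 1), d = 1
z = rng.normal(size=n)

def I_mc(t):
    """Monte Carlo estimate of I(t) for the schedule sigma_t = t."""
    f_val = (x + t * z) / np.sqrt(1 + t ** 2)   # analytic flow f_t(x_t)
    return np.mean((f_val - z) ** 2)

def I_exact(t):
    return 2 * (1 - t / np.sqrt(1 + t ** 2))

for t in [0.0, 0.5, 1.0, 2.0]:
    print(f"t={t}: MC={I_mc(t):.4f}  exact={I_exact(t):.4f}")

# Finite-difference check of I'(0) = -2 * sigma'(0) = -2:
h = 1e-4
print("I'(0) ~", (I_exact(h) - I_exact(0)) / h)
```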


Proof of Lemma. For any test function $\varphi \in C^{\infty}_c(\mathbb{R}^d)$, we evaluate the time-derivative of $\mathbb{E}[\varphi(\mathbf{x}_t)]$ in two ways. On one hand,

\begin{align*} \frac{\partial}{\partial t} \mathbb{E}[\varphi(\mathbf{x}_t)] &= \frac{\partial}{\partial t} \int_{\mathbb{R}^d} \varphi(\mathbf{x}) p_t(\mathbf{x}) \, \mathrm{d}\mathbf{x} = \int_{\mathbb{R}^d} \varphi(\mathbf{x}) \frac{\partial}{\partial t} p_t(\mathbf{x}) \, \mathrm{d}\mathbf{x}. \end{align*}

On the other hand, differentiating the random variable $\varphi(\mathbf{x}_t)$ directly and using the definition of $v_t$,

\begin{align*} \frac{\partial}{\partial t} \mathbb{E}[\varphi(\mathbf{x}_t)] &= \mathbb{E}[\nabla \varphi(\mathbf{x}_t) \cdot \dot{\mathbf{x}}_t] \\ &= \mathbb{E}[\nabla \varphi(\mathbf{x}_t) \cdot \mathbb{E}[\dot{\mathbf{x}}_t \mid \mathbf{x}_t]] \\ &= \mathbb{E}[\nabla \varphi(\mathbf{x}_t) \cdot v_t(\mathbf{x}_t)] \\ &= \int_{\mathbb{R}^d} \nabla \varphi(\mathbf{x}) \cdot v_t (\mathbf{x}) p_t(\mathbf{x}) \, \mathrm{d}\mathbf{x} \\ &= - \int_{\mathbb{R}^d} \varphi(\mathbf{x}) \nabla \cdot (v_t(\mathbf{x}) p_t(\mathbf{x})) \, \mathrm{d}\mathbf{x} \end{align*}

where we used the law of iterated expectation in the second line and integration by parts in the last line. Now, since $\varphi$ is arbitrary, this shows that $v_t$ satisfies the continuity equation:

$$ \frac{\partial}{\partial t} p_t + \nabla \cdot (v_t p_t) = 0 $$

It is well-known that this implies that $v_t$ generates the probability path $p_t$. Next, invoking OP's setting,

\begin{align*} v_t(\mathbf{x}) &= \mathbb{E}[ \dot{\mathbf{x}}_t \mid \mathbf{x}_t = \mathbf{x}] = \dot{\sigma}_t \mathbb{E}[\mathbf{z} \mid \mathbf{x}_t = \mathbf{x}] \\ &= \frac{\dot{\sigma}_t}{\sigma_t} \mathbb{E}[ \mathbf{x}_t - \mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = \frac{\dot{\sigma}_t}{\sigma_t} (\mathbf{x} - \mathbb{E}[ \mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] ). \end{align*}

Since $\mathbf{x}_0$ is independent of the Gaussian variable $\mathbf{z}$ and $\mathbf{x}_t = \mathbf{x}_0 + \sigma_t \mathbf{z}$, we can invoke Tweedie's formula to find:

$$ \mathbb{E}[ \mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = \mathbf{x} + \sigma_t^2 \nabla \log p_t(\mathbf{x}). $$

Plugging this back, we obtain the desired identity.
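As a concrete instance of Tweedie's formula, consider the setting of the Special Case above ($d = 1$, $\mathbf{x}_0 \sim \mathcal{N}(0,1)$), so that $p_t = \mathcal{N}(0, 1+\sigma_t^2)$ and $\nabla \log p_t(\mathbf{x}) = -\mathbf{x}/(1+\sigma_t^2)$. Then

$$ \mathbf{x} + \sigma_t^2 \nabla \log p_t(\mathbf{x}) = \mathbf{x} - \frac{\sigma_t^2}{1+\sigma_t^2} \mathbf{x} = \frac{\mathbf{x}}{1+\sigma_t^2}, $$

which agrees with the jointly Gaussian conditional mean $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = \frac{\operatorname{Cov}(\mathbf{x}_0, \mathbf{x}_t)}{\operatorname{Var}(\mathbf{x}_t)} \mathbf{x} = \frac{\mathbf{x}}{1+\sigma_t^2}$, and plugging it into the lemma reproduces the formula $v_t(\mathbf{x}) = \frac{\dot{\sigma}_t \sigma_t}{1+\sigma_t^2} \mathbf{x}$ quoted earlier.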


Proof of Claim. Note that $f_t^{-1}(\mathbf{y})$ sends the initial point $\mathbf{y}$ at time $0$ along the vector field $v_t$ up to time $t$. Consequently, $f_t^{-1}$ is the flow generated by the vector field $v_t$, i.e.,

$$\frac{\partial}{\partial t} f_t^{-1}(\mathbf{y}) = v_t(f_t^{-1}(\mathbf{y})).$$

Using this observation, by differentiating both sides of the identity $\mathbf{x} = f_t(f_t^{-1}(\mathbf{x}))$ with respect to $t$, we find that the material derivative of $f_t$ is zero:

$$ \frac{\partial f_t}{\partial t} + \frac{\partial f_t}{\partial \mathbf{x}} v_t = 0 $$

Here, $\frac{\partial g}{\partial \mathbf{x}}$ stands for the Jacobian matrix of the multivariable function $g : \mathbb{R}^m \to \mathbb{R}^n$, and all the vectors are regarded as column vectors. Combining all of these,

\begin{align*} I'(t) &= 2 \mathbb{E}\left[ \left\langle \frac{\partial}{\partial t}(f_t(\mathbf{x}_t)), f_t(\mathbf{x}_t) - \mathbf{z} \right\rangle \right] \\ &= 2 \mathbb{E}\left[ \left\langle \frac{\partial f_t}{\partial t}(\mathbf{x}_t) + \frac{\partial f_t(\mathbf{x}_t)}{\partial \mathbf{x}_t} \dot{\mathbf{x}}_t, f_t(\mathbf{x}_t) - \mathbf{z} \right\rangle \right] \\ &= 2 \mathbb{E}\left[ \left\langle \frac{\partial f_t(\mathbf{x}_t)}{\partial \mathbf{x}_t} \left( \dot{\mathbf{x}}_t - v_t(\mathbf{x}_t) \right), f_t(\mathbf{x}_t) - \mathbf{z} \right\rangle \right] \\ &= 2 \dot{\sigma}_t \mathbb{E}\left[ \left\langle \frac{\partial f_t(\mathbf{x}_t)}{\partial \mathbf{x}_t} \left( \mathbf{z} - \mathbb{E}[\mathbf{z} \mid \mathbf{x}_t] \right), f_t(\mathbf{x}_t) - \mathbf{z} \right\rangle \right] \end{align*}

Now let $\mathbf{w}_t = \mathbf{z} - \mathbb{E}[\mathbf{z} \mid \mathbf{x}_t] $. Then $\mathbb{E}[\mathbf{w}_t \mid \mathbf{x}_t] = 0$, hence $\mathbb{E}[ \langle \mathbf{w}_t, g(\mathbf{x}_t) \rangle ] = 0$ for essentially any function $g : \mathbb{R}^d \to \mathbb{R}^d$. Using this, we can further simplify the last line as:

\begin{align*} I'(t) &= - 2 \dot{\sigma}_t \mathbb{E}\left[ \left\langle \frac{\partial f_t(\mathbf{x}_t)}{\partial \mathbf{x}_t} \mathbf{w}_t, \mathbf{z} \right\rangle \right] \\ &= - 2 \dot{\sigma}_t \mathbb{E}\left[ \left\langle \frac{\partial f_t(\mathbf{x}_t)}{\partial \mathbf{x}_t} \mathbf{w}_t, \mathbf{w}_t \right\rangle \right] \end{align*}

Now, when $t = 0$,

  • $f_0$ is the identity function, hence its Jacobian is the identity matrix: $\frac{\partial f_0}{\partial \mathbf{x}} = \mathbf{I}$.

  • $\mathbf{w}_0 = \mathbf{z} - \mathbb{E}[\mathbf{z} \mid \mathbf{x}_0 = \mathbf{x}] = \mathbf{z}$, where the last equality follows from the independence between $\mathbf{z}$ and $\mathbf{x}$.

Consequently,

$$ I'(0) = -2\dot{\sigma}_0 \mathbb{E}[\|\mathbf{z}\|^2] = -2\dot{\sigma}_0 d, $$

and the conclusion follows from the continuity of $I'(t)$.
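To close the loop, here is a finite-difference check of $I'(0) = -2\dot{\sigma}_0 d$ in more than one dimension (my sketch; it uses $p_0 = \mathcal{N}(0, I)$, for which the Special Case flow applies coordinatewise, and the schedule $\sigma_t = t$):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 5, 400_000
x = rng.normal(size=(n, d))   # p_0 = N(0, I) in d dimensions
z = rng.normal(size=(n, d))

def I_mc(t):
    """Monte Carlo I(t) for p_0 = N(0, I) and sigma_t = t, using the
    coordinatewise analytic flow f_t(x) = x / sqrt(1 + t^2)."""
    f_val = (x + t * z) / np.sqrt(1 + t ** 2)
    return np.mean(np.sum((f_val - z) ** 2, axis=1))

# Common random numbers across t make the finite difference stable.
h = 1e-3
print("finite-difference I'(0):", (I_mc(h) - I_mc(0)) / h)
print("expected -2 * sigma'(0) * d =", -2 * d)
```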

Sangchul Lee