
I came across the "reparametrization trick" for the first time in the following paragraph from the chapter on Vector Calculus in the textbook Mathematics for Machine Learning by Marc Peter Deisenroth et al.

The Jacobian determinant and variable transformations will become relevant ... when we transform random variables and probability distributions. These transformations are extremely relevant in machine learning in the context of training deep neural networks using the reparametrization trick, also called infinite perturbation analysis.

The quoted paragraph mentions the trick in the context of training deep neural networks. But when I search for the reparametrization trick, I find it only, or at least mostly, in the context of training autoencoders.

In the context of training a traditional deep neural network, is the trick useful?

hanugm

2 Answers


The reparameterization trick (also known as the pathwise derivative or infinitesimal perturbation analysis) is a method for calculating the gradient of a function of a random variable. It is used, for example, in variational autoencoders or deterministic policy gradient algorithms.
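For concreteness, here is a minimal sketch of the trick on a toy objective (my own example in PyTorch, which is not mentioned in the question or the answer): to differentiate E[f(z)] with z ~ N(mu, sigma^2) with respect to mu and sigma, sample eps ~ N(0, 1) and write z = mu + sigma * eps, so the sampling step no longer blocks the gradient path.

```python
# Minimal sketch of the reparameterization trick (illustrative, PyTorch).
import torch

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)  # parameterize sigma > 0 via exp

def f(z):
    return (z - 2.0) ** 2  # arbitrary differentiable objective

eps = torch.randn(10_000)               # eps ~ N(0, 1), independent of the parameters
z = mu + torch.exp(log_sigma) * eps     # reparameterized sample: z ~ N(mu, sigma^2)
loss = f(z).mean()                      # Monte Carlo estimate of E[f(z)]
loss.backward()                         # gradients flow through z back to mu and log_sigma

print(mu.grad, log_sigma.grad)
```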

If you plan on working with models that involve random variables, you definitely need to understand what the reparameterization trick is.

You will also need to understand the other main method for calculating gradients of functions of random variables, known as the likelihood-ratio method (also called the score function or the REINFORCE gradient).
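For contrast, here is a sketch of the score-function / REINFORCE estimator on the same toy objective (again my own PyTorch example, not part of the answer); it only needs log p(z) to be differentiable in the parameters, not the sample itself.

```python
# Score-function / REINFORCE estimator: grad E[f(z)] = E[f(z) * grad log p(z)].
import torch

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(0.0, requires_grad=True)

def f(z):
    return (z - 2.0) ** 2

dist = torch.distributions.Normal(mu, torch.exp(log_sigma))
z = dist.sample((10_000,))                           # samples are detached from the graph
surrogate = (f(z).detach() * dist.log_prob(z)).mean()
surrogate.backward()                                 # yields the REINFORCE gradient estimate

print(mu.grad, log_sigma.grad)
```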

If your definition of a "traditional" neural network does not involve random variables, then such a method is irrelevant.

nbro
Taw

Yes, the reparametrization trick can be useful in the context of variational Bayesian neural networks (BNNs), although other, more effective variance-reduction techniques are more commonly used (in particular, the flipout estimator). See this implementation of BNNs that uses flipout; TensorFlow Probability, the library used to implement that example, also provides layers that implement the reparametrization trick.
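To illustrate the idea, here is a minimal sketch of a variational Bayesian linear layer whose weights are sampled with the reparametrization trick on every forward pass (my own PyTorch construction, not the TensorFlow Probability example linked above, and omitting the KL regularization term a full BNN would add):

```python
# Sketch of a variational Bayesian linear layer with reparameterized weight sampling.
import torch
import torch.nn as nn

class BayesianLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Variational parameters of the Gaussian weight posterior q(W).
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))

    def forward(self, x):
        eps = torch.randn_like(self.w_mu)               # eps ~ N(0, I)
        w = self.w_mu + self.w_log_sigma.exp() * eps    # reparameterized weight sample
        return x @ w.t()

layer = BayesianLinear(4, 2)
out = layer(torch.randn(3, 4))
out.sum().backward()  # gradients flow to w_mu and w_log_sigma through the sampled weights
```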

Note that the reparametrization trick is used in the context of variational auto-encoders (VAEs), so not in the context of deterministic auto-encoders. VAEs and BNNs have a lot in common: both are based on stochastic variational inference (i.e. variational inference combined with stochastic gradient descent). So, whenever your model involves sampling or some other stochastic operation, the reparametrization trick could turn out to be useful. However, these two are the only types of models I am currently familiar with that use it.
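As a sketch of where the trick sits inside stochastic variational inference, here is the sampling step of a VAE-style latent variable written with PyTorch's torch.distributions (my own illustration; rsample() is that library's name for a reparameterized sample, and the reconstruction loss is a placeholder):

```python
# Reparameterized latent sampling inside an ELBO-style objective (illustrative).
import torch
from torch.distributions import Normal, kl_divergence

# Stand-ins for the encoder outputs for one input x.
mu = torch.zeros(8, requires_grad=True)
log_sigma = torch.zeros(8, requires_grad=True)

q = Normal(mu, log_sigma.exp())              # approximate posterior q(z | x)
p = Normal(torch.zeros(8), torch.ones(8))    # prior p(z)

z = q.rsample()                              # reparameterized sample, differentiable w.r.t. mu, log_sigma
recon_loss = (z ** 2).sum()                  # placeholder for the decoder's reconstruction loss
elbo_loss = recon_loss + kl_divergence(q, p).sum()
elbo_loss.backward()                         # gradients reach mu and log_sigma through z
```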

nbro