
From this page in the Interpretable ML book and this article on Analytics Vidhya, interpretability means knowing what has happened inside an ML model to arrive at the result/prediction/conclusion.

In linear regression, new data is multiplied by the weights and a bias is added to make a prediction.
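For concreteness, a minimal sketch of that computation (the weights, bias, and input values below are made up purely for illustration):

```python
import numpy as np

# Illustrative weights and bias, not from any real model.
weights = np.array([0.4, -1.2, 3.0])   # one weight per feature
bias = 0.5

def predict(x):
    """Multiply the input by the weights and add the bias."""
    return x @ weights + bias

x_new = np.array([1.0, 2.0, 0.5])
print(predict(x_new))                   # 0.4*1.0 - 1.2*2.0 + 3.0*0.5 + 0.5 = 0.0
```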

And in boosted tree models, it is possible to plot all of the decision trees that lead to a prediction.
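As a rough illustration (scikit-learn's gradient boosting and synthetic data are my own choices here, not something the question prescribes), the individual trees of a boosted ensemble can be plotted like this:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# Fit a small boosted ensemble on synthetic data (illustrative only).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = GradientBoostingClassifier(n_estimators=10, max_depth=2, random_state=0)
model.fit(X, y)

# Each boosting stage is itself a decision tree; plot the first one.
plot_tree(model.estimators_[0, 0], filled=True)
plt.show()
```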

And in feed-forward neural networks, we have weights and biases just as in linear regression: at each layer we multiply by the weights, add the bias, and limit the values to some extent with an activation function, finally arriving at a prediction.
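A toy sketch of such a forward pass, assuming random placeholder weights and a ReLU activation (both are illustrative choices, not taken from the question):

```python
import numpy as np

# Random placeholder parameters, not a trained network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)   # layer 1: 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)   # layer 2: 4 hidden units -> 1 output

def relu(z):
    return np.maximum(z, 0)          # activation that limits the values

def forward(x):
    h = relu(x @ W1 + b1)            # multiply by weights, add bias, apply activation
    return h @ W2 + b2               # final layer gives the prediction

print(forward(np.array([1.0, 2.0, 0.5])))
```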

In CNNs, it is possible to see what happens to the input after it has passed through a CNN block and which features are extracted after pooling (ref: what does a CNN see?).
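One way this kind of inspection could be done, sketched with a PyTorch forward hook on a small untrained block (the framework and layer sizes are my own assumptions for illustration):

```python
import torch
import torch.nn as nn

# A tiny, untrained CNN block just to show how intermediate feature maps
# can be captured; the sizes are arbitrary placeholders.
block = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

feature_maps = {}

def save_output(module, inputs, output):
    feature_maps["after_pooling"] = output.detach()

# Register a hook on the pooling layer to capture what comes out of it.
block[2].register_forward_hook(save_output)

x = torch.randn(1, 3, 32, 32)                 # a fake 32x32 RGB image
_ = block(x)
print(feature_maps["after_pooling"].shape)    # torch.Size([1, 8, 16, 16])
```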

As I stated above, one can easily know what happens inside an ML model to make a prediction or reach a conclusion, so I am unclear as to what makes these models un-interpretable. What exactly makes an algorithm or its results un-interpretable, and why are these called black-box models? Or am I missing something?

1 Answer


In a simple linear model of the form $y = \beta_0 + \beta_1 x$ we can see that increasing $x$ by a unit will increase the prediction of $y$ by $\beta_1$. Here we can completely determine the effect that increasing $x$ has on the model's prediction. With more complex models such as neural networks, this is much more difficult to tell because of all the calculations that a single data point is involved in. For instance, in a CNN, as you mentioned, if I changed the value of a pixel in an image we were passing through the CNN, you wouldn't really be able to tell me the exact effect this would have on the prediction, as you can with the linear model.
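A rough sketch of that contrast, with made-up coefficients and a random, untrained two-layer network (all values here are illustrative assumptions):

```python
import numpy as np

# Linear model: a unit change in x always changes the prediction by exactly beta_1.
beta_0, beta_1 = 2.0, 0.7
linear = lambda x: beta_0 + beta_1 * x
print(linear(5.0) - linear(4.0))       # 0.7, i.e. exactly beta_1, regardless of x

# A random, untrained two-layer network: the effect of the same unit change
# depends on where x started, so it cannot be read off a single weight.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)

def net(x):
    h = np.maximum(np.array([[x]]) @ W1 + b1, 0)   # hidden layer with ReLU
    return (h @ W2 + b2).item()

print(net(5.0) - net(4.0), net(1.0) - net(0.0))    # two different effects
```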

David