
I am working on a research project in a domain where related work has always resorted to deep Q-learning. My research is motivated by the observation that the domain has an inherent structure and therefore should not require deep Q-learning. Based on this hypothesis, I created a tabular Q-learning-based algorithm that uses limited domain knowledge and performs on par with, or better than, the deep Q-learning-based approaches.

Given that model interpretability is a subjective and sometimes vague topic, I was wondering whether my algorithm should be considered interpretable. The way I understand it, the lack of interpretability in deep-learning-based models stems from the stochastic gradient descent step. In tabular Q-learning, however, every chosen action can be traced back to a finite set of state-action values, which in turn are a deterministic function of the inputs to the algorithm, albeit accumulated over multiple training episodes.

I believe in using deep-learning-based approaches conservatively only when absolutely required. However, I am not sure how to justify this in my paper without wading into the debated topic of model interpretability. I would greatly appreciate any suggestions/opinions regarding this.

nbro
harshal.c

1 Answer


There is no widely accepted definition of explainable AI (XAI). However, as a rule of thumb (my rule of thumb), if you can't easily explain a model or algorithm to a layperson (or even to an expert), then it is not (very) interpretable. There are other concepts related to XAI, such as accountability (who is responsible for what?), transparency and fairness.

For example, the final decision of a (trained) decision tree can easily be explained to (almost) any person, so a (trained) decision tree is a relatively interpretable model. See chapter 4.4, Decision Tree, of the book Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.

An artificial neural network (ANN) is usually considered not very interpretable: unless you attempt to understand which parts of the network contribute to its output (for example, with the technique layer-wise relevance propagation), you cannot immediately or easily understand the output or decision of the ANN, because it composes many non-linear functions, which can produce unintuitive outcomes. In other words, it is more difficult to attribute the output of an ANN to the contributions of its individual units than to explain, e.g., the decision of a decision tree.

In the context of deep reinforcement learning (DRL), the ANN is used to approximate the value function or the policy. This approximation is the main reason behind the low interpretability of deep RL models.

Q-learning is an algorithm, so, unlike an ANN, it is not a model. Q-learning is used to learn a state-action value function, denoted by $Q: S \times A \rightarrow \mathbb{R}$, from which you can derive another function, the policy, which is then used to take actions. In a way, Q-learning is similar to gradient descent, because both are machine learning (or optimization) algorithms. The learned $Q$ function can loosely be seen as a model, although it is not a model of the environment's dynamics in the usual RL sense: for each state-action pair, it represents the expected return (the cumulative, typically discounted, future reward), so, in a certain way, the learned $Q$ function represents a prediction of reward.
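To make the distinction between the algorithm, the learned $Q$ table and the derived policy concrete, here is a minimal sketch of tabular Q-learning. The environment is a hypothetical 5-state chain invented for illustration (action 0 moves left, action 1 moves right, reaching the last state gives reward 1 and ends the episode), and the hyperparameters and function names are assumptions rather than anything taken from the question.

```python
import random

LEFT, RIGHT = 0, 1  # the two actions of the hypothetical chain environment


def pick_action(q_row, epsilon, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    """Learn a Q table (list of rows, one row per state) on the toy chain."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = pick_action(Q[s], epsilon, rng)
            # Deterministic toy dynamics: move one cell left or right.
            s_next = max(0, s - 1) if a == LEFT else min(goal, s + 1)
            done = s_next == goal
            r = 1.0 if done else 0.0
            # Q-learning target: reward plus discounted best next value.
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if done:
                break
    return Q


def greedy_policy(Q):
    """Derive the policy from Q: in each state, pick the highest-valued action."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in Q]


if __name__ == "__main__":
    Q = q_learning()
    for state, row in enumerate(Q):
        print(state, [round(q, 3) for q in row])
    print("greedy policy:", greedy_policy(Q))
```

Every entry of the printed table can be read directly as "estimated return of taking this action in this state", which is the sense in which the tabular case is easier to inspect than a network's weights. The last row belongs to the terminal state and is never updated, so its greedy action is arbitrary.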

Is the learned tabular $Q$ function interpretable? Yes, it is relatively interpretable, but how much? What kind of interpretation do you really need? It depends on the context and on the people who need the interpretation or explanation. A reinforcement learning researcher will usually be satisfied with the usual explanation of the inner workings of $Q$-learning, Markov decision processes, etc., because the usual RL researcher is not concerned with the really important problems that involve the lives of people and other beings. However, in the context of healthcare, for example, doctors might not be satisfied with the explanation "expected maximum future reward" alone: they might also be interested in the environment, the credit assignment problem, the meaning and effectiveness of the reward function with respect to the actual problem that needs to be solved, a probabilistic interpretation of the results (rather than a mere action that needs to be taken), possible alternative good actions, etc.
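As a rough illustration of what "alternative good actions" and a graded (rather than single-action) answer could look like for a tabular $Q$ function, the sketch below ranks the actions of one state and attaches softmax (Boltzmann) weights to them. The function name, the temperature parameter, the action names and the example values are all hypothetical, and the softmax weights are only a relative preference score, not calibrated probabilities of a good outcome.

```python
import math


def explain_action(q_row, action_names, temperature=1.0):
    """Rank one state's actions by Q-value and attach softmax weights.

    q_row: the Q-values of a single state, one per action.
    action_names: human-readable labels (hypothetical, for illustration).
    temperature: assumed softmax temperature; lower values sharpen the ranking.
    """
    best = max(q_row)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp((q - best) / temperature) for q in q_row]
    total = sum(exps)
    report = [
        {
            "action": name,
            "q_value": q,
            "gap_to_best": best - q,
            "softmax_weight": e / total,
        }
        for name, q, e in zip(action_names, q_row, exps)
    ]
    # Best action first, followed by the alternatives in decreasing order.
    report.sort(key=lambda entry: entry["q_value"], reverse=True)
    return report


if __name__ == "__main__":
    # Made-up Q-values for a single state of an imaginary treatment problem.
    for entry in explain_action([0.2, 1.0, 0.9], ["wait", "treat_a", "treat_b"]):
        print(entry)
```

A report like this shows how close the runner-up is to the chosen action, but it still says nothing about whether the reward function itself reflects the real problem, which is the harder part of the explanation.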

Recently, there have been some attempts to make RL and, in particular, deep RL more interpretable and explainable. In the paper Programmatically Interpretable Reinforcement Learning (2019), Verma et al. propose a more interpretable (than deep RL) RL framework that is based on the idea of learning policies that are represented in a human-readable language. In the paper InfoRL: Interpretable Reinforcement Learning using Information Maximization (2019), the authors focus on learning multiple ways of solving the same task and they claim that their approach provides more interpretability. In the paper Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees (2018), the authors also claim that their approach facilitates understanding the network's learned knowledge by analyzing feature influence, extracting rules, and highlighting the super-pixels in image inputs.

To conclude, deep RL should not necessarily be avoided: it depends on the context (e.g., it is usually perfectly fine to use deep RL to solve video games). However, in cases where liability is an issue, deep RL should either be made explainable or more explainable alternatives should be considered.

nbro