
Recently, I came across the paper Robust and Stable Black Box Explanations, which discusses a nice framework for global model-agnostic explanations.

I was thinking of recreating the experiments performed in the paper, but, unfortunately, the authors haven't provided the code. A summary of the experiments is:

  1. Use LIME, SHAP, and MUSE as the baseline methods, and compute the fidelity score on test data. (All 3 datasets are used for classification problems.)

  2. Since LIME and SHAP give local explanations for a particular data point, the idea is to take K points from the training dataset and create K explanations using LIME, which returns a local linear explanation for each. Then, for a new test data point, find the nearest of those K points and use its explanation to classify the new point.

  3. Measure the performance using the fidelity score (the % of points for which $E(x) = B(x)$, where $E(x)$ is the explanation's prediction for the point and $B(x)$ is the classification of the point by the black box); see the sketch right after this list.
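To make the metric concrete, here is a tiny hypothetical sketch of the fidelity computation, where `explanation_preds` holds the labels $E(x)$ produced by the assigned local explanations and `black_box_preds` holds the labels $B(x)$ produced by the black box (both arrays are made up for illustration):

```python
import numpy as np

# Hypothetical predictions for five test points
explanation_preds = np.array([0, 1, 1, 0, 1])  # E(x): labels from the local explanations
black_box_preds = np.array([0, 1, 0, 0, 1])    # B(x): labels from the black box

# Fidelity = fraction of points where the explanation agrees with the black box
fidelity = np.mean(explanation_preds == black_box_preds)
print(f"fidelity = {fidelity:.2f}")  # 0.80 for this made-up example
```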

Now, the issue is that I am using the LIME and SHAP packages in Python to reproduce the results of the baseline models.

However, I am not sure how to get the linear explanation for a point (one from the set of K points) and then use it to classify a new test point in its neighborhood.

Every tutorial on YouTube and Medium discusses visualizing the explanation for a given point, but none of them talks about how to get the linear model itself and use it on new points.


1 Answer


For LIME, the local surrogate model that is trained can be found in lime.lime_base.LimeBase.explain_instance_with_data, under the variable name "easy_model".
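That method fits the surrogate (a ridge regressor by default), and its outputs are exposed on the Explanation object returned by explain_instance as exp.intercept[label] and exp.local_exp[label] (the intercept and the (feature, weight) pairs). Below is a minimal sketch, not the paper's code, of how one might use that to implement the procedure from the question: explain K training points, reconstruct each local linear model, and classify a new test point with the model of its nearest explained neighbour. The dataset, K, and the helper local_linear_predict are illustrative choices; discretize_continuous=False is assumed so the surrogate is linear in the (standardized) raw features rather than in discretized bins; and the snippet reads the explainer's internal scaler (explainer.scaler), a detail that may vary across lime versions.

```python
# A minimal sketch (not the paper's code).  Assumptions: the lime and
# scikit-learn packages, a random forest as the black box, K = 20 explained
# training points, and discretize_continuous=False.  `local_linear_predict`
# is an illustrative helper, not part of the LIME API.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from lime.lime_tabular import LimeTabularExplainer

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    mode="classification",
    discretize_continuous=False,  # keep the surrogate linear in the raw features
)

K = 20
anchors = X_train[:K]  # the K points whose explanations will be reused
explanations = [
    explainer.explain_instance(
        x, black_box.predict_proba, num_features=X.shape[1], labels=(1,)
    )
    for x in anchors
]

def local_linear_predict(exp, x, label=1):
    """Evaluate the surrogate g(x) = intercept + sum_i w_i * x_i and threshold it.

    With discretize_continuous=False, LIME fits the surrogate on standardized
    features, so x is standardized with the explainer's scaler first."""
    z = (x - explainer.scaler.mean_) / explainer.scaler.scale_
    g = exp.intercept[label]
    for feature_idx, weight in exp.local_exp[label]:
        g += weight * z[feature_idx]
    return int(g >= 0.5)  # g approximates P(label | x) near the explained point

# E(x): classify each test point with the explanation of its nearest anchor
explanation_preds = np.array([
    local_linear_predict(
        explanations[np.argmin(np.linalg.norm(anchors - x, axis=1))], x
    )
    for x in X_test
])

# B(x): the black box's own predictions, and the resulting fidelity
black_box_preds = black_box.predict(X_test)
print("fidelity:", np.mean(explanation_preds == black_box_preds))
```

Note that each local model is only fitted to be faithful in the neighbourhood of its anchor point, so applying it to a farther-away test point can disagree with the black box; that gap is exactly what the fidelity score measures.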
