A lot has been said about the many recent attempts to make AI explainable, and over the last year or so I have made a good faith effort to read all of it. Especially now that I get to think and talk about this stuff near-full-time, having started my PhD program at the U (which is going swimmingly, thanks for asking), I have a much more coherent perspective on the issue and a lot to write about. As I work towards generating some takes of my own, however, I thought I would share a few of the highlights of my literature review.
What are your favorite papers or authors on this topic? @ me next time!
Please stop permuting features
Hooker, G. and Mentch, L.
Such a polite alternative to “feature importance considered harmful.” “Wait,” you say. “Feature importance is not the same as XAI.” First of all, that acronym is awful. Second, bear with me: the permutation trick this paper skewers is baked into plenty of “explanation” methods anyway.

According to this paper, Leo Breiman committed an original sin when he invented random forests and started the trend of running partially scrambled data through a trained model to simulate and thus measure the effect of “removing” information.
Why was this such a bad idea? The short answer: when you create semisynthetic instances that are actually very improbable under the true data distribution (and yes, many “explanation” methods are guilty of doing this too), your poor model is forced to extrapolate into a part of the feature space it has never seen, and how the model happens to behave out there usually has nothing to do with whatever question you were asking about it in the first place.
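To make the recipe concrete, here is a minimal sketch (my own toy example, not from the paper) using scikit-learn’s permutation_importance on two strongly correlated features. The importance numbers aren’t the point; the point is the impossible rows the shuffling manufactures.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Two strongly correlated features: the same height measured in cm and in inches.
n = 1000
height_cm = rng.normal(170, 10, n)
height_in = height_cm / 2.54 + rng.normal(0, 0.5, n)
X = np.column_stack([height_cm, height_in])
y = 0.5 * height_cm + rng.normal(0, 1, n)

model = RandomForestRegressor(random_state=0).fit(X, y)

# The standard recipe: shuffle one column, re-score, call the drop "importance".
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print("permutation importances:", result.importances_mean)

# The catch: shuffling height_cm independently of height_in manufactures rows
# like (150 cm, 75 in) that the real data distribution never produces, so the
# score depends on how the model happens to extrapolate into that empty region.
X_shuffled = X.copy()
X_shuffled[:, 0] = rng.permutation(X_shuffled[:, 0])
gap_before = np.max(np.abs(X[:, 0] - 2.54 * X[:, 1]))
gap_after = np.max(np.abs(X_shuffled[:, 0] - 2.54 * X_shuffled[:, 1]))
print(f"max |cm - 2.54*in| before shuffling: {gap_before:.1f}")
print(f"max |cm - 2.54*in| after shuffling:  {gap_after:.1f}")
```

Before the shuffle the two columns disagree by a few centimeters of noise; after it, by tens of centimeters, i.e. points the forest has never been trained anywhere near.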
This raises the question: what are you asking about the model anyway? And what information do you think the answer is giving you?? Examples of the questionable use of feature importance for things other than feature selection abound in the following paper.
Trust in data science: collaboration, translation, and accountability in corporate data science projects
Passi, S. and Jackson, S.
If, like me, you have worked as a data scientist professionally, you will feel Extremely Seen by this paper. (I mean, probably. It’s a case study, so generalizability is not the point. I don’t care if you work at Amazing Organization Inc. and can’t relate.)
As the title suggests, this paper is about much more than explanations: it describes the very complex life cycle of a data science project, including how data scientists negotiate the confidence and trust of, say, management or customers. Hint: it’s not just accuracy metrics! There is lots of explaining to be done.
There are some amazing quotes about the frequent use of intuition alongside feature importance in model evaluation. I’m going to make you actually read the paper to get to them, but here is a great sentence that very roughly summarizes some of the practices described at the company under study:
Often, intuitive results are made visible to inspire trust in models, while sometimes counter-intuitive results are posited as moments of new discoveries.
Okay, so if a feature with an “intuitive” connection to a model’s target appears to be the most predictive, that’s good, but if an “unintuitive” feature also turns out to be important, that’s … also good? Did someone say confirmation bias? No? Well, this next paper does.
The intuitive appeal of explainable machines
Selbst, A. and Barocas, S.
Why are people so gung-ho about explanations in AI anyway? This paper analyzes the general desire for explanations that is in the air, arguing that it stems from the fact that machine learning models can be “both inscrutable and nonintuitive,” which makes it hard to tell whether a machine-made decision was justified from a legal or ethical standpoint. However, legislating to solve the scrutability problem (by requiring explanations, for instance) will not necessarily ease the task of testing for ethical properties:
In most cases, intuition serves as the unacknowledged bridge between a descriptive account and a normative evaluation.
What’s so wrong with intuition? It comes in handy so often! But it can also mislead, and the section of the paper on this point is my favorite part. Lots of references to “Thinking, Fast and Slow,” which I am now thinking I should probably read.
Explanation in artificial intelligence: insights from the social sciences
Miller, T.
So, it turns out this entire field is complicated by the fact that explanation is itself a super confusing concept, but luckily that has also been studied for, like, decades. This paper is a massive survey and analysis of the existing literature on how humans generate, select, evaluate, and communicate explanatory information. Spoiler alert: causality is a pretty important part of the picture, but, as many have argued over the years, there’s a lot more going on. Here are the two findings in the paper that I think are most important, from my own biased perspective:
- “Explanations are contrastive — they are sought in response to particular counterfactual cases, which are termed foils in this paper. That is, people do not ask why event P happened, but rather why event P happened instead of some event Q.”
- “Explanations are selected (in a biased manner) — people rarely, if ever, expect an explanation that consists of an actual and complete cause of an event. Humans are adept at selecting one or two causes from a sometimes infinite number of causes to be the explanation. However, this selection is influenced by certain cognitive biases.”
The punchline once you get into the weeds, of course, is that a lot of so-called “XAI” solutions are far, far removed from what humans actually desire from an explanation.
What this paper doesn’t do is present a legal, methodological, or moral necessity for explanation; instead, it acknowledges the fact that “the running hypothesis is that by building more transparent, interpretable, or explainable systems, users will be better equipped to understand and therefore trust the intelligent agents.”
Towards a rigorous science of interpretable machine learning
Doshi-Velez, F. and Kim, B.
I, for one, am fully on board with equipping users to understand and “normatively evaluate” the algorithmic systems they engage with. But methodologically sound experiments must be done to show this is actually what explanation methods can achieve, because I think we have reason to be skeptical at this point.

So I will leave you with this final paper. Happy researching, folx. Might want to brush up on your HCI.