Imagine that machine learning algorithms become consistently better than radiologists at detecting breast cancer. Imagine also that the explanations these algorithms provide for their decisions aren’t really sound from a physiopathological point of view. Suppose, finally, that we could develop alternative algorithms that do provide sound physiopathological explanations for their decisions but miss a number of diagnoses.1
How many undetected breast cancers would we accept in exchange for explainability? I’m leaning toward zero, but I’m very open to changing my mind!
Notice that we may perform the same thought experiment in different realms. For instance, imagine that algorithms become consistently better than juries at identifying guilty defendants, and that their use therefore minimises the number of wrongful convictions. However, the explanations these algorithms provide for their decisions aren’t fully sound to the human (and legal) mind.
How many wrongful convictions would we be willing to accept in exchange for explainability? Again, I lean toward zero, but there is a further complication here, which has to do with odd AI mistakes. Let me try to articulate it.
There is a certain predictability in the way humans make mistakes, and this predictability makes their mistakes acceptable to the general public. Some machines, despite being trained by humans, may make decisions on the basis of extremely quirky correlations. Mistakes based on quirky correlations, I take it, are far less acceptable to the public. I may be wrong, but I suspect that odd AI mistakes are the main driver of algorithm aversion in our society. In this respect, it would be interesting to test people’s behavior in response to different kinds of AI mistakes: quirky ones versus socially acceptable ones.
The normative question, though, is: should we accept odd mistakes in order to pursue certain normatively relevant goals (e.g. minimising the number of wrongful convictions)? While the answer may seem a straightforward ‘YES’ (or so I thought a few weeks ago), doing so may raise problems with respect to moral desert and the separateness of persons.
The underlying idea here is that certain values, together with the predictability of human mistakes, may shape some people’s strategies and behavior in directions that make them more deserving than others. AI, however, may be completely blind to this fact and thus make these morally deserving people worse off by exposing them to more of certain kinds of risk (e.g. wrongful convictions, or car accidents in a world of self-driving cars). In other words, the use of AI may redistribute certain risks across our societies in ways that may be morally problematic.
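To make the redistribution worry concrete, here is a toy back-of-the-envelope sketch. All numbers, group labels, and the even population split are invented for illustration: the point is simply that an AI can make fewer mistakes overall while the ‘cautious’ group, which was low-risk under human decision-makers, ends up bearing more risk than before.

```python
# Toy illustration (invented numbers): a more accurate but "quirky" AI can
# shift risk onto people whose behaviour made them low-risk under humans.

population = {"cautious": 1000, "reckless": 1000}

# Hypothetical wrongful-conviction rates under human juries:
# errors track behaviour, so cautious people face less risk.
human_error = {"cautious": 0.01, "reckless": 0.04}

# Hypothetical AI: lower overall error rate, but its quirky mistakes
# fall uniformly on both groups.
ai_error = {"cautious": 0.015, "reckless": 0.015}

def total_errors(rates):
    """Expected number of wrongful convictions across the whole population."""
    return sum(population[g] * rates[g] for g in population)

print("wrongful convictions (humans):", total_errors(human_error))   # 50.0
print("wrongful convictions (AI):", total_errors(ai_error))          # 30.0
print("extra risk for the cautious under AI:",
      ai_error["cautious"] - human_error["cautious"])                # +0.005
```

On these made-up figures the AI cuts total wrongful convictions from 50 to 30, yet each cautious person’s individual risk rises by half, which is exactly the kind of redistribution that seems to clash with moral desert and the separateness of persons.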
Post-hoc addendum: I just found this interesting article on how people trade off accuracy and explainability. I think the redistribution of risk plays a role here.