The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations

1 TrustIn.AI Lab, Ruhr West University of Applied Sciences, Germany; 2 TML Lab, The University of Sydney, Australia; 3 e:fs TechHub GmbH, Germany

Accepted at the 18th European Conference on Computer Vision (ECCV 2024)

Figure: Examples of counterfactual generation in the latent space.

Abstract

Visual counterfactual explanation (CF) methods modify image concepts, e.g., shape, to change a prediction to a predefined outcome while closely resembling the original query image. Unlike self-explainable models (SEMs) and heatmap techniques, they grant users the ability to examine hypothetical "what-if" scenarios. Previous CF methods either entail post-hoc training, limiting the balance between transparency and CF quality, or demand optimization during inference. To bridge the gap between transparent SEMs and CF methods, we introduce the GdVAE, a self-explainable model based on a conditional variational autoencoder (CVAE), featuring a Gaussian discriminant analysis (GDA) classifier and integrated CF explanations. Full transparency is achieved through a generative classifier that leverages class-specific prototypes for the downstream task and a closed-form solution for CFs in the latent space. The consistency of CFs is improved by regularizing the latent space with the explainer function. Extensive comparisons with existing approaches affirm the effectiveness of our method in producing high-quality CF explanations while preserving transparency. Code and models are publicly available.
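To make the closed-form claim concrete, the sketch below shows how a counterfactual follows directly from a Gaussian discriminant classifier in latent space: with Gaussian class-conditionals, the log-odds are linear in the latent code z, so the latent point attaining a target confidence has an analytic expression. This is a minimal NumPy illustration under our own simplifying assumptions (two classes, shared covariance, equal priors); the helper names `fit_gda` and `counterfactual` are hypothetical, and this is a sketch of the underlying idea, not the paper's implementation.

```python
import numpy as np

# Illustrative sketch (not the authors' code). Assumption: a binary GDA
# classifier over latent codes z with prototypes mu0, mu1 and a shared
# covariance Sigma. Then the log-odds are linear:
#   f(z) = w^T z + b,  with  w = Sigma^{-1} (mu1 - mu0).

def fit_gda(z, y):
    """Fit class prototypes and a shared covariance from latent codes."""
    mu0, mu1 = z[y == 0].mean(axis=0), z[y == 1].mean(axis=0)
    centered = np.where(y[:, None] == 0, z - mu0, z - mu1)
    sigma = centered.T @ centered / len(z)
    return mu0, mu1, sigma

def counterfactual(z, mu0, mu1, sigma, target_logit=0.0):
    """Move z along the decision normal to a target log-odds value.

    Because f(z) = w^T z + b is linear in z, the point on the traversal
    direction w that satisfies f(z') = target_logit has the closed form
        z' = z + (target_logit - f(z)) * w / (w^T w).
    """
    w = np.linalg.solve(sigma, mu1 - mu0)
    b = -0.5 * (mu1 @ np.linalg.solve(sigma, mu1)
                - mu0 @ np.linalg.solve(sigma, mu0))  # equal class priors
    f_z = w @ z + b
    return z + (target_logit - f_z) * w / (w @ w)

# Toy usage on synthetic latent codes.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 8))
y = (z[:, 0] > 0).astype(int)
mu0, mu1, sigma = fit_gda(z, y)
z_cf = counterfactual(z[0], mu0, mu1, sigma, target_logit=2.0)
```

In the full model, a latent counterfactual such as `z_cf` would be decoded back to image space by the CVAE decoder, and the consistency regularization mentioned above is what keeps such latent traversals aligned with the explainer function.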