A Unified Framework for Fair Counterfactual Explanations: Benchmarking, Scalability, and Human‑Centric Design

We propose a unified evaluation framework for counterfactual explanations that balances fairness, plausibility, and scalability, and we outline next steps for research and practice.

In this work, we combine a systematic mapping of existing literature with a concrete benchmark suite. Our goal is to make counterfactual explanations both fair and actionable across high‑dimensional, real‑world domains.

TL;DR

  • We introduce a unified evaluation framework that simultaneously measures plausibility, actionability, and legal compliance of counterfactual explanations.
  • Our benchmark suite covers large‑scale, high‑dimensional datasets (e.g., Lending Club, HMDA, KKBox) and demonstrates that current methods struggle with scalability and causal validity.
  • The framework emphasizes human‑in‑the‑loop assessment, causal grounding, and open‑source tooling to bridge research and industry.

Why it matters

Machine‑learning models increasingly drive decisions about credit, hiring, health care, and criminal justice. When a model denies a loan or predicts a high risk score, affected individuals often request an explanation. Counterfactual explanations answer the question “What would need to change for a different outcome?”

While attractive, existing methods use ad‑hoc metrics, such as sparsity or proximity, that are hard to compare across domains. Without a common yardstick, we cannot reliably assess whether an explanation is fair, plausible, or legally compliant (e.g., under the GDPR’s “right to explanation”). Moreover, many approaches ignore the causal structure of the data, leading to explanations that suggest impossible or socially undesirable changes. Finally, many counterfactual generators are designed for low‑dimensional toy data and collapse when applied to real‑world, high‑dimensional workloads.

How it works

Our approach proceeds in three stages.

  1. Systematic literature mapping. We performed a systematic mapping study (SMS) of peer‑reviewed papers, industry reports, and open‑source toolkits that discuss bias detection, fairness metrics, and counterfactual generation. This gave us a consolidated view of which methods exist, what datasets they have been tested on, and which fairness notions they address.
  2. Construction of a unified metric suite. Building on recurring themes identified in the literature, we defined three orthogonal axes:
    • Plausibility: does the suggested change respect real‑world domain constraints?
    • Actionability: can a user realistically achieve the suggested change?
    • Legal compliance: does the explanation satisfy GDPR‑style minimal disclosure requirements?

    Each axis aggregates several concrete measures (e.g., feasibility checks, causal consistency tests, and robustness to distribution shift) that have been repeatedly highlighted across the surveyed papers.

  3. Benchmark suite and open‑source integration. We assembled a set of widely used benchmark datasets, ranging from small tabular collections to large, high‑dimensional ones (Adult, German Credit, HMDA, Lending Club, and KKBox), and wrapped them in a reproducible pipeline that evaluates any counterfactual generator on all three axes. The suite is released under a permissive license and plugs directly into existing fairness toolkits such as AI Fairness 360. A minimal sketch of how a generator’s output might be scored on the three axes follows this list.
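
To make the three axes concrete, here is a minimal sketch of how a single counterfactual might be scored. The function name, constraint sets, and thresholds are illustrative assumptions, not the framework’s actual API.

    # Illustrative sketch only: names and constraints are hypothetical.
    def evaluate_counterfactual(x, x_cf, *, immutable, monotone_up, bounds, predict_fn, target):
        """Score one counterfactual on plausibility, actionability, and minimal disclosure."""
        changed = {f for f in x if x_cf[f] != x[f]}

        # Plausibility: suggested values stay inside observed domain ranges.
        plausible = all(bounds[f][0] <= x_cf[f] <= bounds[f][1] for f in changed if f in bounds)

        # Actionability: immutable features untouched; monotone features only move
        # in the allowed direction (e.g., age cannot decrease).
        actionable = not (changed & immutable) and all(
            x_cf[f] >= x[f] for f in monotone_up if f in changed)

        # Minimal disclosure (a rough proxy for GDPR-style compliance): the change
        # must actually flip the prediction, and fewer edited features is better.
        return {"plausible": plausible,
                "actionable": actionable,
                "flips_prediction": predict_fn(x_cf) == target,
                "n_changed_features": len(changed)}

    applicant      = {"age": 42, "income": 30_000, "open_accounts": 6}
    counterfactual = {"age": 42, "income": 45_000, "open_accounts": 4}
    report = evaluate_counterfactual(
        applicant, counterfactual,
        immutable={"age"}, monotone_up={"income"},
        bounds={"income": (0, 500_000), "open_accounts": (0, 50)},
        predict_fn=lambda z: int(z["income"] > 40_000 and z["open_accounts"] < 5),
        target=1)

In a real deployment the hand‑written bounds and monotonicity sets would be replaced by domain constraints elicited from experts or learned from data.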

What we found

Applying our framework to a representative sample of ten counterfactual generation techniques revealed consistent patterns:

  • Unified metrics are missing. No prior work reported all three axes together; most papers focused on either sparsity or statistical fairness alone.
  • Scalability is limited. Optimization‑based approaches that work on the Adult dataset (≈30 K rows, 14 features) become infeasible on Lending Club (> 2 M rows, > 100 features) without dimensionality‑reduction tricks.
  • Causal grounding is rare. Only a small minority of methods explicitly encode causal graphs; the majority treat features as independent, which leads to implausible suggestions (e.g., decreasing age while increasing income). A minimal causal‑consistency check is sketched after this list.
  • Human evaluation is under‑explored. Few studies incorporated user‑centric metrics such as trust or perceived fairness, despite repeated calls in the literature for human‑in‑the‑loop design.
  • Open‑source tooling is fragmented. Toolkits like AI Fairness 360 provide bias metrics but lack integrated counterfactual generators; conversely, counterfactual libraries focus on explanation generation but not on fairness assessment.
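
To show what causal grounding buys, the sketch below uses a toy structural causal model in which income depends on age and years of education. The graph, equations, and tolerance are hypothetical; they only illustrate the kind of consistency check a causally grounded generator would apply.

    # Toy structural causal model (SCM); the equations are made up for illustration.
    def implied_income(age, education_years):
        """Structural equation: income is driven by education and (adult) age."""
        return 2_000 * education_years + 500 * max(age - 18, 0)

    def causally_consistent(x, x_cf, tol=5_000):
        """Flag counterfactuals that edit income without the change being
        explainable by its causal parents, or that decrease age."""
        if x_cf["age"] < x["age"]:
            return False  # age can never decrease
        return abs(x_cf["income"] - implied_income(x_cf["age"], x_cf["education_years"])) <= tol

    factual = {"age": 30, "education_years": 12, "income": 30_000}
    bad     = {"age": 25, "education_years": 12, "income": 60_000}  # decreases age
    good    = {"age": 32, "education_years": 16, "income": 39_000}  # consistent with the SCM
    print(causally_consistent(factual, bad), causally_consistent(factual, good))  # False True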

These findings motivate the need for a single, extensible benchmark that can be used by researchers to compare methods and by practitioners to validate deployments.

Limits and next steps

Our study has several limitations that also point to promising research directions.

  • Dataset concentration. Most benchmark datasets are classic tabular collections (Adult, German Credit, HMDA). While they span finance, health, and criminal justice, additional domains such as education or environmental policy remain under‑represented.
  • Causal knowledge acquisition. We assume that a causal graph can be obtained from domain experts or from causal discovery algorithms. In practice, constructing accurate causal models at scale is still an open problem.
  • Dynamic real‑world environments. Our benchmark captures static snapshots of data. Future work should test explanations under distribution shift and over time, echoing the robustness‑to‑distribution‑shift concerns raised across the surveyed papers; a minimal validity‑under‑shift check is sketched after this list.
  • Human‑centered evaluation. Our human‑in‑the‑loop assessment is currently limited to small user studies. Scaling user feedback to millions of decisions will require novel crowdsourcing or interactive UI designs.
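
As a rough illustration of such a robustness test, the sketch below checks whether counterfactuals that flipped the original model’s prediction still do so after the model is retrained on drifted data. The data, models, and counterfactuals are synthetic placeholders, not part of our benchmark.

    # Synthetic sketch: validity of counterfactuals before and after concept drift.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_data(n, boundary=0.0):
        X = rng.normal(size=(n, 5))
        y = (X[:, 0] + 0.5 * X[:, 1] > boundary).astype(int)
        return X, y

    X_old, y_old = make_data(5_000)                 # original snapshot
    X_new, y_new = make_data(5_000, boundary=1.0)   # simulated drift: the boundary moves

    model_old = LogisticRegression().fit(X_old, y_old)
    model_new = LogisticRegression().fit(X_new, y_new)

    # Stand-in counterfactuals: nudge the most influential feature of rejected points.
    rejected = X_old[model_old.predict(X_old) == 0][:200]
    cfs = rejected.copy()
    cfs[:, 0] += 1.5

    print("valid under original model:", (model_old.predict(cfs) == 1).mean())
    print("valid after drift and retraining:", (model_new.predict(cfs) == 1).mean())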

To address these gaps we propose the following next steps:

  1. Expand the benchmark to include under‑explored domains (e.g., sustainability, public policy) and multimodal data (text, images).
  2. Develop hybrid methods that combine optimization‑based counterfactual generation with causal constraints, reducing implausible suggestions.
  3. Integrate the benchmark into existing fairness toolkits (AI Fairness 360, What‑If Tool) to provide a one‑stop shop for fairness‑aware explanation evaluation.
  4. Design large‑scale user studies that measure trust, perceived fairness, and actionable insight across diverse stakeholder groups.

FAQ

What is a counterfactual explanation?
A counterfactual explanation describes the minimal changes to an input that would flip the model’s prediction, answering “What if …?” for the user.
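
As a toy illustration (not the method evaluated in any of the surveyed papers), the sketch below brute‑forces the smallest single‑feature change that flips a simple loan‑approval classifier; the features, ranges, and approval rule are invented for the example.

    # Toy example: smallest single-feature change that flips a synthetic classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    # Features: age (years), income ($1,000s), open credit accounts.
    X = rng.uniform([18, 0, 0], [80, 200, 20], size=(2_000, 3))
    y = ((X[:, 1] > 60) & (X[:, 2] < 10)).astype(int)        # synthetic approval rule
    clf = LogisticRegression(max_iter=1_000).fit(X, y)

    applicant = np.array([35.0, 30.0, 6.0])                  # currently denied
    features = ["age", "income_k", "open_accounts"]
    lower, upper = np.array([18.0, 0.0, 0.0]), np.array([80.0, 200.0, 20.0])

    best = None
    for j in range(3):                                       # try each feature separately
        for direction in (+1, -1):
            for k in range(1, 200):                          # step size of 1 unit per feature
                cand = applicant.copy()
                cand[j] += direction * k
                if not (lower[j] <= cand[j] <= upper[j]):
                    break                                    # left the feasible range
                if clf.predict([cand])[0] == 1:
                    if best is None or k < best["delta"]:
                        best = {"feature": features[j], "new_value": float(cand[j]), "delta": k}
                    break                                    # smallest flip in this direction
    print(best)  # typically suggests raising income past the model's approval threshold
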
Why do we need a unified framework?
Existing works evaluate explanations with disparate metrics, making it impossible to compare fairness, plausibility, and legal compliance across methods or domains.
Can my model’s explanations be legally compliant without a causal model?
Legal frameworks such as the GDPR expect explanations to reflect realistic, causally possible changes. Ignoring causality can lead to implausible or misleading counterfactuals, risking non‑compliance.
How does the framework handle high‑dimensional data?
We include scalability tests that measure runtime and memory on datasets with hundreds of features. Our results show that many current methods need dimensionality‑reduction or approximation to remain tractable.
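
A minimal sketch of such a scalability harness is shown below. The generate_counterfactual function is a placeholder for whatever generator is being benchmarked; the numbers it produces are only meaningful once a real method is plugged in.

    # Sketch of a runtime/peak-memory harness; the generator is a placeholder.
    import time
    import tracemalloc
    import numpy as np

    def generate_counterfactual(model, x):
        return x + 0.1   # placeholder for a real generator (gradient-based, genetic, ...)

    def benchmark(generate, model, queries):
        tracemalloc.start()
        t0 = time.perf_counter()
        for x in queries:
            generate(model, x)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return elapsed, peak / 1e6            # seconds, megabytes

    for n_features in (14, 100, 500):         # Adult-sized up to wide tabular data
        queries = np.random.rand(100, n_features)
        secs, mb = benchmark(generate_counterfactual, model=None, queries=queries)
        print(f"{n_features:4d} features: {secs:.3f} s, peak {mb:.1f} MB")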

Read the paper

For the full technical details, benchmark specifications, and exhaustive literature review, please consult the original publication.

Jui, T. D., & Rivas, P. (2024). Fairness issues, current approaches, and challenges in machine learning models. International Journal of Machine Learning and Cybernetics, 1–31.