Quantifying the Trade-Offs Between Model Interpretability and Predictive Accuracy in Explainable Artificial Intelligence for High-Stakes Decision Systems
Keywords:
Explainable Artificial Intelligence (XAI), model interpretability, predictive accuracy, black-box models, high-stakes decisions, transparency, fairness, algorithmic accountability
Abstract
In high-stakes domains such as healthcare, finance, and criminal justice, the adoption of artificial intelligence (AI) demands a careful balance between predictive accuracy and interpretability. This paper examines the inherent trade-offs between these two often-competing objectives in Explainable AI (XAI), highlighting the consequences of privileging one over the other in high-impact decision contexts. By analyzing existing interpretability methods, benchmarking model performance, and comparing real-world case studies, this work emphasizes the complexity of model selection for decision-critical applications. Our findings reveal that while interpretable models tend to underperform in predictive power relative to complex black-box models, recent developments in post-hoc and hybrid interpretability frameworks can help bridge the gap. We conclude by outlining a decision-theoretic framework for balancing interpretability and accuracy, emphasizing context-specific optimization strategies.
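For concreteness, the following minimal sketch illustrates the trade-off the abstract describes: an intrinsically interpretable model and a black-box ensemble are trained side by side, and a post-hoc attribution method is then applied to the black box. This is an illustration under stated assumptions, not this paper's benchmark protocol: scikit-learn is assumed available, the bundled breast-cancer dataset stands in for a high-stakes task, and permutation importance stands in for the broader family of post-hoc methods.

# Illustrative sketch only: scikit-learn assumed; the dataset and the two
# models are stand-ins, not the benchmark studied in this paper.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Intrinsically interpretable baseline: each coefficient maps to one feature.
glass_box = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Black-box comparator: higher capacity, opaque decision surface.
black_box = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

print("interpretable (logistic regression):",
      accuracy_score(y_test, glass_box.predict(X_test)))
print("black box (gradient boosting):",
      accuracy_score(y_test, black_box.predict(X_test)))

# Post-hoc interpretability: permutation importance estimates how much each
# feature contributes to the black box's held-out accuracy.
result = permutation_importance(black_box, X_test, y_test,
                                n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {i}: mean importance {result.importances_mean[i]:.3f}")

Permutation importance is used here only because it is model-agnostic and ships with scikit-learn; the same side-by-side comparison could be run with other post-hoc attribution methods in its place.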