The adoption of artificial intelligence (AI) across industries, including insurance, has led to the widespread use of complex black-box models such as gradient-boosting machines and neural networks. Although these models offer gains in efficiency and accuracy, their lack of transparency has raised concerns among regulators and consumers. To address this, interpretation methods from the growing field of interpretable machine learning have gained attention as tools for understanding the relationships between model inputs and outputs. However, while stakeholders may understand the limitations of these explanations, they are often unaware that the interpretation methods themselves are inherently vulnerable to manipulation.
This paper proposes an adversarial framework to uncover the vulnerability of permutation-based interpretation methods for machine learning tasks, with a particular focus on partial dependence (PD) plots. The framework modifies the original black-box model to manipulate its predictions for instances in the extrapolation domain, producing deceived PD plots that can hide discriminatory behavior while maintaining the prediction accuracy of the original model. A single modified model can produce multiple fooled PD plots. Using real-world datasets, including an auto insurance claims dataset and the COMPAS dataset, our results show that it is possible to intentionally hide the discriminatory behavior of a predictor and make the black-box model appear neutral under interpretation tools such as PD plots, while retaining almost all of the predictions of the original black-box model. We will provide managerial insights for regulators and industry practitioners based on these findings.
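To illustrate the mechanism, the sketch below shows how a PD plot can be fooled. PD plots are computed by forcing one feature to each grid value across the whole dataset, which creates feature combinations outside the training distribution; a wrapper that detects such extrapolation points and answers deceptively on them can flatten the PD curve of a protected feature while leaving predictions on realistic inputs almost untouched. This is a minimal illustrative sketch, not the paper's actual construction: the synthetic data, the hand-coded "black box", and the crude distance-based extrapolation detector are all assumptions made for the example.

```python
# Minimal sketch of fooling a PD plot via the extrapolation domain.
# Assumptions: synthetic data, a hand-coded biased model, and a simple
# distance rule for flagging extrapolation points.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a binary protected feature x0 and a proxy feature x1
# that is strongly correlated with it.
n = 5000
x0 = rng.integers(0, 2, n)
x1 = rng.normal(loc=3.0 * x0, scale=0.5)
X = np.column_stack([x0, x1]).astype(float)

def biased_model(X):
    """Original black box: its output depends directly on the protected x0."""
    return 2.0 * X[:, 0] + 0.1 * X[:, 1]

def adversarial_model(X):
    """Wrapper that matches the original model on realistic inputs but
    answers deceptively on extrapolation points, i.e. (x0, x1) pairs that
    never co-occur in the training data and only arise when a PD plot
    permutes x0."""
    preds = biased_model(X)
    off_manifold = np.abs(X[:, 1] - 3.0 * X[:, 0]) > 1.5  # crude detector
    # On flagged points, respond as if x0 had the value implied by the
    # proxy x1, which cancels the x0 effect in the permuted copies.
    X_implied = X.copy()
    X_implied[:, 0] = (X[:, 1] > 1.5).astype(float)
    preds[off_manifold] = biased_model(X_implied)[off_manifold]
    return preds

def pd_curve(model, X, feature, grid):
    """Textbook partial dependence: force the feature to each grid value
    for every row, then average the model's predictions."""
    curve = []
    for v in grid:
        Xg = X.copy()
        Xg[:, feature] = v
        curve.append(model(Xg).mean())
    return np.array(curve)

grid = np.array([0.0, 1.0])
print("PD of x0, original model:   ", pd_curve(biased_model, X, 0, grid))
print("PD of x0, adversarial model:", pd_curve(adversarial_model, X, 0, grid))
changed = np.abs(biased_model(X) - adversarial_model(X)) > 1e-9
print("share of real predictions altered:", changed.mean())
```

In this toy setup, the original model's PD curve for x0 should rise by about 2.0 between the two levels, while the adversarial wrapper's curve should be essentially flat; yet the two models disagree on well under 1% of the actual data, mirroring the finding that the deception costs almost no predictive accuracy.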
Fei Huang is a Senior Lecturer in Risk and Actuarial Studies and Lead - Data and AI Tech at the UNSW Business AI Lab. She received her BSc in Mathematics from Xiamen University, MPhil in Actuarial Science from the University of Hong Kong, and PhD in Actuarial Studies from the Australian National University. Before joining UNSW, she was a faculty member at the Australian National University.
Fei's research focuses on Ethical AI and Data Science for Insurance. In particular, she studies insurance discrimination and pricing fairness in both AI and non-AI contexts. She also develops statistical and machine learning methods for various actuarial applications, including customer churn modeling, mortality modeling, macroeconomic forecasting using Big Data, and general insurance pricing. She has published in top-tier actuarial journals and received the Carol Dolan Actuaries Summit Prize and the ASTIN Colloquium Best Paper Award for her research on anti-discrimination insurance pricing. She was a recipient of the Actuaries Institute's Volunteer of the Year Award in the Spirit of Volunteering category and received the UNSW Business School SDG Research Impact Award in 2023.