Scalable, interpretable, and auditable AI models for sensitive data analysis and classification tasks

Proposition type:
PhD opportunities
Latest date:
February 15, 2023
Responsibility:
UMA
Abstract:
Collaborative project between ENSTA Paris (Applied Mathematics Unit, optimisation section) and the Cour des Comptes (Direction des méthodes et des données)
Detail:
In recent years, with the widespread use of machine learning and AI, new ways to analyse and process data have appeared. For example, deep learning and neural networks have been successfully adopted in many applications featuring large-scale data sets, ranging from image processing to language understanding. One of the main current challenges is that many of the proposed AI models, and especially deep learning methods, are not interpretable, which complicates their use in tasks dealing with sensitive data, such as public and private health records or human resources databases. For such data, it is paramount to develop algorithms that can explain the conclusions they draw, that can be interpreted by domain experts, and that can be audited by external parties.

Think, for example, of an algorithm that classifies and predicts the likelihood of surgical complications for a given cohort of patients, and can therefore suggest whether or not to perform a particular surgery. This is a very sensitive task: the outcome needs to be interpretable by doctors and explainable to patients, as well as auditable if something goes wrong.

Motivated by these considerations, more classical rule-based AI models, e.g., decision trees, have regained attention and have been revisited in light of large-scale databases. Rule-based models have inherently transparent inner structures and good expressivity. In particular, models based on Mixed Integer Linear Programs (MILPs) have recently been introduced to identify rule-based models that are optimal in terms of both performance and interpretability. However, rule-based models are hard to optimise (i.e., to train), especially on large data sets, because of their discrete parameters and structures. Moreover, while rule-based models are easy for domain experts to "correct" by modifying or adding rules, they often become overly involved and complex and can quickly lose their interpretability; in this sense, rule-based models are said to be fragile.
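
To make the MILP idea concrete, here is a minimal, hypothetical sketch in Python: it selects a small set of single-feature threshold rules from a candidate pool by minimising training errors plus a sparsity penalty. The toy data, the candidate pool, and the penalty weight lam are invented for illustration, and the formulation is a deliberately simplified rule-selection model, not the one the project will study; it uses the open-source PuLP modelling library and its bundled CBC solver.

import numpy as np
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD

rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 3))                    # toy data: 60 samples, 3 features
y = (X[:, 0] > 0.5).astype(int)                  # hidden ground-truth rule

# Candidate pool: simple single-feature threshold rules "feature f > t".
candidates = [(f, t) for f in range(X.shape[1]) for t in (0.25, 0.5, 0.75)]
covers = np.stack([X[:, f] > t for f, t in candidates], axis=1)  # (n samples, m rules)

n, m = covers.shape
lam = 0.1                                        # complexity penalty per selected rule

prob = LpProblem("rule_selection", LpMinimize)
z = [LpVariable(f"z_{j}", cat=LpBinary) for j in range(m)]   # is rule j selected?
e = [LpVariable(f"e_{i}", lowBound=0) for i in range(n)]     # misclassification of sample i

# Objective: number of training errors plus a sparsity penalty on the rule set.
prob += lpSum(e) + lam * lpSum(z)

for i in range(n):
    covering = [z[j] for j in range(m) if covers[i, j]]
    if y[i] == 1:
        # A positive sample is an error if no selected rule covers it.
        prob += e[i] >= 1 - lpSum(covering)
    else:
        # A negative sample is an error if any selected rule covers it.
        for zj in covering:
            prob += e[i] >= zj

prob.solve(PULP_CBC_CMD(msg=False))
selected = [candidates[j] for j in range(m) if z[j].value() > 0.5]
print("selected rules (feature, threshold):", selected)

At the optimum, the selected (feature, threshold) pairs read directly as human-readable rules, which is precisely the transparency that motivates these models; realistic formulations, for instance for optimal decision trees, introduce many more discrete variables and become much harder to solve at scale, which is part of the training difficulty mentioned above.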

Combining the strengths of deep learning models, in particular their differentiable training, architectural flexibility, and scalability, with the interpretability of rule-based methods is a very active research area within explainable AI (XAI), and early results have started to appear. Algorithm auditing is also becoming increasingly relevant, especially for government institutions.
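
One common idea in this direction, shown here only as an illustrative sketch and not as the project's intended approach, is to replace hard logical conjunctions with differentiable "soft" ones, so that rule memberships can be learned by gradient descent. The small PyTorch module below (all names are made up) computes a soft AND over binarized features; after training, memberships close to 0 or 1 can be thresholded to recover readable rules.

import torch
import torch.nn as nn

class SoftRuleLayer(nn.Module):
    """Differentiable approximation of AND-rules over binarized features in [0, 1].

    Each rule learns a membership weight per input literal; the soft conjunction
    prod_k (1 - w_jk * (1 - x_ik)) reduces to a hard AND when weights and inputs
    are exactly 0 or 1.
    """

    def __init__(self, n_literals: int, n_rules: int):
        super().__init__()
        # Unconstrained parameters, squashed to [0, 1] memberships in forward().
        self.logits = nn.Parameter(torch.randn(n_rules, n_literals))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_literals), values in [0, 1]
        w = torch.sigmoid(self.logits)                      # (n_rules, n_literals)
        # Broadcast to (batch, n_rules, n_literals), then take the product over literals.
        soft_and = torch.prod(1.0 - w.unsqueeze(0) * (1.0 - x.unsqueeze(1)), dim=-1)
        return soft_and                                     # (batch, n_rules)

# Toy usage: 4 binarized features, 3 candidate rules.
layer = SoftRuleLayer(n_literals=4, n_rules=3)
x = torch.randint(0, 2, (8, 4)).float()
print(layer(x).shape)  # torch.Size([8, 3])

Such a layer can be combined with a soft OR or a linear output layer and trained end-to-end with a standard classification loss, which is one way to pair the scalability of gradient-based training with a rule-like structure that remains inspectable.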

In the context of this PhD project, we want to investigate and develop new algorithms at the intersection of deep learning and rule-based models for data analysis and classification. The resulting algorithms will need to be scalable, easy to optimize/train, interpretable, and auditable.