Our choices about which movies to watch or which novels to read can be influenced by suggestions from machine learning (ML)-based recommender systems. However, there are important scenarios in which standard ML systems fall short. Each of the following scenarios involves training an ML system to deliver a service; in each case, however, an important constraint must be imposed on the system's operation.
Scenario 1: We want a system that will match submitted job applications to our list of academic vacancies. The system has to be non-discriminatory towards minority groups.
Scenario 2: We need an automated cancer diagnosis system based on biopsy images. We also have HIV test results, which can be used at training time but should not be collected from our new patients.
Scenario 3: We wish to have a system that can aid us in deciding whether or not to approve a mortgage application. We need to understand the decision process and relate it to our checklist, such as whether or not the applicant has had an overdraft in the last three months and is on the electoral roll.
Scenario 1 asks an ML system to be fair in its decisions by being non-discriminatory with regard to, e.g., race, gender, and disability; scenario 2 requires an ML system to protect the confidentiality of sensitive personal data; and scenario 3 demands transparency from an ML system, in the form of human-understandable decisions.
Equipping ML models with ethical and legal constraints, as in scenarios 1-3, is a pressing issue; without this, the future of ML is at risk. In the UK, this has been recognized by the House of Commons Science and Technology Committee, which recommended the urgent formation of a Council of Data Ethics ("The Big Data Dilemma" report, 2016). Furthermore, since 2015 the Royal Society has run a policy project examining the social, legal, and ethical challenges associated with advances in ML models and their use cases.
Building ML models with fairness, confidentiality, and transparency constraints is an active research area, and disjoint frameworks exist for addressing each constraint in isolation. However, it is far from obvious how to combine them. My long-term goal is to develop an ML framework with plug-and-play constraints that can handle any of the constraints above, their combinations, and also new constraints that might be stipulated in the future.
The proposed ML framework relies on instantiating ethical and legal constraints as privileged information. Privileged information is available at training time, where it is used to train a better decision model, but it will not be accessible for future data at deployment time. For confidentiality constraints, confidential personal data such as HIV test results are the privileged information. For fairness constraints, protected characteristics such as race and gender are the privileged information. For transparency constraints, complex, uninterpretable but highly discriminative features, such as deep learning features, are the privileged information.
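To make the train-time/deployment-time asymmetry concrete, the following is a minimal sketch of one established instantiation of privileged learning, generalized distillation: a "teacher" model is trained with access to the privileged features, and a "student" model that sees only the regular features is trained against a blend of the true labels and the teacher's soft predictions. The synthetic data, the imitation weight `lam`, and the plain logistic-regression models are illustrative assumptions, not the algorithm proposed here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, targets, lr=0.1, epochs=2000):
    # Logistic regression by gradient descent; `targets` may be soft
    # labels in [0, 1], not just {0, 1} (the cross-entropy gradient
    # X^T (p - t) / n is valid either way).
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - targets) / len(targets)
    return w

# Synthetic data: X are the regular features (available at deployment);
# X_priv are the privileged features (training time only, e.g. an HIV
# test result, a protected characteristic, or deep learning features).
n = 1000
X = rng.normal(size=(n, 5))
X_priv = rng.normal(size=(n, 3))
logits = X[:, 0] - X[:, 1] + 2.0 * X_priv[:, 0]  # privileged feature is informative
y = (rng.uniform(size=n) < sigmoid(logits)).astype(float)

# 1) Teacher: trained with both regular and privileged features.
X_both = np.hstack([X, X_priv])
w_teacher = fit_logreg(X_both, y)
soft = sigmoid(X_both @ w_teacher)  # teacher's soft predictions

# 2) Student: sees only the regular features, but is trained on a blend
#    of the hard labels and the teacher's soft predictions.
lam = 0.5  # imitation weight (illustrative choice)
w_student = fit_logreg(X, lam * soft + (1.0 - lam) * y)

# Deployment: the student needs X alone; X_priv is never collected
# from new individuals.
X_new = rng.normal(size=(5, 5))
print(sigmoid(X_new @ w_student))
```

At deployment only the student is used, so no privileged information is ever requested from new individuals, which is precisely the property the confidentiality and fairness scenarios require.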
This project aims to develop an ML framework that produces accurate predictions and uncertainty estimates about those predictions while also complying with ethical and legal constraints. The key contributions of this proposal are: 1) a new privileged learning algorithm that overcomes the limitations of existing methods by allowing various constraints to be plugged in and out at deployment time, by being kernelized, by optimizing its hyperparameters, and by producing estimates of prediction uncertainty; 2) a scalable and automated inference scheme that makes the new privileged learning algorithm readily applicable to large-scale learning problems such as binary classification, multi-class classification, and regression; and 3) an instantiation of the new algorithm for incorporating fairness, confidentiality, and transparency restrictions into ML models.
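As an illustration of how privileged information can feed into uncertainty estimates (part of contribution 1), the sketch below assumes a heteroscedastic Bayesian linear regression in which a privileged feature determines a per-example noise level at training time; noisy examples are then down-weighted in the posterior, and predictions on new data come with variances while requiring only the regular features. This is one plausible design under stated assumptions, not the proposed algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Regular features X, privileged features X_priv (training only), and
# regression targets y. Everything here is synthetic and illustrative.
n, d = 200, 4
X = rng.normal(size=(n, d))
X_priv = rng.normal(size=(n, 2))
true_w = np.array([1.0, -2.0, 0.5, 0.0])
# In this toy setup a privileged feature controls how noisy each
# training label is; a real method would have to *learn* this mapping.
noise_sd = np.exp(0.5 * X_priv[:, 0])
y = X @ true_w + noise_sd * rng.normal(size=n)

# Heteroscedastic Bayesian linear regression with prior w ~ N(0, I/alpha)
# and likelihood y_i ~ N(x_i^T w, s_i^2):
#   Sigma = (alpha I + X^T S^{-1} X)^{-1},  mu = Sigma X^T S^{-1} y,
# where S = diag(noise_sd^2); noisy examples are down-weighted.
alpha = 1.0
s_inv = 1.0 / noise_sd**2
Sigma = np.linalg.inv(alpha * np.eye(d) + (X * s_inv[:, None]).T @ X)
mu = Sigma @ X.T @ (s_inv * y)

# Deployment: predictions and uncertainties need only the regular
# features; the privileged ones have played their part at training time.
X_new = rng.normal(size=(3, d))
pred_mean = X_new @ mu
pred_var = np.einsum("ij,jk,ik->i", X_new, Sigma, X_new)  # epistemic variance
print(pred_mean, np.sqrt(pred_var))
```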