Blossom Metevier

I am an M.S./Ph.D. student in the Autonomous Learning Lab at the University of Massachusetts Amherst, where I am advised by Phil Thomas. I study ways to ensure safety and fairness in reinforcement learning and bandits. Specifically, I am interested in designing practical machine learning algorithms with guarantees on behavior and performance.


Before attending UMass, I was a member of the Force Projection Sector at the Johns Hopkins University Applied Physics Laboratory (APL). I completed my undergraduate degree in Computer Science at the University of Maryland, Baltimore County, where I also competed as a track & field athlete. Throughout my undergraduate career, I was mentored by Marie desJardins and coach David Bobb.


Contact information: bmetevier [at] umass [dot] edu


Recent News

Fall 2022 Internship. I will work on the Responsible AI Team at Facebook AI Research.


Summer 2022 Internship. I will work with Nicolas Le Roux at MSR FATE Montréal.


ICLR 2022 Acceptance. Our paper on Fairness Guarantees under Demographic Shift has been accepted at ICLR.


Summer 2021 Internship. I worked with Dennis Wei and Karthi Ramamurthy in the Trustworthy AI group at IBM.


NERDS 2020 Organizer. Scott Jordan and I organized the first Northeast Reinforcement Learning and Decision Making Symposium (NERDS).


Publications


Enforcing Delayed-Impact Fairness Guarantees
Aline Weber*, Blossom Metevier*, Yuriy Brun, Philip S. Thomas, Bruno Castro da Silva
*Equal contribution
In Submission

Abstract

Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on people's lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term. This is because prior fairness-aware algorithms only consider static fairness constraints, such as equal opportunity or demographic parity. However, enforcing constraints of this type may result in models that have negative delayed impact on disadvantaged individuals and communities. We introduce ELF (Enforcing Long-term Fairness), the first algorithm that provides high-confidence fairness guarantees in terms of delayed impact. We prove that ELF will not return an unfair solution with probability greater than a user-specified tolerance. Furthermore, we show (under mild assumptions) that given sufficient training data, ELF is able to find and return a fair solution if one exists. We show experimentally that our algorithm can successfully mitigate long-term unfairness.
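
To give a feel for the kind of guarantee involved: below is a minimal sketch (not ELF itself) of a Seldonian-style safety test that accepts a candidate solution only if a high-confidence bound certifies the fairness constraint. It assumes per-sample unbiased estimates of the delayed-impact disparity are available; the function and variable names are hypothetical.

    import numpy as np
    from scipy import stats

    def passes_safety_test(impact_estimates, epsilon, delta):
        """Return True only if, with confidence 1 - delta, the expected
        delayed-impact disparity is at most epsilon (hypothetical sketch;
        the estimates would come from held-out safety data)."""
        n = len(impact_estimates)
        mean = np.mean(impact_estimates)
        std = np.std(impact_estimates, ddof=1)
        # One-sided (1 - delta) upper confidence bound on the mean disparity.
        upper = mean + std / np.sqrt(n) * stats.t.ppf(1 - delta, df=n - 1)
        return upper <= epsilon

If the test fails, a Seldonian-style algorithm declines to return a solution rather than risk unfair behavior, which is how the bound on the probability of returning an unfair solution is achieved.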


Fairness Guarantees under Demographic Shift
Stephen Giguere, Blossom Metevier, Yuriy Brun, Philip S. Thomas
Tenth International Conference on Learning Representations (ICLR 2022)

Abstract

Recent studies have demonstrated that using machine learning for social applications can result in racist, sexist, and otherwise unfair and discriminatory outcomes, which can lead to social injustice. While machine learning algorithms exist that provide assurances that unfair behavior does not take place, these approaches typically assume that the data used for training is representative of what will be encountered once the model is deployed. In particular, if certain subgroups of the population are more or less probable after the model is deployed (a phenomenon we call demographic shift), the guarantees provided by prior algorithms are often violated. In this paper, we consider the impact of demographic shift and present Shifty, a new algorithm that provides high-confidence behavioral guarantees despite this shift. We then evaluate Shifty on a real-world data set of university entrance exams and subsequent student success, and demonstrate that Shifty avoids bias, even when the student distribution changes after training, while existing methods exhibit bias. Our experiments demonstrate that the high-confidence fairness guarantees of our algorithm are valid in practice, and that our algorithm is an effective tool for training models that are fair when demographic shift occurs.
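
A rough intuition for handling demographic shift (illustrative only, not Shifty's actual procedure): if the post-deployment group proportions were known, per-sample statistics could be importance-weighted before computing confidence bounds. All names below are hypothetical; Shifty itself provides guarantees over a user-specified set of possible deployment distributions rather than a single known one.

    import numpy as np

    def reweight_for_shift(values, groups, p_train, p_deploy):
        """Importance-weight per-sample fairness statistics so that bounds
        computed on training data reflect a shifted deployment distribution
        over demographic groups (hypothetical sketch).

        p_train and p_deploy map each group label to its probability under
        the training and deployment distributions, respectively."""
        weights = np.array([p_deploy[g] / p_train[g] for g in groups])
        return weights * np.asarray(values)  # inputs to a high-confidence bound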


Reinforcement Learning When All Actions are Not Always Available
Yash Chandak, Georgios Theocharous, Blossom Metevier, Philip S. Thomas
Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

Abstract | arXiv

The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not capture the setting where the set of available decisions (actions) at each time step is stochastic. Recently, the stochastic action set Markov decision process (SAS-MDP) formulation has been proposed, which captures the concept of a stochastic action set. In this paper we argue that existing RL algorithms for SAS-MDPs suffer from divergence issues, and present new algorithms for SAS-MDPs that incorporate variance reduction techniques unique to this setting, and provide conditions for their convergence. We conclude with experiments that demonstrate the practicality of our approaches using several tasks inspired by real-life use cases wherein the action set is stochastic.
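
For intuition, the core change relative to standard Q-learning is small: the bootstrapped target maximizes only over the actions that happen to be available in the next state. The sketch below is illustrative only (names hypothetical); the paper's algorithms additionally incorporate variance-reduction techniques specific to this setting.

    import numpy as np

    def sas_q_update(Q, s, a, r, s_next, available_next, alpha=0.1, gamma=0.99):
        """One Q-learning-style update under a stochastic action set:
        Q is a (num_states, num_actions) array, and available_next is the
        realized set of available actions in s_next (hypothetical sketch)."""
        target = r + gamma * max(Q[s_next, a2] for a2 in available_next)
        Q[s, a] += alpha * (target - Q[s, a])
        return Q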


Offline Contextual Bandits with High Probability Fairness Guarantees
Blossom Metevier, Stephen Giguere, Sarah Brockman, Ari Kobren, Yuriy Brun, Emma Brunskill, Philip S. Thomas
Advances in Neural Information Processing Systems (NeurIPS 2019)

Abstract

We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints. Unlike previous work, our algorithm accepts multiple fairness definitions and allows users to construct their own unique fairness definitions for the problem at hand. We provide a theoretical analysis of RobinHood, which includes a proof that it will not return an unfair solution with probability greater than a user-specified threshold. We validate our algorithm on three applications: a tutoring system in which we conduct a user study and consider multiple unique fairness definitions; a loan approval setting (using the Statlog German credit data set) in which well-known fairness definitions are applied; and criminal recidivism (using data released by ProPublica). In each setting, our algorithm is able to produce fair policies that achieve performance competitive with other offline and online contextual bandit algorithms.
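
To illustrate what a user-constructed fairness definition can look like (this is not RobinHood's actual interface): the user supplies a constraint objective g whose value being at most zero means "fair," and the algorithm bounds g with high confidence from logged data. The definition and names below are hypothetical.

    import numpy as np

    def demographic_parity_gap(rewards, groups, tau=0.05):
        """One possible user-defined constraint objective g, where g <= 0
        encodes "fair": the absolute gap in mean outcome between two
        groups, minus a tolerance tau (hypothetical sketch)."""
        r = np.asarray(rewards)
        g = np.asarray(groups)
        gap = abs(r[g == 0].mean() - r[g == 1].mean())
        return gap - tau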


Lexicase Selection Beyond Genetic Programming
Blossom Metevier, Anil Saini, Lee Spector
Genetic Programming Theory and Practice XVI (GPTP 2019)

Personal

Apart from the grad-student grind, I enjoy running, reading, and badminton. I also like playing with my cat (featured on the right) and dog!