Blossom Metevier

I am currently pursuing an M.S./Ph.D. at the University of Massachusetts under the guidance of Phil Thomas. My machine learning research centers on problems in sequential learning, with specific interests in the following areas:

  • Responsible AI, with an emphasis on fairness considerations.
  • Interactive learning, including bandits and reinforcement learning.
  • Reliable ML, with a focus on high-confidence guarantees on algorithmic performance and behavior.
  • Integrative learning, including methods that learn from and adapt to human decision-makers.

Prior to joining UMass, I was a member of the Force Projection Sector at APL. I earned my Bachelor's degree in Computer Science and Mathematics from the University of Maryland, Baltimore County, where I also competed as a track & field athlete. Throughout my undergraduate career, I was fortunate to receive mentorship from both Professor Marie desJardins and Coach David Bobb.


Contact information: bmetevier [at] umass [dot] edu


Recent News

FAccT 2024 Acceptance. Our paper on Analyzing the Relationship Between Difference and Ratio-Based Fairness Metrics has been accepted at FAccT.


Fall 2022 Internship. I worked on the Responsible AI Team at Facebook AI Research.


Summer 2022 Internship. I worked with Nicolas Le Roux at MSR FATE Montréal.


ICLR 2022 Acceptance. Our paper on Fairness Guarantees under Demographic Shift has been accepted at ICLR.


Summer 2021 Internship. I worked with Dennis Wei and Karthi Ramamurthy in the Trustworthy AI group at IBM.


Publications


Analyzing the Relationship Between Difference and Ratio-Based Fairness Metrics
Min-Hsuan Yeh, Blossom Metevier, Austin Hoag, Philip S. Thomas
ACM Conference on Fairness, Accountability, and Transparency (FAccT 2024)

Abstract

In research studying the fairness of machine learning algorithms and models, fairness often means that a metric is the same when computed for two different groups of people. For example, one might define fairness to mean that the false positive rate of a classifier is the same for people of different genders, ages, or races. However, it is usually not possible to make this metric identical for all groups. Instead, algorithms ensure that the metric is similar---for example, that the false positive rates are similar. Researchers usually measure this similarity or dissimilarity using either the difference or ratio between the metric values for different groups of people. Although these two approaches are known to be different, there has been little work analyzing their differences and respective benefits. In this paper we examine this relationship analytically and empirically, and conclude that unless there are application-specific reasons to prefer the difference approach, the ratio approach should be preferred.
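To make the two approaches concrete, here is a small illustrative sketch (my own, not code from the paper; the labels and predictions are made up) that computes the false positive rate for two groups and compares them with a difference-based and a ratio-based metric:

    # Illustrative sketch: difference- vs ratio-based comparison of
    # two groups' false positive rates (made-up data).
    import numpy as np

    def false_positive_rate(y_true, y_pred):
        # FPR = false positives / actual negatives
        negatives = (y_true == 0)
        return np.mean(y_pred[negatives] == 1)

    # Hypothetical labels and predictions for two groups, A and B.
    y_true_a = np.array([0, 0, 0, 1, 1, 0, 1, 0])
    y_pred_a = np.array([0, 1, 0, 1, 1, 0, 1, 0])
    y_true_b = np.array([0, 0, 1, 1, 0, 0, 0, 1])
    y_pred_b = np.array([1, 0, 1, 1, 0, 1, 0, 1])

    fpr_a = false_positive_rate(y_true_a, y_pred_a)
    fpr_b = false_positive_rate(y_true_b, y_pred_b)

    difference = abs(fpr_a - fpr_b)                # difference-based metric (0 = parity)
    ratio = min(fpr_a, fpr_b) / max(fpr_a, fpr_b)  # ratio-based metric, one common convention (1 = parity)

    print(f"FPR_A = {fpr_a:.2f}, FPR_B = {fpr_b:.2f}")
    print(f"difference = {difference:.2f}, ratio = {ratio:.2f}")

With these made-up numbers the false positive rates are 0.20 and 0.40, giving a difference of 0.20 but a ratio of 0.50; the two notions of "similar" can behave quite differently, which is the relationship the paper analyzes.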


Matched Pair Calibration for Ranking Fairness
Hannah Korevaar, Chris McConnell, Edmund Tong, Erik Brinkman, Alana Shine, Misam Abbas, Blossom Metevier, Sam Corbett-Davies, Khalid El-Arini
In Submission

Abstract | arXiv

We propose a test of fairness in score-based ranking systems called matched pair calibration. Our approach constructs a set of matched item pairs with minimal confounding differences between subgroups before computing an appropriate measure of ranking error over the set. The matching step ensures that we compare subgroup outcomes between identically scored items so that measured performance differences directly imply unfairness in subgroup-level exposures. We show how our approach generalizes the fairness intuitions of calibration from a binary classification setting to ranking and connect our approach to other proposals for ranking fairness measures. Moreover, our strategy shows how the logic of marginal outcome tests extends to cases where the analyst has access to model scores. Lastly, we provide an example of applying matched pair calibration to a real-world ranking data set to demonstrate its efficacy in detecting ranking bias.
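As a rough sketch of the matching idea (my own illustration under simplifying assumptions, not the authors' implementation), the snippet below pairs items from two subgroups whose model scores are nearly identical and averages the difference in their observed outcomes; a systematic gap among identically scored items would suggest a subgroup-level exposure difference:

    # Hypothetical sketch of a score-matched subgroup comparison
    # (illustrative only; not the paper's implementation).
    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up data: a model score and an observed outcome per item,
    # split by subgroup. Subgroup B is made to slightly under-deliver.
    scores_a = rng.uniform(size=200)
    scores_b = rng.uniform(size=200)
    outcomes_a = rng.binomial(1, scores_a)
    outcomes_b = rng.binomial(1, scores_b * 0.9)

    def matched_pair_gap(scores_a, outcomes_a, scores_b, outcomes_b, tol=0.01):
        """Average outcome gap over items whose scores differ by less than tol."""
        gaps = []
        for s, o in zip(scores_a, outcomes_a):
            close = np.abs(scores_b - s) < tol   # B-items with nearly the same score
            if close.any():
                gaps.append(o - outcomes_b[close].mean())
        return float(np.mean(gaps)) if gaps else float("nan")

    print("matched-pair outcome gap:",
          matched_pair_gap(scores_a, outcomes_a, scores_b, outcomes_b))

The tolerance and synthetic data here are arbitrary; the paper develops the appropriate ranking-error measure to compute over the matched set.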


Enforcing Delayed-Impact Fairness Guarantees
Aline Weber*, Blossom Metevier*, Yuriy Brun, Philip S. Thomas, Bruno Castro da Silva
*Equal contribution
In Submission

Abstract | arXiv

Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on people's lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term. This is because prior fairness-aware algorithms only consider static fairness constraints, such as equal opportunity or demographic parity. However, enforcing constraints of this type may result in models that have negative long-term impact on disadvantaged individuals and communities. We introduce ELF (Enforcing Long-term Fairness), the first classification algorithm that provides high-confidence fairness guarantees in terms of long-term, or delayed, impact. We prove that the probability that ELF returns an unfair solution is less than a user-specified tolerance and that, under mild assumptions and given sufficient training data, ELF is able to find and return a fair solution if one exists. We show experimentally that our algorithm can successfully mitigate long-term unfairness.
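A toy numeric example (mine, not from the paper) of how a statically fair policy can still have unequal delayed impact, using the lending setting mentioned above: suppose both groups are approved at the same rate, but their repayment rates differ.

    # Toy example: equal approval rates (a static constraint) can still
    # produce unequal expected long-term impact if repayment differs.
    approval_rate = {"A": 0.5, "B": 0.5}   # static demographic parity holds
    default_rate  = {"A": 0.1, "B": 0.4}   # but repayment outcomes differ
    score_change_if_repaid  = +10          # hypothetical credit-score effects
    score_change_if_default = -30

    for g in ("A", "B"):
        expected_change = approval_rate[g] * (
            (1 - default_rate[g]) * score_change_if_repaid
            + default_rate[g] * score_change_if_default
        )
        print(f"group {g}: expected long-term score change = {expected_change:+.1f}")

Here group A's expected long-term score change is +3.0 while group B's is -3.0, even though the static constraint is satisfied; this is the kind of delayed impact that ELF's guarantees are stated in terms of.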


Fairness Guarantees under Demographic Shift
Stephen Giguere, Blossom Metevier, Yuriy Brun, Philip S. Thomas
Tenth International Conference on Learning Representations (ICLR 2022)

Abstract | Paper

Recent studies found that using machine learning for social applications can lead to injustice in the form of racist, sexist, and otherwise unfair and discriminatory outcomes. To address this challenge, recent machine learning algorithms have been designed to limit the likelihood such unfair behavior occurs. However, these approaches typically assume the data used for training is representative of what will be encountered in deployment, which is often untrue. In particular, if certain subgroups of the population become more or less probable in deployment (a phenomenon we call demographic shift), prior work's fairness assurances are often invalid. In this paper, we consider the impact of demographic shift and present a class of algorithms, called Shifty algorithms, that provide high-confidence behavioral guarantees that hold under demographic shift when data from the deployment environment is unavailable during training. Shifty, the first technique of its kind, demonstrates an effective strategy for designing algorithms to overcome demographic shift's challenges. We evaluate Shifty using the UCI Adult Census dataset (Kohavi and Becker, 1996), as well as a real-world dataset of university entrance exams and subsequent student success. We show that the learned models avoid bias under demographic shift, unlike existing methods. Our experiments demonstrate that our algorithm's high-confidence fairness guarantees are valid in practice and that our algorithm is an effective tool for training models that are fair when demographic shift occurs.
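To illustrate why demographic shift matters (a toy example I constructed, not Shifty itself), the sketch below estimates a bounded per-example statistic under the training-time subgroup mix and again after importance-reweighting to a hypothetical deployment mix; a guarantee certified only under the first mix need not describe the second:

    # Toy illustration of demographic shift (not Shifty): an estimate made
    # under training-time subgroup proportions can change once those
    # proportions shift in deployment.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Made-up data: subgroup membership and a bounded per-example statistic.
    group = rng.choice(["A", "B"], size=n, p=[0.7, 0.3])   # training-time mix
    stat = np.where(group == "A",
                    rng.uniform(0.00, 0.05, size=n),       # group A contributes little
                    rng.uniform(0.05, 0.15, size=n))       # group B contributes more

    train_estimate = stat.mean()

    # Reweight to a hypothetical deployment mix of 30% A / 70% B.
    train_prop  = {"A": 0.7, "B": 0.3}
    deploy_prop = {"A": 0.3, "B": 0.7}
    weights = np.array([deploy_prop[g] / train_prop[g] for g in group])
    deploy_estimate = np.average(stat, weights=weights)

    print(f"estimate under training mix:   {train_estimate:.3f}")
    print(f"estimate under deployment mix: {deploy_estimate:.3f}")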


Reinforcement Learning When All Actions are Not Always Available
Yash Chandak, Georgios Theocharous, Blossom Metevier, Philip S. Thomas
Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

Abstract | arXiv

The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not capture the setting where the set of available decisions (actions) at each time step is stochastic. Recently, the stochastic action set Markov decision process (SAS-MDP) formulation has been proposed, which captures the concept of a stochastic action set. In this paper we argue that existing RL algorithms for SAS-MDPs suffer from divergence issues, and present new algorithms for SAS-MDPs that incorporate variance reduction techniques unique to this setting, and provide conditions for their convergence. We conclude with experiments that demonstrate the practicality of our approaches using several tasks inspired by real-life use cases wherein the action set is stochastic.
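As a minimal toy illustration of a stochastic action set (my own, not the algorithms from the paper), the environment below samples which actions are available at each step, and the policy may only choose among those:

    # Minimal illustration of a stochastic action set: at every step only a
    # random subset of the full action space is available to the agent.
    import random

    FULL_ACTION_SET = ["left", "right", "up", "down"]

    def available_actions(p=0.7):
        """Each action is independently available with probability p;
        resample until at least one action is available."""
        while True:
            subset = [a for a in FULL_ACTION_SET if random.random() < p]
            if subset:
                return subset

    def random_policy(state, actions):
        # Placeholder policy: choose uniformly among the available actions.
        return random.choice(actions)

    state = 0
    for t in range(5):
        actions = available_actions()
        print(f"t={t}: available={actions}, chosen={random_policy(state, actions)}")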


Offline Contextual Bandits with High Probability Fairness Guarantees
Blossom Metevier, Stephen Giguere, Sarah Brockman, Ari Kobren, Yuriy Brun, Emma Brunskill, Philip S. Thomas
Advances in Neural Information Processing Systems (NeurIPS 2019)

Abstract | Paper

We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints. Unlike previous work, our algorithm accepts multiple fairness definitions and allows users to construct their own unique fairness definitions for the problem at hand. We provide a theoretical analysis of RobinHood, which includes a proof that it will not return an unfair solution with probability greater than a user-specified threshold. We validate our algorithm on three applications: a tutoring system in which we conduct a user study and consider multiple unique fairness definitions; a loan approval setting (using the Statlog German credit data set) in which well-known fairness definitions are applied; and criminal recidivism (using data released by ProPublica). In each setting, our algorithm is able to produce fair policies that achieve performance competitive with other offline and online contextual bandit algorithms.
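The style of guarantee described here can be illustrated with a toy high-confidence safety test (my own sketch, not RobinHood's actual mechanism; the Hoeffding bound, tolerance, and sample data are assumptions chosen for illustration):

    # Toy high-confidence safety test: accept a candidate solution only if an
    # upper confidence bound on its unfairness is below a user-set tolerance.
    import numpy as np

    def hoeffding_upper_bound(samples, delta):
        """Upper confidence bound on the mean of [0, 1]-bounded samples,
        holding with probability at least 1 - delta."""
        n = len(samples)
        return np.mean(samples) + np.sqrt(np.log(1.0 / delta) / (2.0 * n))

    def safety_test(unfairness_samples, epsilon=0.05, delta=0.05):
        """Return the candidate only if unfairness is bounded below epsilon
        with confidence 1 - delta; otherwise report no solution."""
        if hoeffding_upper_bound(unfairness_samples, delta) <= epsilon:
            return "candidate accepted"
        return "no solution returned"

    # Made-up per-example unfairness measurements for a candidate policy.
    rng = np.random.default_rng(1)
    samples = rng.uniform(0.0, 0.04, size=5000)
    print(safety_test(samples))

A test of this form only returns a candidate when its unfairness can be bounded below the tolerance with the requested confidence, which is the kind of probabilistic guarantee the paper proves for RobinHood.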


Lexicase Selection Beyond Genetic Programming
Blossom Metevier, Anil Saini, Lee Spector
Genetic Programming Theory and Practice XVI (GPTP 2019)

Personal

Apart from the grad-student grind, I enjoy running, reading, and badminton. I also like playing with my cats (featured here) and dog!