Blossom Metevier


I am an MS/PhD student in the Autonomous Learning Lab at the University of Massachusetts Amherst, where I am advised by Phil Thomas. I study ways to ensure safety and fairness in reinforcement learning and bandits. Specifically, I am interested in designing practical algorithms with guarantees on behavior and performance. My interests include reinforcement learning, bandits, safety & fairness, machine learning, and genetic programming.


Before attending UMass, I was a member of the Force Projection Sector at the Johns Hopkins University Applied Physics Laboratory (APL). I completed my undergraduate degree in Computer Science at the University of Maryland, Baltimore County, where I also competed as a track & field athlete. Throughout my undergraduate career, I was fortunate to be mentored by Marie desJardins and coach David Bobb.


Contact information: bmetevier [at] cs [dot] umass [dot] edu

Publications


Reinforcement Learning When All Actions are Not Always Available
Yash Chandak, Georgios Theocharous, Blossom Metevier, Philip Thomas
Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

Abstract | arXiv

The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not capture the setting where the set of available decisions (actions) at each time step is stochastic. Recently, the stochastic action set Markov decision process (SAS-MDP) formulation has been proposed, which captures the concept of a stochastic action set. In this paper we argue that existing RL algorithms for SAS-MDPs suffer from divergence issues, and present new algorithms for SAS-MDPs that incorporate variance reduction techniques unique to this setting, and provide conditions for their convergence. We conclude with experiments that demonstrate the practicality of our approaches using several tasks inspired by real-life use cases wherein the action set is stochastic.
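
For readers new to the setting, here is a toy illustration (entirely my own sketch, not code from the paper) of what a stochastic action set means in practice: at every time step the agent must choose from a freshly sampled subset of the base actions.

import random

# Toy SAS-MDP interaction loop. The base action set is fixed, but the
# set of actions actually available is re-sampled at every step, and
# the policy may only choose among the available ones.
BASE_ACTIONS = ["up", "down", "left", "right"]

def sample_available_actions():
    # Illustrative availability model (an assumption, not the paper's):
    # each base action is independently available with probability 0.8.
    available = [a for a in BASE_ACTIONS if random.random() < 0.8]
    return available if available else [random.choice(BASE_ACTIONS)]

def policy(state, available_actions):
    # Placeholder policy: uniform over whatever happens to be available.
    return random.choice(available_actions)

state = 0
for t in range(10):
    available_actions = sample_available_actions()
    action = policy(state, available_actions)
    # An environment transition and learning update would go here.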

Offline Contextual Bandits with High Probability Fairness Guarantees
Blossom Metevier, Stephen Giguere, Sarah Brockman, Ari Kobren, Yuriy Brun, Emma Brunskill, Philip Thomas
Advances in Neural Information Processing Systems (NeurIPS 2019)

Abstract

We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints. Unlike previous work, our algorithm accepts multiple fairness definitions and allows users to construct their own unique fairness definitions for the problem at hand. We provide a theoretical analysis of RobinHood, which includes a proof that it will not return an unfair solution with probability greater than a user-specified threshold. We validate our algorithm on three applications: a tutoring system in which we conduct a user study and consider multiple unique fairness definitions; a loan approval setting (using the Statlog German credit data set) in which well-known fairness definitions are applied; and criminal recidivism (using data released by ProPublica). In each setting, our algorithm is able to produce fair policies that achieve performance competitive with other offline and online contextual bandit algorithms.
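
The flavor of the high-probability guarantee can be conveyed with a small, self-contained sketch (my own illustration, not RobinHood's actual construction): a candidate policy is returned only if a high-confidence upper bound on its expected unfairness is at most zero; otherwise no solution is returned.

import math

def hoeffding_upper_bound(samples, delta, value_range=1.0):
    # One-sided (1 - delta)-confidence upper bound on the mean of i.i.d.
    # samples lying in an interval of width `value_range` (Hoeffding).
    n = len(samples)
    mean = sum(samples) / n
    return mean + value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def passes_safety_test(unfairness_samples, delta=0.05):
    # Return the candidate policy only if we are (1 - delta)-confident
    # that its expected unfairness is at most zero; refusing to return
    # a policy when the test fails is what makes a one-sided
    # high-probability guarantee possible.
    return hoeffding_upper_bound(unfairness_samples, delta) <= 0.0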


High-Probability Guarantees for Offline Contextual Bandits
Blossom Metevier, Stephen Giguere, Sarah Brockman, Ari Kobren, Yuriy Brun, Emma Brunskill, Philip Thomas
4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2019), poster and spotlight presentation

Abstract

We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints. Unlike previous work, our algorithm accepts multiple fairness definitions and allows users to construct their own unique fairness definitions for the problem at hand. We provide a theoretical analysis of RobinHood, which includes a proof that it will not return an unfair solution with probability greater than a user-specified threshold. We validate our algorithm on three applications: a tutoring system in which we conduct a user study and consider multiple unique fairness definitions; a loan approval setting (using the Statlog German credit data set) in which well-known fairness definitions are applied; and criminal recidivism (using data released by ProPublica). In each setting, our algorithm is able to produce fair policies that achieve performance competitive with other offline and online contextual bandit algorithms.


Lexicase Selection Beyond Genetic Programming
Blossom Metevier, Anil Saini, Lee Spector
Genetic Programming Theory and Practice XVI (GPTP 2019)
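
For readers who haven't seen it, lexicase selection is easy to state; here is a minimal sketch of the standard algorithm (variable names are my own): test cases are visited in a random order, and at each case only the candidates with the best error on that case survive.

import random

def lexicase_select(population, errors):
    # errors[i][j] is individual i's error on test case j (lower is better).
    candidates = list(range(len(population)))
    cases = list(range(len(errors[0])))
    random.shuffle(cases)
    for c in cases:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    # Any ties remaining after all cases are broken uniformly at random.
    return population[random.choice(candidates)]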

Personal

Apart from the grad student grind, I enjoy running, reading, and badminton. I love Hemingway's theory of omission and Ursula K. Le Guin's alternative worlds. I am also a fan of the DC universe, specifically the Teen Titans, and I follow a number of Japanese comics.