Date of Award: 8-2022
Document Type: Dissertation
Degree Name: Doctor of Philosophy (PhD)
Department: School of Computing
Committee Chair/Advisor: Dr. Brian Dean
Committee Member: Dr. Christopher McMahan
Committee Member: Dr. Alexander Feltus
Committee Member: Dr. Alexander Alekseyenko
Committee Member: Dr. Nina Hubig
Abstract
Most modern machine learning algorithms take an "average-case" approach, in which every data point has the same influence on the fit of a model. The per-data-point error (or loss) is averaged into an overall loss, which is typically minimized as the objective function. However, this approach can be insensitive to valuable outliers. Inspired by game theory, this work explores the utility of incorporating an optimally playing adversary into feature selection and regression frameworks. The adversary assigns weights to the data elements so as to degrade the modeler's performance optimally, thereby forcing the modeler to construct a more robust solution. A tuning parameter "tempers" the power wielded by the adversary, allowing us to explore the spectrum between the average case and the worst case. Because our method is formulated as a linear program, it can be solved efficiently and can accommodate sub-population constraints, which other related methods cannot easily implement. We believe that building models while understanding the influence of sub-population constraints is particularly important in biomedical research. Although our method was developed in response to the ubiquity of sub-populations and outliers in this domain, it is generic and can be applied to data sets that are not biomedical in nature. We additionally explore implementing our method as an adversarial regression problem. Here, instead of providing the user with a single fitted set of model parameters, we provide an ensemble of parameter sets that can be tuned according to their sensitivity to outliers and to various sub-population constraints.
Finally, to foster a better understanding of various data sets, we discuss potential automated applications of our method that would enable data scientists to explore underlying relationships and sensitivities that may arise from sub-populations and meaningful outliers.
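To make the idea of tempering concrete, here is a minimal sketch of one standard way to cap an adversary's power. It is an illustrative assumption, not the dissertation's formulation. For a fixed vector of per-point losses, a hypothetical parameter `k` (with 1 <= k <= n) limits each weight to at most 1/k while the weights sum to 1. The adversary's problem is then a small linear program whose optimum places the maximum allowed weight on the largest losses. As a result, `k = n` recovers the average loss and `k = 1` recovers the worst-case loss. This capped-weight adversary is the familiar conditional value-at-risk (average of top-k losses) construction, swapped in here for illustration. The function name `tempered_adversary` is hypothetical, and the sketch covers only the adversary's side. It does not model the modeler's minimization or any sub-population constraints.

```python
from typing import List, Tuple


def tempered_adversary(losses: List[float], k: float) -> Tuple[List[float], float]:
    """Hypothetical sketch: adversary's optimal weights for fixed per-point losses.

    Solves the LP  max_w sum_i w_i * losses[i]
                   s.t. sum_i w_i = 1,  0 <= w_i <= 1/k,
    where k (1 <= k <= n) is an assumed tempering parameter.
    k = n forces uniform weights (average case); k = 1 allows all weight
    on a single point (worst case).
    """
    n = len(losses)
    if n == 0:
        raise ValueError("losses must be non-empty")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(losses)")

    cap = 1.0 / k
    weights = [0.0] * n
    remaining = 1.0
    # Greedy fill from the largest loss down; this is optimal for this LP
    # (a fractional-knapsack structure with equal "sizes").
    for i in sorted(range(n), key=lambda j: losses[j], reverse=True):
        if remaining <= 1e-12:  # tolerance guards against float round-off
            break
        w = min(cap, remaining)
        weights[i] = w
        remaining -= w

    value = sum(w * loss for w, loss in zip(weights, losses))
    return weights, value


if __name__ == "__main__":
    losses = [1.0, 4.0, 2.0, 3.0]
    for k in (4, 2.5, 2, 1):
        w, v = tempered_adversary(losses, k)
        # k = 4 yields the mean loss (2.5); k = 1 yields the max loss (4.0).
        print(f"k={k}: weights={[round(x, 3) for x in w]}, weighted loss={v:.3f}")
```

Intermediate values of `k` interpolate between the two extremes. In a full minimax setup, the modeler would then choose parameters to minimize this weighted loss. For absolute-error losses that are linear in the parameters, the combined problem can be written as a single linear program by taking the dual of the adversary's LP.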
Recommended Citation
McGee, Stephen, "Tempering the Adversary: An Exploration into the Applications of Game Theoretic Feature Selection and Regression" (2022). All Dissertations. 3111.
https://open.clemson.edu/all_dissertations/3111