Evaluating Fairness of Ranking Algorithms

Mitigating bias in hiring algorithms: A Fairness Analysis. LANS 2022


Existing hiring algorithms often claim to be "unbiased" but typically aim only to meet basic Equal Employment Opportunity Commission (EEOC) requirements. Even when they meet these standards, they may still exhibit discriminatory behavior when interacting with hiring managers. This study investigates the inherent biases in hiring algorithms, with a focus on whether removing gender, race, and class identifiers from the ranking process is effective at mitigating bias. Two forms of discrimination are assessed: disparate treatment and disparate impact, the latter commonly measured by the "4/5" rule.
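As an illustration of the "4/5" rule mentioned above, the following minimal sketch computes the ratio of selection rates between two groups and checks it against the 0.8 threshold. The function names and the sample outcomes are hypothetical and are not taken from the study; the sketch also assumes the unprotected group has the higher selection rate.

```python
def selection_rate(outcomes):
    """Fraction of favorable (1) outcomes in a list of 0/1 decisions."""
    return sum(outcomes) / len(outcomes)


def disparate_impact_ratio(protected, unprotected):
    """Selection rate of the protected group divided by that of the
    unprotected group (assumed here to be the higher-rate group)."""
    return selection_rate(protected) / selection_rate(unprotected)


def passes_four_fifths(protected, unprotected, threshold=0.8):
    """True if the protected group's rate is at least 4/5 of the other's."""
    return disparate_impact_ratio(protected, unprotected) >= threshold


# Hypothetical decisions (1 = selected, 0 = not selected), not study data.
women = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]  # 4 of 10 selected
men = [1, 1, 1, 0, 1, 0, 1, 1, 0, 0]    # 6 of 10 selected

ratio = disparate_impact_ratio(women, men)
print(f"ratio = {ratio:.2f}, passes 4/5 rule: {passes_four_fifths(women, men)}")
```

With these made-up numbers the ratio falls below 0.8, so the outcome would be flagged as potential disparate impact.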

In this research, four ranking algorithms are analyzed, with a specific focus on Themis-ml: a fairness-aware post-processing machine learning algorithm. Four training models are evaluated using Themis-ml, with gender as the protected attribute and training data from the German Credit Score dataset. These models are a Baseline (B) classifier trained on all input variables, a Remove Protected Attribute (RPA) classifier that excludes protected attributes, a Reject-Option Classification (ROC) classifier, and an Additive Counterfactually Fair Model (ACF) classifier.
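To make the reject-option idea concrete, here is a minimal sketch of the general technique: predictions that fall close to the decision boundary are reassigned in favor of the protected group, while confident predictions are left unchanged. This is not Themis-ml's implementation; the function name, the threshold `theta`, and the sample probabilities are assumptions chosen for illustration.

```python
def reject_option_classify(prob_favorable, is_protected, theta=0.6):
    """Return 1 (favorable) or 0 (unfavorable).

    prob_favorable: a classifier's predicted probability of the favorable label.
    is_protected:   True if the individual belongs to the protected group.
    theta:          hypothetical confidence threshold, 0.5 < theta <= 1.
    """
    confidence = max(prob_favorable, 1 - prob_favorable)
    if confidence < theta:
        # Uncertain region: favor the protected group, disfavor the other.
        return 1 if is_protected else 0
    # Confident region: keep the classifier's own decision.
    return 1 if prob_favorable >= 0.5 else 0


# Hypothetical (probability, is_protected) pairs, not study data.
cases = [(0.55, True), (0.55, False), (0.45, True), (0.90, False), (0.10, True)]
decisions = [reject_option_classify(p, g) for p, g in cases]
print(decisions)
```

Because only uncertain cases are reassigned, a wide uncertain region can shift outcomes toward the protected group by more than the original gap.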

We compared the percentage of men and women classified as low-risk for a loan and examined utility by checking whether the AUC value remained constant. The findings indicate that men (the unprotected group) are 12% more likely to be labeled as low risk. Notably, the RPA and B models show no significant difference from each other in the distribution between gender groups. However, the ACF model reduces the disparity between gender groups, while the ROC model surprisingly favors women being labeled as low risk.
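The comparison described above can be sketched as two measurements: the gap in low-risk rates between the groups, and the AUC of the model's scores. The sketch below uses plain Python with made-up labels and scores; the helper names, the 0.5 cutoff, and the use of a simple rate difference as the fairness measure are assumptions, not details confirmed by the study.

```python
def low_risk_rate(predictions, groups, group):
    """Share of one group that is predicted low-risk (1)."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data, not from the German Credit dataset.
groups = ["male"] * 5 + ["female"] * 5
labels = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]          # 1 = actually low-risk
scores = [0.9, 0.8, 0.6, 0.7, 0.3, 0.7, 0.4, 0.55, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

gap = low_risk_rate(predictions, groups, "male") - low_risk_rate(predictions, groups, "female")
print(f"low-risk rate gap (male - female): {gap:.2f}")
print(f"AUC: {auc(labels, scores):.2f}")
```

Running the same two measurements for each model makes it possible to see whether a smaller gap comes at the cost of a lower AUC.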

All four training models maintain their AUC value, indicating that the fairness improvements do not compromise the algorithms' effectiveness. However, we found that merely removing identifiers associated with attributes like gender, race, or class does not suffice to improve the fairness of ranking results. The research emphasizes the need for real-life evaluations to achieve better representation for marginalized groups.

