```mermaid
flowchart LR
    A[Risk Scoring] --> B[Segmentation<br/>by risk and criticality]
    B --> C[Driver Analysis]
    C --> D[Action Design]
    D --> E[Outcome Tracking]
    E --> A
    style A fill:#E8F0FE,stroke:#1A73E8
    style D fill:#FEF7E0,stroke:#F9AB00
    style E fill:#E6F4EA,stroke:#137333
```
24 Selection Bias and Predicting Performance and Turnover
24.1 Why Bias and Prediction Belong Together
A selection model that predicts performance well but produces unfair outcomes will eventually be challenged; a model that produces fair outcomes but does not predict performance will not survive the next budget review.
The two questions in this chapter — selection bias and predicting performance and turnover — are usually treated as separate topics in HR-analytics curricula. They belong together because the same model is judged on both. The function that designs and operates a selection model is asked simultaneously whether the model produces fair outcomes across demographic and other protected groups, and whether it actually predicts the outcomes the firm cares about. The two answers cannot be sought sequentially; the model has to be evaluated on both in parallel, and the dashboard has to surface both.
The framing has become sharper as analytical models have moved from human-rated structured interviews to algorithm-driven scoring. As Robert E. Ployhart & Brian C. Holtz (2008) set out in their influential review of the diversity-validity dilemma, every selection method involves trade-offs between predictive validity and adverse impact, and the responsible function approaches those trade-offs deliberately rather than by accident. The newer machine-learning models do not abolish the dilemma; they often sharpen it, because their opacity makes adverse-impact patterns harder to detect and easier to dismiss as artefacts of the data. The discipline this chapter describes applies equally to a regression model, a structured-interview rubric, and a contemporary algorithmic scoring system.
The prediction side of the chapter rests on a long evidence base. As Peter W. Hom et al. (2017) documented in their century-spanning review of turnover research, the workforce-prediction questions a firm faces — who will perform well, who will leave, who will succeed in a different role — have accumulated more empirical attention than almost any other HR question, and the evidence supports specific methods rather than generalised intuition. The discipline is to treat prediction as an analytical problem with its own quality criteria, not as a magical capability the model brings.
The visualisation lens carries both halves. A bias chart is a subgroup comparison with the comparison group, sample size, and statistical significance rendered visibly. A prediction chart is a model-output visualisation with calibration, lift, and confidence rendered visibly. The page that surfaces both for the same model is the page that earns the function the right to use the model in production.
- Every selection model is evaluated on predictive validity and on adverse-impact metrics in parallel, and both are surfaced on the dashboard for every cycle.
- The dashboard renders the model’s calibration — predicted versus realised outcomes — alongside its discrimination — how well it separates higher and lower performers — so the audience reads prediction quality at a glance.
- Bias measurements use established statistical and legal definitions, with sample sizes and confidence intervals visible. A subgroup comparison without those companions is reporting, not evaluation.
24.2 Selection Bias: Concepts and Measurement
Selection bias has multiple working definitions, and a credible evaluation programme distinguishes them. Three definitions recur most often: adverse impact, predictive bias, and procedural fairness. Each has its own measurement, its own threshold, and its own remediation pattern.
| Definition | What it asks | Measurement | Typical threshold |
|---|---|---|---|
| Adverse impact | Are pass-through rates equal across protected groups | Selection-ratio comparison or four-fifths rule | Selection ratio below 0.80 raises a flag |
| Predictive bias | Does the model predict outcomes equally well across groups | Subgroup regression slopes and intercepts | Significant slope or intercept differences raise a flag |
| Procedural fairness | Do candidates perceive the process as fair | Candidate-experience surveys, due-process measures | Comparable scores across groups |
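The predictive-bias row is usually tested with moderated regression: regress the outcome on the predictor score, a group indicator, and their interaction, where a significant interaction flags slope bias and a significant group main effect flags intercept bias. A minimal statsmodels sketch, with every column name and value invented for the illustration:

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per hire: selection score, realised outcome, group label.
# All data here is illustrative.
df = pd.DataFrame({
    "score":   [55, 62, 70, 48, 66, 73, 58, 64, 51, 69, 60, 45],
    "outcome": [3.1, 3.4, 4.0, 2.8, 3.6, 4.2, 3.0, 3.5, 2.9, 3.9, 3.2, 2.6],
    "group":   ["A"] * 6 + ["B"] * 6,
})

# Moderated regression: outcome ~ score + group + score x group.
model = smf.ols("outcome ~ score * C(group)", data=df).fit()
print(model.summary())

# A significant score:group interaction signals slope bias (the predictor
# works differently across groups); a significant group main effect without
# an interaction signals intercept bias (systematic over/under-prediction).
```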
Most predictive selection methods produce some adverse impact across demographic groups, and reducing the impact often reduces the predictive validity. As Robert E. Ployhart & Brian C. Holtz (2008) set out, the responsible function does not pretend the trade-off does not exist. It surfaces the trade-off explicitly and chooses combinations that are defensible across both dimensions. The dashboard is where that choice is rendered. A combination that scores well on predictive validity but produces an adverse-impact ratio of 0.6 has to be redesigned; a combination that achieves perfect parity but predicts no better than chance has not been designed at all.
Several documented strategies reduce adverse impact while preserving predictive validity: using validated work-sample tests alongside or in place of stand-alone cognitive-ability tests, structuring interviews tightly, weighting components by their empirical incremental validity, and applying score banding within statistically equivalent ranges. Each strategy has boundary conditions and an evidence base. The dashboard names the strategies in use, the impact each has had on the adverse-impact ratio, and the impact each has had on predictive validity, so that the audience can see the trade-off being managed rather than asserted.
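Of these strategies, score banding is the easiest to make concrete. One widely used variant treats all scores within roughly two standard errors of the difference of the top score as statistically indistinguishable. A minimal sketch, with the score list and reliability figure assumed purely for the example:

```python
import numpy as np

scores = np.array([88.0, 86.0, 85.0, 83.0, 82.0, 79.0, 74.0, 70.0])  # illustrative
reliability = 0.85  # assumed for the example; use the test's published value

sem = scores.std(ddof=1) * np.sqrt(1 - reliability)  # standard error of measurement
sed = sem * np.sqrt(2)                               # standard error of the difference
band_floor = scores.max() - 1.96 * sed               # two-tailed 95% equivalence band

in_band = scores >= band_floor
print(f"band floor = {band_floor:.1f}; top band: {scores[in_band]}")
```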
24.3 Predicting Performance
Predicting performance is the headline use of selection models, and it has its own quality criteria. A model that ranks candidates correctly is a model with high discrimination. A model whose predicted scores match realised outcomes on average is a model with high calibration. The two are different, and a credible evaluation surfaces both.
| Criterion | What it asks | Visualisation |
|---|---|---|
| Discrimination | Does the model rank candidates correctly | ROC curve, lift chart, top-decile precision |
| Calibration | Do predicted scores match realised outcomes | Calibration plot, predicted-versus-realised chart |
| Stability | Does the model perform similarly across cycles | Longitudinal validation chart |
| Subgroup robustness | Does the model perform similarly across groups | Subgroup discrimination and calibration panels |
| Decision utility | Does the model improve hiring outcomes net of cost | Utility chart with realised gain |
The single most useful visual for a performance-prediction model is the calibration plot: predicted score on the x-axis, realised outcome on the y-axis, with the perfect-calibration line drawn as a reference. A model whose points cluster around the diagonal is a model that says what it means. A model whose points systematically deviate is a model whose predictions need to be re-scaled or whose scope needs to be restricted. The dashboard renders the calibration plot alongside discrimination metrics so the audience reads both at once.
24.4 Predicting Turnover
Turnover prediction is one of the most-requested HR-analytics deliverables and one of the most-misused. The challenge is not the model. It is the action that follows the model. A function that predicts attrition without a defensible plan for what to do with the prediction risks producing self-fulfilling forecasts, surveillance creep, and decisions that the workforce experiences as unfair.
| Stage | What it does | Visualisation |
|---|---|---|
| Risk scoring | Assigns each employee a probability of voluntary exit | Risk distribution chart with confidence band |
| Segmentation | Groups employees by risk and by criticality | Heat map of risk versus role criticality |
| Driver analysis | Identifies the factors that move the risk score | Driver chart with effect sizes |
| Action design | Specifies what action will be taken for each segment | Action-by-segment panel |
| Outcome tracking | Compares predicted exits with realised exits | Calibration panel and back-test |
The pipeline closes the loop when outcome tracking feeds the next cycle’s risk scoring. As Peter W. Hom et al. (2017) emphasise across their century-spanning review of turnover research, the most credible programmes are those that learn from each cycle’s calibration and update their scoring accordingly, rather than treating the model as fixed and the outcome as a verdict on the model.
The hardest part of turnover prediction is the action. A high-risk score for a high-criticality role implies one kind of intervention; a high-risk score for a low-criticality role implies a different one — sometimes none at all. The dashboard’s role is to make the action policy explicit and the action history auditable: what segment received what intervention, and what happened next. Without this discipline, turnover prediction becomes a list of names that the firm does not know what to do with.
24.5 Visualising Bias and Prediction Together
A credible selection-and-prediction dashboard surfaces bias and prediction on the same surface so that the function can defend its model on both dimensions in the same conversation. Five design choices, applied consistently, hold the two together.
| Choice | What it does on the page |
|---|---|
| Adverse-impact ratio panel | Subgroup pass-through ratios are surfaced with confidence intervals |
| Calibration plot | Predicted versus realised outcomes are shown for every model |
| Subgroup robustness panel | Discrimination and calibration are stratified by subgroup |
| Action-by-segment panel | The intervention policy is named for each risk-and-criticality segment |
| Cycle-over-cycle calibration | Predicted-versus-realised outcomes are tracked across cycles |
The bias-and-prediction dashboard is, in operation, the evidence file for the selection programme. When a regulator asks whether the firm’s selection or attrition-prediction model has been evaluated for fairness, the dashboard answers. When a hiring manager asks whether the model’s recommendations have produced better hires, the dashboard answers. When a board member asks whether the function is keeping up with the standards the firm has committed to, the dashboard answers. Build the dashboard for the audit and it serves the daily work; build it for the daily work alone and it will not survive the audit.
24.6 Hands-On Exercise: Adverse Impact and Turnover Prediction
Aim. Compute adverse-impact ratios for a workforce decision and build a turnover-prediction model with calibration and subgroup-robustness panels, surfacing both on a single Power BI page.
Scenario. You are running the bias-and-prediction analytics for an organisation that wants the same model judged on both fairness and predictive quality. The page has to defend the model in front of a regulator, a hiring manager, and a board member, each opening it for a different question.
Dataset. The IBM HR Analytics Employee Attrition dataset, available publicly on Kaggle at www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset. Use Attrition as the target, with Age, Department, JobLevel, JobSatisfaction, OverTime, MonthlyIncome, YearsAtCompany, and Gender as predictors and subgroup variables.
Deliverable. A Bias-and-Prediction.xlsx workbook with the adverse-impact and prediction calculations, plus a Bias-and-Prediction.pbix Power BI file with the dashboard described below.
24.6.1 Step 1 — Compute adverse-impact ratios
Treat Attrition = "Yes" as the negative outcome the model is being evaluated against. Compute the ratio of attrition rates across Gender.
```
Attrition Rate (Female) = COUNTIFS(HR[Gender], "Female", HR[Attrition], "Yes")
                          / COUNTIF(HR[Gender], "Female")
Attrition Rate (Male)   = COUNTIFS(HR[Gender], "Male", HR[Attrition], "Yes")
                          / COUNTIF(HR[Gender], "Male")
Adverse Impact Ratio    = MIN(Female_Rate, Male_Rate) / MAX(Female_Rate, Male_Rate)
```

Apply the four-fifths rule: a ratio below 0.80 raises a flag. Repeat the calculation for Department and JobLevel.
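Outside the workbook, the same ratio can be cross-checked in a few lines of Python, adding the two-proportion significance test the chapter asks to accompany every subgroup comparison. The sketch assumes the Kaggle CSV file name current at the time of writing:

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

hr = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")  # file name as on Kaggle

# Attrition rate per Gender and the four-fifths ratio.
rates = hr["Attrition"].eq("Yes").groupby(hr["Gender"]).mean()
air = rates.min() / rates.max()

# Two-proportion z-test for the rate difference.
counts = hr.groupby("Gender")["Attrition"].apply(lambda s: s.eq("Yes").sum())
nobs = hr["Gender"].value_counts().reindex(counts.index)
z, p = proportions_ztest(counts.values, nobs.values)

print(rates)
print(f"adverse-impact ratio = {air:.2f} (flag below 0.80), z = {z:.2f}, p = {p:.3f}")
```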
24.6.2 Step 2 — Build a logistic regression for attrition risk
Excel has no native logistic regression, so the workshop uses a substitute: fit the logistic log-likelihood directly with Solver, or, for a teaching-grade lab, fit a linear-probability model with LINEST or the Data Analysis ToolPak’s Regression function.
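For readers working outside the spreadsheet, a genuine logistic fit is only a few lines; the Python sketch below is a companion to the Excel substitute that follows, assumes the Kaggle file name, and omits Department for brevity (it would need one-hot encoding):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

hr = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")  # file name as on Kaggle

# Encode the binaries the exercise uses.
hr["AttritionBinary"] = hr["Attrition"].eq("Yes").astype(int)
hr["OverTimeBinary"] = hr["OverTime"].eq("Yes").astype(int)
predictors = ["Age", "JobLevel", "JobSatisfaction", "MonthlyIncome",
              "YearsAtCompany", "OverTimeBinary"]

# Standardise first so MonthlyIncome's scale does not dominate the fit.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(hr[predictors], hr["AttritionBinary"])
hr["PredictedRisk"] = model.predict_proba(hr[predictors])[:, 1]
print(hr[["AttritionBinary", "PredictedRisk"]].head())
```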
```
=LINEST(HR[AttritionBinary], HR[[Age]:[OverTimeBinary]], TRUE, TRUE)
```

The regression returns coefficients for each predictor. Note that LINEST lists the coefficients in reverse order of the predictor columns, with the intercept last, and must be entered as an array formula in versions of Excel without dynamic arrays. Compute the predicted attrition probability for each employee.
```
Predicted Risk = INTERCEPT + SUMPRODUCT(Coefficients, EmployeePredictors)
```

24.6.3 Step 3 — Build the calibration plot
Bin predicted risk into deciles. For each decile, compute the average predicted risk and the actual attrition rate. Plot predicted (x-axis) against realised (y-axis) and add the perfect-calibration diagonal as a reference.
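The same decile binning is compact in Python; this sketch continues from the logistic fit sketched in Step 2, so hr already carries PredictedRisk and AttritionBinary:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Decile-bin the predicted risk and compare mean prediction with realised rate.
hr["RiskDecile"] = pd.qcut(hr["PredictedRisk"], 10, labels=False, duplicates="drop")
calib = hr.groupby("RiskDecile").agg(
    predicted=("PredictedRisk", "mean"),
    realised=("AttritionBinary", "mean"),
)

ax = calib.plot.scatter(x="predicted", y="realised")
ax.plot([0, 1], [0, 1], linestyle="--")  # perfect-calibration diagonal
ax.set_xlabel("Mean predicted risk (decile)")
ax.set_ylabel("Realised attrition rate (decile)")
plt.show()
```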
24.6.4 Step 4 — Build the ROC curve
Sort employees by predicted risk descending. For each threshold, compute the true-positive rate and false-positive rate.
```
TPR (at threshold t) = COUNTIFS(Predicted, ">=" & t, Actual, "Yes") / COUNTIF(Actual, "Yes")
FPR (at threshold t) = COUNTIFS(Predicted, ">=" & t, Actual, "No") / COUNTIF(Actual, "No")
```

Render the ROC curve and compute the AUC with the trapezoidal rule.
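As a cross-check on the trapezoidal AUC, scikit-learn computes the curve and area directly (again continuing from the Step 2 sketch):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# ROC curve and AUC for the Step 2 predictions.
fpr, tpr, _ = roc_curve(hr["AttritionBinary"], hr["PredictedRisk"])
auc = roc_auc_score(hr["AttritionBinary"], hr["PredictedRisk"])

plt.plot(fpr, tpr, label=f"model (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False-positive rate")
plt.ylabel("True-positive rate")
plt.legend()
plt.show()
```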
24.6.5 Step 5 — Compute subgroup discrimination and calibration
Repeat Steps 3 and 4 separately for the Female and Male subgroups. Compute AUC for each subgroup. A material gap (more than 0.05 in AUC) is a fairness flag.
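In Python the subgroup comparison is one groupby (continuing from the Step 2 sketch); a gap above the 0.05 threshold raises the flag:

```python
from sklearn.metrics import roc_auc_score

# AUC per Gender subgroup, continuing from the Step 2 sketch.
sub_auc = hr.groupby("Gender").apply(
    lambda g: roc_auc_score(g["AttritionBinary"], g["PredictedRisk"])
)
gap = sub_auc.max() - sub_auc.min()
print(sub_auc)
print(f"AUC gap = {gap:.3f} ({'flag' if gap > 0.05 else 'within tolerance'})")
```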
24.6.6 Step 6 — Build the risk-and-criticality segmentation
Add a Criticality lookup based on JobLevel (treat levels 4 and 5 as critical). Create a heat map of risk versus criticality with the firm’s intervention policy named for each cell, as sketched after this list:
- High risk and high criticality: targeted retention plan, escalation.
- High risk and low criticality: light-touch action.
- Low risk and high criticality: succession monitoring.
- Low risk and low criticality: no action.
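In Python, the segmentation is two flags and a lookup. The sketch continues from Step 2; the 75th-percentile risk cut is an assumption for the illustration, not a recommended threshold:

```python
# Criticality from JobLevel (levels 4-5 critical, per the exercise) and a
# high-risk flag at an assumed 75th-percentile cut.
hr["Critical"] = hr["JobLevel"].ge(4)
hr["HighRisk"] = hr["PredictedRisk"].ge(hr["PredictedRisk"].quantile(0.75))

policy = {  # (high risk, critical) -> intervention named in the list above
    (True, True):   "targeted retention plan, escalation",
    (True, False):  "light-touch action",
    (False, True):  "succession monitoring",
    (False, False): "no action",
}
hr["Action"] = [policy[(r, c)]
                for r, c in zip(hr["HighRisk"].tolist(), hr["Critical"].tolist())]

print(hr.groupby(["HighRisk", "Critical"]).size())  # segment counts for the heat map
```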
24.6.7 Step 7 — Promote to Power BI and build the bias-and-prediction page
Load the data into Power BI. Build the adverse-impact panel with subgroup ratios, the calibration plot with the diagonal reference, the ROC curve with AUC labelled, the subgroup robustness panel, the risk-by-criticality heat map, and the action-by-segment policy table.
24.6.8 Step 8 — Add the cycle-over-cycle calibration placeholder
Reserve a panel that pre-populates the current cycle’s predicted attrition rate and the realised rate when the next cycle’s data arrives. Wire the page so the panel grows across future cycles into the longest-running visual on the dashboard.
24.6.9 Step 9 — Publish
Publish the report and tag it as the bias-and-prediction evidence file. Confirm that the page is opened in every selection-programme and retention-programme review.
This page extends the recruitment funnel of Chapter 22 and the validity dashboard of Chapter 23 into the joint bias-and-prediction view. The page also feeds the optimisation calculations of Chapter 28, where the predicted-risk distribution informs differential retention investment.
Bias-and-Prediction.xlsx, Bias-and-Prediction.pbix, and ch24-bias-and-prediction-walkthrough.mp4 will be attached at this point in the published edition. The screen recording walks through Steps 1 to 9 with the Excel adverse-impact and regression workbench and the Power BI bias-and-prediction page shown side by side.
Summary
| Concept | Description |
|---|---|
| Why Bias and Prediction Belong Together | |
| Bias and prediction belong together | The same selection model is judged on bias and on prediction; both belong on the dashboard |
| Trade-offs surfaced explicitly | Responsible function surfaces the predictive-validity-versus-adverse-impact trade-off |
| Algorithmic models do not abolish bias | Algorithmic models often sharpen the bias dilemma rather than abolish it |
| Action discipline as the hard part | The hardest part of turnover prediction is what the firm does with the prediction |
| Calibration alongside discrimination | A model with high discrimination but poor calibration is not yet usable |
| Selection Bias | |
| Adverse impact | Pass-through rates compared across protected groups, often with the four-fifths rule |
| Predictive bias | Whether the model predicts outcomes equally well across groups, by slope and intercept |
| Procedural fairness | Whether candidates perceive the process as fair across groups |
| Diversity-validity dilemma | Most predictive methods produce some adverse impact; reducing it can reduce validity |
| Strategies for reducing adverse impact | Work samples, structured interviews, weighting, and banding can reduce impact while preserving validity |
| Predicting Performance | |
| Discrimination criterion | Does the model rank candidates correctly, captured by ROC and lift |
| Calibration criterion | Do predicted scores match realised outcomes, captured by calibration plot |
| Stability criterion | Does the model perform similarly across cycles |
| Subgroup robustness criterion | Does the model perform similarly across protected groups |
| Decision-utility criterion | Does the model improve hiring outcomes net of cost |
| Calibration plot | Predicted score on x-axis, realised outcome on y-axis, perfect-calibration diagonal |
| Predicted-versus-realised chart | Single most useful visual for a performance-prediction model |
| Predicting Turnover | |
| Risk scoring | Each employee receives a probability of voluntary exit with confidence |
| Risk-and-criticality segmentation | Employees grouped by risk and by role criticality |
| Driver analysis | Identifies the factors that move the risk score with effect sizes |
| Action design by segment | Specifies what action will be taken for each risk-and-criticality segment |
| Outcome tracking | Compares predicted exits with realised exits cycle by cycle |
| Cycle-over-cycle calibration | Calibration tracked over cycles is the strongest credibility signal |
| Action discipline in turnover prediction | Action policy explicit and action history auditable for each segment |
| Visualising Both Together | |
| Adverse-impact ratio panel | Subgroup pass-through ratios surfaced with confidence intervals |
| Subgroup robustness panel | Discrimination and calibration stratified by subgroup on the page |
| Action-by-segment panel | Intervention policy named for each risk-and-criticality segment |
| Cycle-over-cycle calibration panel | Predicted-versus-realised outcomes tracked across cycles |
| Operational Disciplines | |
| Self-fulfilling forecast risk | A prediction without a defensible action plan can produce its own outcome |
| Dashboard as the evidence file | The dashboard is the evidence file for regulators, managers, and the board |