24  Selection Bias and Predicting Performance and Turnover

24.1 Why Bias and Prediction Belong Together

A selection model that predicts performance well but produces unfair outcomes will eventually be challenged; a model that produces fair outcomes but does not predict performance will not survive the next budget review.

The two questions in this chapter — selection bias and predicting performance and turnover — are usually treated as separate topics in HR-analytics curricula. They belong together because the same model is judged on both. The function that designs and operates a selection model is asked simultaneously whether the model produces fair outcomes across demographic and other protected groups, and whether it actually predicts the outcomes the firm cares about. The two answers cannot be sought sequentially; the model has to be evaluated on both in parallel, and the dashboard has to surface both.

The framing has become sharper as analytical models have moved from human-rated structured interviews to algorithm-driven scoring. As Robert E. Ployhart & Brian C. Holtz (2008) set out in their influential review of the diversity-validity dilemma, every selection method involves trade-offs between predictive validity and adverse impact, and the responsible function approaches those trade-offs deliberately rather than by accident. The newer machine-learning models do not abolish the dilemma; they often sharpen it, because their opacity makes adverse-impact patterns harder to detect and easier to dismiss as artefacts of the data. The discipline this chapter describes applies equally to a regression model, a structured-interview rubric, and a contemporary algorithmic scoring system.

The prediction side of the chapter rests on a long evidence base. As Peter W. Hom et al. (2017) documented in their century-spanning review of turnover research, the workforce-prediction questions a firm faces — who will perform well, who will leave, who will succeed in a different role — have accumulated more empirical attention than almost any other HR question, and the evidence supports specific methods rather than generalised intuition. The discipline is to treat prediction as an analytical problem with its own quality criteria, not as a magical capability the model brings.

The visualisation lens carries both halves. A bias chart is a subgroup comparison with the comparison group, sample size, and statistical significance rendered visibly. A prediction chart is a model-output visualisation with calibration, lift, and confidence rendered visibly. The page that surfaces both for the same model is the page that earns the function the right to use the model in production.

Tip: The bias-and-prediction contract
  1. Every selection model is evaluated on predictive validity and on adverse-impact metrics in parallel, and both are surfaced on the dashboard for every cycle.
  2. The dashboard renders the model’s calibration — predicted versus realised outcomes — alongside its discrimination — how well it separates higher and lower performers — so the audience reads prediction quality at a glance.
  3. Bias measurements use established statistical and legal definitions, with sample sizes and confidence intervals visible. A subgroup comparison without those companions is reporting, not evaluation.

24.2 Selection Bias: Concepts and Measurement

Selection bias has multiple working definitions, and a credible evaluation programme distinguishes them. Three definitions recur most often: adverse impact, predictive bias, and procedural fairness. Each has its own measurement, its own threshold, and its own remediation pattern.

Tip: Three Working Definitions of Selection Bias

| Definition | What it asks | Measurement | Typical threshold |
| --- | --- | --- | --- |
| Adverse impact | Are pass-through rates equal across protected groups? | Selection-ratio comparison or four-fifths rule | Selection ratio below 0.80 raises a flag |
| Predictive bias | Does the model predict outcomes equally well across groups? | Subgroup regression slopes and intercepts | Significant slope or intercept differences raise a flag |
| Procedural fairness | Do candidates perceive the process as fair? | Candidate-experience surveys, due-process measures | Comparable scores across groups |
Tip: The diversity-validity dilemma

Most predictive selection methods produce some adverse impact across demographic groups, and reducing the impact often reduces the predictive validity. As Robert E. Ployhart & Brian C. Holtz (2008) set out, the responsible function does not pretend the trade-off does not exist. It surfaces the trade-off explicitly and chooses combinations that are defensible across both dimensions. The dashboard is where that choice is rendered. A combination that scores well on predictive validity but produces an adverse-impact ratio of 0.6 has to be redesigned; a combination that achieves perfect parity but predicts no better than chance has not been designed at all.

Tip: Strategies for reducing adverse impact without losing validity

Several documented strategies reduce adverse impact while preserving predictive validity: using validated work-sample tests in place of untimed cognitive tests, structuring interviews tightly, weighting components based on empirical incremental validity, and applying score banding within statistically equivalent ranges. Each strategy has boundary conditions and is supported by evidence. The dashboard names the strategies in use, the impact each has had on the adverse-impact ratio, and the impact each has had on predictive validity, so that the audience can see the trade-off being managed rather than asserted.

24.3 Predicting Performance

Predicting performance is the headline use of selection models, and it has its own quality criteria. A model that ranks candidates correctly is a model with high discrimination. A model whose predicted scores match realised outcomes on average is a model with high calibration. The two are different, and a credible evaluation surfaces both.

Tip: Quality Criteria for a Performance-Prediction Model

| Criterion | What it asks | Visualisation |
| --- | --- | --- |
| Discrimination | Does the model rank candidates correctly? | ROC curve, lift chart, top-decile precision |
| Calibration | Do predicted scores match realised outcomes? | Calibration plot, predicted-versus-realised chart |
| Stability | Does the model perform similarly across cycles? | Longitudinal validation chart |
| Subgroup robustness | Does the model perform similarly across groups? | Subgroup discrimination and calibration panels |
| Decision utility | Does the model improve hiring outcomes net of cost? | Utility chart with realised gain |
Tip: Calibration and the prediction-realisation chart

The single most useful visual for a performance-prediction model is the calibration plot: predicted score on the x-axis, realised outcome on the y-axis, with the perfect-calibration line drawn as a reference. A model whose points cluster around the diagonal is a model that says what it means. A model whose points systematically deviate is a model whose predictions need to be re-scaled or whose scope needs to be restricted. The dashboard renders the calibration plot alongside discrimination metrics so the audience reads both at once.

24.4 Predicting Turnover

Turnover prediction is one of the most-requested HR-analytics deliverables and one of the most-misused. The challenge is not the model. It is the action that follows the model. A function that predicts attrition without a defensible plan for what to do with the prediction risks producing self-fulfilling forecasts, surveillance creep, and decisions that the workforce experiences as unfair.

Tip: The Turnover-Prediction Pipeline

| Stage | What it does | Visualisation |
| --- | --- | --- |
| Risk scoring | Assigns each employee a probability of voluntary exit | Risk distribution chart with confidence band |
| Segmentation | Groups employees by risk and by criticality | Heat map of risk versus role criticality |
| Driver analysis | Identifies the factors that move the risk score | Driver chart with effect sizes |
| Action design | Specifies what action will be taken for each segment | Action-by-segment panel |
| Outcome tracking | Compares predicted exits with realised exits | Calibration panel and back-test |
Tip: The arc of a turnover-prediction programme

```mermaid
flowchart LR
  A[Risk Scoring] --> B[Segmentation<br/>by risk and criticality]
  B --> C[Driver Analysis]
  C --> D[Action Design]
  D --> E[Outcome Tracking]
  E --> A
  style A fill:#E8F0FE,stroke:#1A73E8
  style D fill:#FEF7E0,stroke:#F9AB00
  style E fill:#E6F4EA,stroke:#137333
```

The pipeline closes the loop when outcome tracking feeds the next cycle’s risk scoring. As Peter W. Hom et al. (2017) emphasise across their century-spanning review of turnover research, the most credible programmes are those that learn from each cycle’s calibration and update their scoring accordingly, rather than treating the model as fixed and the outcome as a verdict on the model.

Tip: Action discipline in turnover prediction

The hardest part of turnover prediction is the action. A high-risk score for a high-criticality role implies one kind of intervention; a high-risk score for a low-criticality role implies a different one — sometimes none at all. The dashboard’s role is to make the action policy explicit and the action history auditable: what segment received what intervention, and what happened next. Without this discipline, turnover prediction becomes a list of names that the firm does not know what to do with.

24.5 Visualising Bias and Prediction Together

A credible selection-and-prediction dashboard surfaces bias and prediction on the same surface so that the function can defend its model on both dimensions in the same conversation. Five design choices, applied consistently, hold the two together.

Tip: Five Design Choices for the Bias-and-Prediction Dashboard

| Choice | What it does on the page |
| --- | --- |
| Adverse-impact ratio panel | Subgroup pass-through ratios are surfaced with confidence intervals |
| Calibration plot | Predicted versus realised outcomes are shown for every model |
| Subgroup robustness panel | Discrimination and calibration are stratified by subgroup |
| Action-by-segment panel | The intervention policy is named for each risk-and-criticality segment |
| Cycle-over-cycle calibration | Predicted-versus-realised outcomes are tracked across cycles |
Tip: The dashboard as the evidence file

The bias-and-prediction dashboard is, in operation, the evidence file for the selection programme. When a regulator asks whether the firm’s selection or attrition-prediction model has been evaluated for fairness, the dashboard answers. When a hiring manager asks whether the model’s recommendations have produced better hires, the dashboard answers. When a board member asks whether the function is keeping up with the standards the firm has committed to, the dashboard answers. Build the dashboard for the audit and it serves the daily work; build it for the daily work alone and it will not survive the audit.

24.6 Hands-On Exercise: Adverse Impact and Turnover Prediction

Note: Aim, Scenario, Dataset, Deliverable

Aim. Compute adverse-impact ratios for a workforce decision and build a turnover-prediction model with calibration and subgroup-robustness panels, surfacing both on a single Power BI page.

Scenario. You are running the bias-and-prediction analytics for an organisation that wants the same model judged on both fairness and predictive quality. The page has to defend the model in front of a regulator, a hiring manager, and a board member, each opening it for a different question.

Dataset. The IBM HR Analytics Employee Attrition dataset, available publicly on Kaggle at www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset. Use Attrition as the target, with Age, Department, JobLevel, JobSatisfaction, OverTime, MonthlyIncome, YearsAtCompany, and Gender as predictors and subgroup variables.

Deliverable. A Bias-and-Prediction.xlsx workbook with the adverse-impact and prediction calculations, plus a Bias-and-Prediction.pbix Power BI file with the dashboard described below.

24.6.1 Step 1 — Compute adverse-impact ratios

Treat Attrition = "Yes" as the negative outcome the model is being evaluated against. Compute the ratio of attrition rates across Gender.

Excel formulas:

```
Attrition Rate (Female) = COUNTIFS(HR[Gender], "Female", HR[Attrition], "Yes")
                        / COUNTIF(HR[Gender], "Female")
Attrition Rate (Male)   = COUNTIFS(HR[Gender], "Male", HR[Attrition], "Yes")
                        / COUNTIF(HR[Gender], "Male")
Adverse Impact Ratio    = MIN(Female_Rate, Male_Rate) / MAX(Female_Rate, Male_Rate)
```

Apply the four-fifths rule: a ratio below 0.80 raises a flag. Repeat the calculation for Department and JobLevel.
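For readers who prefer to check the arithmetic outside Excel, the same calculation can be sketched in Python. This is a hypothetical workbench script, not part of the deliverable; the toy records stand in for rows of the Kaggle dataset.

```python
from collections import defaultdict

def adverse_impact_ratio(records, group_key, outcome_key, positive="Yes"):
    """Compute per-group outcome rates and the min/max adverse-impact ratio."""
    totals, hits = defaultdict(int), defaultdict(int)
    for row in records:
        g = row[group_key]
        totals[g] += 1
        if row[outcome_key] == positive:
            hits[g] += 1
    rates = {g: hits[g] / totals[g] for g in totals}
    return rates, min(rates.values()) / max(rates.values())

# Toy records standing in for the IBM attrition rows
staff = [
    {"Gender": "Female", "Attrition": "Yes"},
    {"Gender": "Female", "Attrition": "No"},
    {"Gender": "Female", "Attrition": "No"},
    {"Gender": "Female", "Attrition": "No"},
    {"Gender": "Male", "Attrition": "Yes"},
    {"Gender": "Male", "Attrition": "Yes"},
    {"Gender": "Male", "Attrition": "No"},
    {"Gender": "Male", "Attrition": "No"},
]
rates, ratio = adverse_impact_ratio(staff, "Gender", "Attrition")
print(rates)          # {'Female': 0.25, 'Male': 0.5}
print(ratio < 0.80)   # four-fifths flag: True
```

Swapping `group_key` to Department or JobLevel repeats the Step 1 calculation for the other subgroup variables.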

24.6.2 Step 2 — Build a logistic regression for attrition risk

Use Excel’s Solver, maximising the logistic log-likelihood, as a workshop substitute for a statistical package. (LOGEST is not a substitute here: it fits an exponential curve, not a logistic model.) For a teaching-grade lab, fit a linear-probability model with LINEST or the Data Analysis ToolPak’s Regression tool.

Excel formula:

```
=LINEST(HR[AttritionBinary], HR[[Age]:[OverTimeBinary]], TRUE, TRUE)
```

The regression returns coefficients for each predictor; note that LINEST lists them in right-to-left column order, so map each coefficient back to its predictor before scoring. Compute the predicted attrition probability for each employee.

Excel formula:

```
Predicted Risk = Intercept + SUMPRODUCT(Coefficients, EmployeePredictors)
```

Here Intercept and Coefficients are the values returned by LINEST. A linear-probability model can produce values outside the [0, 1] range; clamp predicted risk to that range before binning.
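Outside Excel, the same model can be fitted as a genuine logistic regression. A minimal pure-Python sketch using batch gradient descent follows; variable names and the toy data are illustrative, not from the workbook.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        gb, gw = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            pred = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            err = pred - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return b, w

def predict_risk(b, w, xi):
    """Predicted attrition probability for one employee's predictor vector."""
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: one binary predictor (think OverTimeBinary); overtime raises risk
X = [[0], [0], [0], [1], [1], [1]]
y = [0, 0, 1, 0, 1, 1]
b, w = fit_logistic(X, y)
print(predict_risk(b, w, [1]) > predict_risk(b, w, [0]))  # True
```

Unlike the linear-probability shortcut, the logistic fit keeps every predicted risk inside (0, 1) by construction.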

24.6.3 Step 3 — Build the calibration plot

Bin predicted risk into deciles. For each decile, compute the average predicted risk and the actual attrition rate. Plot predicted (x-axis) against realised (y-axis) and add the perfect-calibration diagonal as a reference.
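The decile-binning step can be sketched as follows; a hypothetical helper, assuming predicted risks and binary attrition outcomes are already in parallel lists.

```python
def calibration_bins(pred, actual, n_bins=10):
    """Sort by predicted risk, cut into n_bins equal-count bins,
    and return (mean predicted risk, realised rate) per bin."""
    pairs = sorted(zip(pred, actual))
    size = max(1, len(pairs) // n_bins)
    bins = []
    for i in range(0, len(pairs), size):
        chunk = pairs[i:i + size]
        bins.append((
            sum(p for p, _ in chunk) / len(chunk),   # mean predicted risk
            sum(a for _, a in chunk) / len(chunk),   # realised attrition rate
        ))
    return bins

# Toy scores: a well-calibrated model puts its points on the diagonal
pred   = [0.1, 0.1, 0.9, 0.9]
actual = [0, 0, 1, 1]
bins = calibration_bins(pred, actual, n_bins=2)
print(bins)  # [(0.1, 0.0), (0.9, 1.0)]
```

Plotting the first element of each pair against the second, with the y = x diagonal as reference, gives the calibration plot described above.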

24.6.4 Step 4 — Build the ROC curve

Sort employees by predicted risk descending. For each threshold, compute the true-positive rate and false-positive rate.

Excel formulas (the threshold must be concatenated into the criterion string; a literal ">=t" would match nothing):

```
TPR (at threshold t) = COUNTIFS(Predicted, ">=" & t, Actual, "Yes") / COUNTIF(Actual, "Yes")
FPR (at threshold t) = COUNTIFS(Predicted, ">=" & t, Actual, "No")  / COUNTIF(Actual, "No")
```

Render the ROC curve and compute AUC using the trapezoidal rule.
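The threshold sweep and the trapezoidal AUC can be sketched in a few lines of Python; a teaching-grade sketch that assumes both classes are present and handles tied scores point by point rather than jointly.

```python
def roc_points(scores, labels):
    """Sweep thresholds from high to low; return (FPR, TPR) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(zip(scores, labels), reverse=True)
    pts, tp, fp = [(0.0, 0.0)], 0, 0
    for _, lab in order:          # each score acts as a threshold in turn
        if lab:
            tp += 1
        else:
            fp += 1
        pts.append((fp / neg, tp / pos))
    return pts

def auc_trapezoid(pts):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 1, 0, 0]             # a perfect ranking
auc = auc_trapezoid(roc_points(scores, labels))
print(auc)  # 1.0
```

A perfect ranking yields AUC 1.0; a model no better than chance hovers around 0.5.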

24.6.5 Step 5 — Compute subgroup discrimination and calibration

Repeat Steps 3 and 4 separately for the Female and Male subgroups. Compute AUC for each subgroup. A material gap (more than 0.05 in AUC) is a fairness flag.
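The subgroup comparison can be sketched with the rank-based form of AUC (the Mann-Whitney statistic), which avoids rebuilding the full ROC curve per group. Field names, the grouping logic, and the toy rows are illustrative assumptions.

```python
def auc_rank(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subgroup_auc_gap(rows, score_key, label_key, group_key, flag_at=0.05):
    """Per-subgroup AUC, the max-min gap, and the chapter's 0.05 fairness flag."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_key], []).append(r)
    aucs = {g: auc_rank([r[score_key] for r in rs],
                        [r[label_key] for r in rs])
            for g, rs in groups.items()}
    gap = max(aucs.values()) - min(aucs.values())
    return aucs, gap, gap > flag_at

# Toy rows: the model ranks perfectly for one group, poorly for the other
rows = [
    {"g": "Female", "s": 0.9, "y": 1}, {"g": "Female", "s": 0.2, "y": 0},
    {"g": "Male",   "s": 0.6, "y": 1}, {"g": "Male",   "s": 0.7, "y": 0},
]
aucs, gap, flagged = subgroup_auc_gap(rows, "s", "y", "g")
print(flagged)  # True — the AUC gap exceeds 0.05
```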

24.6.6 Step 6 — Build the risk-and-criticality segmentation

Add a Criticality lookup based on JobLevel (treat levels 4 and 5 as critical). Create a heat map of risk versus criticality with the firm’s intervention policy named for each cell:

  • High risk and high criticality: targeted retention plan, escalation.
  • High risk and low criticality: light-touch action.
  • Low risk and high criticality: succession monitoring.
  • Low risk and low criticality: no action.
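The policy table above can be encoded as a plain lookup so the workbook and the dashboard name the same intervention for each cell. The 0.5 risk cut-off and the level-4 criticality threshold are the exercise's assumptions, not fixed rules.

```python
def criticality(job_level):
    """Exercise rule: JobLevel 4 and 5 are treated as critical."""
    return "high" if job_level >= 4 else "low"

def risk_band(predicted_risk, cut=0.5):
    """Illustrative cut-off; tune per the firm's risk distribution."""
    return "high" if predicted_risk >= cut else "low"

# (risk band, criticality) -> intervention, mirroring the bulleted policy
POLICY = {
    ("high", "high"): "targeted retention plan, escalation",
    ("high", "low"):  "light-touch action",
    ("low", "high"):  "succession monitoring",
    ("low", "low"):   "no action",
}

def action_for(predicted_risk, job_level):
    return POLICY[(risk_band(predicted_risk), criticality(job_level))]

print(action_for(0.72, 5))  # targeted retention plan, escalation
print(action_for(0.10, 2))  # no action
```

Keeping the policy in one structure makes the action history auditable: every intervention traces back to a named cell.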

24.6.7 Step 7 — Promote to Power BI and build the bias-and-prediction page

Load the data into Power BI. Build the adverse-impact panel with subgroup ratios, the calibration plot with the diagonal reference, the ROC curve with AUC labelled, the subgroup robustness panel, the risk-by-criticality heat map, and the action-by-segment policy table.

24.6.8 Step 8 — Add the cycle-over-cycle calibration placeholder

Reserve a panel that pre-populates the current cycle’s predicted attrition rate and the realised rate when the next cycle’s data arrives. Wire the page so the panel grows across future cycles into the longest-running visual on the dashboard.

24.6.9 Step 9 — Publish

Publish the report and tag it as the bias-and-prediction evidence file. Confirm that the page is opened in every selection-programme and retention-programme review.

Tip: Connect to the Visualisation Layer

This page extends the recruitment funnel of Chapter 22 and the validity dashboard of Chapter 23 into the joint bias-and-prediction view. The page also feeds the optimisation calculations of Chapter 28, where the predicted-risk distribution informs differential retention investment.

Tip: Files and Screen Recordings

Bias-and-Prediction.xlsx, Bias-and-Prediction.pbix, and ch24-bias-and-prediction-walkthrough.mp4 will be attached at this point in the published edition. The screen recording walks through Steps 1 to 9 with the Excel adverse-impact and regression workbench and the Power BI bias-and-prediction page shown side by side.

Summary

| Concept | Description |
| --- | --- |
| **Why Bias and Prediction Belong Together** | |
| Bias and prediction belong together | The same selection model is judged on bias and on prediction; both belong on the dashboard |
| Trade-offs surfaced explicitly | The responsible function surfaces the predictive-validity-versus-adverse-impact trade-off |
| Algorithmic models do not abolish bias | Algorithmic models often sharpen the bias dilemma rather than abolish it |
| Action discipline as the hard part | The hardest part of turnover prediction is what the firm does with the prediction |
| Calibration alongside discrimination | A model with high discrimination but poor calibration is not yet usable |
| **Selection Bias** | |
| Adverse impact | Pass-through rates compared across protected groups, often with the four-fifths rule |
| Predictive bias | Whether the model predicts outcomes equally well across groups, by slope and intercept |
| Procedural fairness | Whether candidates perceive the process as fair across groups |
| Diversity-validity dilemma | Most predictive methods produce some adverse impact; reducing it can reduce validity |
| Strategies for reducing adverse impact | Work samples, structured interviews, weighting, and banding can reduce impact while preserving validity |
| **Predicting Performance** | |
| Discrimination criterion | Does the model rank candidates correctly, captured by ROC and lift |
| Calibration criterion | Do predicted scores match realised outcomes, captured by the calibration plot |
| Stability criterion | Does the model perform similarly across cycles |
| Subgroup robustness criterion | Does the model perform similarly across protected groups |
| Decision-utility criterion | Does the model improve hiring outcomes net of cost |
| Calibration plot | Predicted score on the x-axis, realised outcome on the y-axis, perfect-calibration diagonal |
| Predicted-versus-realised chart | The single most useful visual for a performance-prediction model |
| **Predicting Turnover** | |
| Risk scoring | Each employee receives a probability of voluntary exit with confidence |
| Risk-and-criticality segmentation | Employees grouped by risk and by role criticality |
| Driver analysis | Identifies the factors that move the risk score, with effect sizes |
| Action design by segment | Specifies what action will be taken for each risk-and-criticality segment |
| Outcome tracking | Compares predicted exits with realised exits cycle by cycle |
| Cycle-over-cycle calibration | Calibration tracked over cycles is the strongest credibility signal |
| Action discipline in turnover prediction | Action policy explicit and action history auditable for each segment |
| **Visualising Both Together** | |
| Adverse-impact ratio panel | Subgroup pass-through ratios surfaced with confidence intervals |
| Subgroup robustness panel | Discrimination and calibration stratified by subgroup on the page |
| Action-by-segment panel | Intervention policy named for each risk-and-criticality segment |
| Cycle-over-cycle calibration panel | Predicted-versus-realised outcomes tracked across cycles |
| **Operational Disciplines** | |
| Self-fulfilling forecast risk | A prediction without a defensible action plan can produce its own outcome |
| Dashboard as the evidence file | The dashboard is the evidence file for regulators, managers, and the board |