```mermaid
flowchart LR
    A[Risk Scoring] --> B[Segmentation<br/>by risk and criticality]
    B --> C[Driver Analysis]
    C --> D[Action Design]
    D --> E[Outcome Tracking]
    E --> A
    style A fill:#E8F0FE,stroke:#1A73E8
    style D fill:#FEF7E0,stroke:#F9AB00
    style E fill:#E6F4EA,stroke:#137333
```
24 Selection Bias and Predicting Performance and Turnover
24.1 Why Bias and Prediction Belong Together
A selection model that predicts performance well but produces unfair outcomes will eventually be challenged; a model that produces fair outcomes but does not predict performance will not survive the next budget review.
The two questions in this chapter — selection bias and predicting performance and turnover — are usually treated as separate topics in HR-analytics curricula. They belong together because the same model is judged on both. The function that designs and operates a selection model is asked simultaneously whether the model produces fair outcomes across demographic and other protected groups, and whether it actually predicts the outcomes the firm cares about. The two answers cannot be sought sequentially; the model has to be evaluated on both in parallel, and the dashboard has to surface both.
The framing has become sharper as analytical models have moved from human-rated structured interviews to algorithm-driven scoring. As Robert E. Ployhart & Brian C. Holtz (2008) set out in their influential review of the diversity-validity dilemma, every selection method involves trade-offs between predictive validity and adverse impact, and the responsible function approaches those trade-offs deliberately rather than by accident. The newer machine-learning models do not abolish the dilemma; they often sharpen it, because their opacity makes adverse-impact patterns harder to detect and easier to dismiss as artefacts of the data. The discipline this chapter describes applies equally to a regression model, a structured-interview rubric, and a contemporary algorithmic scoring system.
The prediction side of the chapter rests on a long evidence base. As Peter W. Hom et al. (2017) documented in their century-spanning review of turnover research, the workforce-prediction questions a firm faces — who will perform well, who will leave, who will succeed in a different role — have accumulated more empirical attention than almost any other HR question, and the evidence supports specific methods rather than generalised intuition. The discipline is to treat prediction as an analytical problem with its own quality criteria, not as a magical capability the model brings.
The visualisation lens carries both halves. A bias chart is a subgroup comparison with the comparison group, sample size, and statistical significance rendered visibly. A prediction chart is a model-output visualisation with calibration, lift, and confidence rendered visibly. The page that surfaces both for the same model is the page that earns the function the right to use the model in production.
- Every selection model is evaluated on predictive validity and on adverse-impact metrics in parallel, and both are surfaced on the dashboard for every cycle.
- The dashboard renders the model’s calibration — predicted versus realised outcomes — alongside its discrimination — how well it separates higher and lower performers — so the audience reads prediction quality at a glance.
- Bias measurements use established statistical and legal definitions, with sample sizes and confidence intervals visible. A subgroup comparison without those companions is reporting, not evaluation.
24.2 Selection Bias: Concepts and Measurement
Selection bias has multiple working definitions, and a credible evaluation programme distinguishes them. Three definitions recur most often: adverse impact, predictive bias, and procedural fairness. Each has its own measurement, its own threshold, and its own remediation pattern.
| Definition | What it asks | Measurement | Typical threshold |
|---|---|---|---|
| Adverse impact | Are pass-through rates equal across protected groups | Selection-ratio comparison or four-fifths rule | Selection ratio below 0.80 raises a flag |
| Predictive bias | Does the model predict outcomes equally well across groups | Subgroup regression slopes and intercepts | Significant slope or intercept differences raise a flag |
| Procedural fairness | Do candidates perceive the process as fair | Candidate-experience surveys, due-process measures | Comparable scores across groups |
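The predictive-bias row is usually tested with moderated regression: regress the outcome on the predictor score, a group indicator, and their interaction, where a significant interaction flags slope bias and a significant group main effect flags intercept bias. A minimal statsmodels sketch, with every column name and value invented for the illustration:

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per hire: selection score, realised outcome, group label.
# All data here is illustrative.
df = pd.DataFrame({
    "score":   [55, 62, 70, 48, 66, 73, 58, 64, 51, 69, 60, 45],
    "outcome": [3.1, 3.4, 4.0, 2.8, 3.6, 4.2, 3.0, 3.5, 2.9, 3.9, 3.2, 2.6],
    "group":   ["A"] * 6 + ["B"] * 6,
})

# Moderated regression: outcome ~ score + group + score x group.
model = smf.ols("outcome ~ score * C(group)", data=df).fit()
print(model.summary())

# A significant score:group interaction signals slope bias (the predictor
# works differently across groups); a significant group main effect without
# an interaction signals intercept bias (systematic over/under-prediction).
```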
Most predictive selection methods produce some adverse impact across demographic groups, and reducing the impact often reduces the predictive validity. As Robert E. Ployhart & Brian C. Holtz (2008) set out, the responsible function does not pretend the trade-off does not exist. It surfaces the trade-off explicitly and chooses combinations that are defensible across both dimensions. The dashboard is where that choice is rendered. A combination that scores well on predictive validity but produces an adverse-impact ratio of 0.6 has to be redesigned; a combination that achieves perfect parity but predicts no better than chance has not been designed at all.
Several documented strategies reduce adverse impact while preserving predictive validity: using validated work-sample tests alongside or in place of stand-alone cognitive-ability tests, structuring interviews tightly, weighting components by their empirical incremental validity, and applying score banding within statistically equivalent ranges. Each strategy has boundary conditions and an evidence base. The dashboard names the strategies in use, the impact each has had on the adverse-impact ratio, and the impact each has had on predictive validity, so that the audience can see the trade-off being managed rather than asserted.
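Of these strategies, score banding is the easiest to make concrete. One widely used variant treats all scores within roughly two standard errors of the difference of the top score as statistically indistinguishable. A minimal sketch, with the score list and reliability figure assumed purely for the example:

```python
import numpy as np

scores = np.array([88.0, 86.0, 85.0, 83.0, 82.0, 79.0, 74.0, 70.0])  # illustrative
reliability = 0.85  # assumed for the example; use the test's published value

sem = scores.std(ddof=1) * np.sqrt(1 - reliability)  # standard error of measurement
sed = sem * np.sqrt(2)                               # standard error of the difference
band_floor = scores.max() - 1.96 * sed               # two-tailed 95% equivalence band

in_band = scores >= band_floor
print(f"band floor = {band_floor:.1f}; top band: {scores[in_band]}")
```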
24.3 Predicting Performance
Predicting performance is the headline use of selection models, and it has its own quality criteria. A model that ranks candidates correctly is a model with high discrimination. A model whose predicted scores match realised outcomes on average is a model with high calibration. The two are different, and a credible evaluation surfaces both.
| Criterion | What it asks | Visualisation |
|---|---|---|
| Discrimination | Does the model rank candidates correctly | ROC curve, lift chart, top-decile precision |
| Calibration | Do predicted scores match realised outcomes | Calibration plot, predicted-versus-realised chart |
| Stability | Does the model perform similarly across cycles | Longitudinal validation chart |
| Subgroup robustness | Does the model perform similarly across groups | Subgroup discrimination and calibration panels |
| Decision utility | Does the model improve hiring outcomes net of cost | Utility chart with realised gain |
The single most useful visual for a performance-prediction model is the calibration plot: predicted score on the x-axis, realised outcome on the y-axis, with the perfect-calibration line drawn as a reference. A model whose points cluster around the diagonal is a model that says what it means. A model whose points systematically deviate is a model whose predictions need to be re-scaled or whose scope needs to be restricted. The dashboard renders the calibration plot alongside discrimination metrics so the audience reads both at once.
24.4 Predicting Turnover
Turnover prediction is one of the most-requested HR-analytics deliverables and one of the most-misused. The challenge is not the model. It is the action that follows the model. A function that predicts attrition without a defensible plan for what to do with the prediction risks producing self-fulfilling forecasts, surveillance creep, and decisions that the workforce experiences as unfair.
| Stage | What it does | Visualisation |
|---|---|---|
| Risk scoring | Assigns each employee a probability of voluntary exit | Risk distribution chart with confidence band |
| Segmentation | Groups employees by risk and by criticality | Heat map of risk versus role criticality |
| Driver analysis | Identifies the factors that move the risk score | Driver chart with effect sizes |
| Action design | Specifies what action will be taken for each segment | Action-by-segment panel |
| Outcome tracking | Compares predicted exits with realised exits | Calibration panel and back-test |
The pipeline closes the loop when outcome tracking feeds the next cycle’s risk scoring. As Peter W. Hom et al. (2017) emphasise across their century-spanning review of turnover research, the most credible programmes are those that learn from each cycle’s calibration and update their scoring accordingly, rather than treating the model as fixed and the outcome as a verdict on the model.
The hardest part of turnover prediction is the action. A high-risk score for a high-criticality role implies one kind of intervention; a high-risk score for a low-criticality role implies a different one — sometimes none at all. The dashboard’s role is to make the action policy explicit and the action history auditable: what segment received what intervention, and what happened next. Without this discipline, turnover prediction becomes a list of names that the firm does not know what to do with.
24.5 Visualising Bias and Prediction Together
A credible selection-and-prediction dashboard surfaces bias and prediction on the same surface so that the function can defend its model on both dimensions in the same conversation. Five design choices, applied consistently, hold the two together.
| Choice | What it does on the page |
|---|---|
| Adverse-impact ratio panel | Subgroup pass-through ratios are surfaced with confidence intervals |
| Calibration plot | Predicted versus realised outcomes are shown for every model |
| Subgroup robustness panel | Discrimination and calibration are stratified by subgroup |
| Action-by-segment panel | The intervention policy is named for each risk-and-criticality segment |
| Cycle-over-cycle calibration | Predicted-versus-realised outcomes are tracked across cycles |
The bias-and-prediction dashboard is, in operation, the evidence file for the selection programme. When a regulator asks whether the firm’s selection or attrition-prediction model has been evaluated for fairness, the dashboard answers. When a hiring manager asks whether the model’s recommendations have produced better hires, the dashboard answers. When a board member asks whether the function is keeping up with the standards the firm has committed to, the dashboard answers. Build the dashboard for the audit and it serves the daily work; build it for the daily work alone and it will not survive the audit.
24.6 Hands-On Exercise: Adverse Impact and Turnover Prediction
Aim. Compute adverse-impact ratios for a workforce decision and build a turnover-prediction model with calibration and subgroup-robustness panels, surfacing both on a single Power BI page.
Scenario. You are running the bias-and-prediction analytics for an organisation that wants the same model judged on both fairness and predictive quality. The page has to defend the model in front of a regulator, a hiring manager, and a board member, each opening it for a different question.
Dataset. The IBM HR Analytics Employee Attrition dataset, available publicly on Kaggle at www.kaggle.com/datasets/pavansubhasht/ibm-hr-analytics-attrition-dataset. Use Attrition as the target, with Age, Department, JobLevel, JobSatisfaction, OverTime, MonthlyIncome, YearsAtCompany, and Gender as predictors and subgroup variables.
Deliverable. A Bias-and-Prediction.xlsx workbook with the adverse-impact and prediction calculations, plus a Bias-and-Prediction.pbix Power BI file with the dashboard described below.
24.6.1 Step 1 — Compute adverse-impact ratios
Treat Attrition = "Yes" as the negative outcome the model is being evaluated against. Compute the ratio of attrition rates across Gender.
```
Attrition Rate (Female) = COUNTIFS(HR[Gender], "Female", HR[Attrition], "Yes")
                          / COUNTIF(HR[Gender], "Female")
Attrition Rate (Male)   = COUNTIFS(HR[Gender], "Male", HR[Attrition], "Yes")
                          / COUNTIF(HR[Gender], "Male")
Adverse Impact Ratio    = MIN(Female_Rate, Male_Rate) / MAX(Female_Rate, Male_Rate)
```

Apply the four-fifths rule: a ratio below 0.80 raises a flag. Repeat the calculation for Department and JobLevel.
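Outside the workbook, the same ratio can be cross-checked in a few lines of Python, adding the two-proportion significance test the chapter asks to accompany every subgroup comparison. The sketch assumes the Kaggle CSV file name current at the time of writing:

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

hr = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")  # file name as on Kaggle

# Attrition rate per Gender and the four-fifths ratio.
rates = hr["Attrition"].eq("Yes").groupby(hr["Gender"]).mean()
air = rates.min() / rates.max()

# Two-proportion z-test for the rate difference.
counts = hr.groupby("Gender")["Attrition"].apply(lambda s: s.eq("Yes").sum())
nobs = hr["Gender"].value_counts().reindex(counts.index)
z, p = proportions_ztest(counts.values, nobs.values)

print(rates)
print(f"adverse-impact ratio = {air:.2f} (flag below 0.80), z = {z:.2f}, p = {p:.3f}")
```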
24.6.2 Step 2 — Build a logistic regression for attrition risk
Excel has no native logistic regression, so the workshop uses a substitute: fit the logistic log-likelihood directly with Solver, or, for a teaching-grade lab, fit a linear-probability model with LINEST or the Data Analysis ToolPak’s Regression function.
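For readers working outside the spreadsheet, a genuine logistic fit is only a few lines; the Python sketch below is a companion to the Excel substitute that follows, assumes the Kaggle file name, and omits Department for brevity (it would need one-hot encoding):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

hr = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")  # file name as on Kaggle

# Encode the binaries the exercise uses.
hr["AttritionBinary"] = hr["Attrition"].eq("Yes").astype(int)
hr["OverTimeBinary"] = hr["OverTime"].eq("Yes").astype(int)
predictors = ["Age", "JobLevel", "JobSatisfaction", "MonthlyIncome",
              "YearsAtCompany", "OverTimeBinary"]

# Standardise first so MonthlyIncome's scale does not dominate the fit.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(hr[predictors], hr["AttritionBinary"])
hr["PredictedRisk"] = model.predict_proba(hr[predictors])[:, 1]
print(hr[["AttritionBinary", "PredictedRisk"]].head())
```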
```
=LINEST(HR[AttritionBinary], HR[[Age]:[OverTimeBinary]], TRUE, TRUE)
```

The regression returns coefficients for each predictor. Note that LINEST lists the coefficients in reverse order of the predictor columns, with the intercept last, and must be entered as an array formula in versions of Excel without dynamic arrays. Compute the predicted attrition probability for each employee.
```
Predicted Risk = INTERCEPT + SUMPRODUCT(Coefficients, EmployeePredictors)
```

24.6.3 Step 3 — Build the calibration plot
Bin predicted risk into deciles. For each decile, compute the average predicted risk and the actual attrition rate. Plot predicted (x-axis) against realised (y-axis) and add the perfect-calibration diagonal as a reference.
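The same decile binning is compact in Python; this sketch continues from the logistic fit sketched in Step 2, so hr already carries PredictedRisk and AttritionBinary:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Decile-bin the predicted risk and compare mean prediction with realised rate.
hr["RiskDecile"] = pd.qcut(hr["PredictedRisk"], 10, labels=False, duplicates="drop")
calib = hr.groupby("RiskDecile").agg(
    predicted=("PredictedRisk", "mean"),
    realised=("AttritionBinary", "mean"),
)

ax = calib.plot.scatter(x="predicted", y="realised")
ax.plot([0, 1], [0, 1], linestyle="--")  # perfect-calibration diagonal
ax.set_xlabel("Mean predicted risk (decile)")
ax.set_ylabel("Realised attrition rate (decile)")
plt.show()
```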
24.6.4 Step 4 — Build the ROC curve
Sort employees by predicted risk descending. For each threshold, compute the true-positive rate and false-positive rate.
```
TPR (at threshold t) = COUNTIFS(Predicted, ">=" & t, Actual, "Yes") / COUNTIF(Actual, "Yes")
FPR (at threshold t) = COUNTIFS(Predicted, ">=" & t, Actual, "No") / COUNTIF(Actual, "No")
```

Render the ROC curve and compute the AUC with the trapezoidal rule.
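As a cross-check on the trapezoidal AUC, scikit-learn computes the curve and area directly (again continuing from the Step 2 sketch):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# ROC curve and AUC for the Step 2 predictions.
fpr, tpr, _ = roc_curve(hr["AttritionBinary"], hr["PredictedRisk"])
auc = roc_auc_score(hr["AttritionBinary"], hr["PredictedRisk"])

plt.plot(fpr, tpr, label=f"model (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False-positive rate")
plt.ylabel("True-positive rate")
plt.legend()
plt.show()
```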
24.6.5 Step 5 — Compute subgroup discrimination and calibration
Repeat Steps 3 and 4 separately for the Female and Male subgroups. Compute AUC for each subgroup. A material gap (more than 0.05 in AUC) is a fairness flag.
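In Python the subgroup comparison is one groupby (continuing from the Step 2 sketch); a gap above the 0.05 threshold raises the flag:

```python
from sklearn.metrics import roc_auc_score

# AUC per Gender subgroup, continuing from the Step 2 sketch.
sub_auc = hr.groupby("Gender").apply(
    lambda g: roc_auc_score(g["AttritionBinary"], g["PredictedRisk"])
)
gap = sub_auc.max() - sub_auc.min()
print(sub_auc)
print(f"AUC gap = {gap:.3f} ({'flag' if gap > 0.05 else 'within tolerance'})")
```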
24.6.6 Step 6 — Build the risk-and-criticality segmentation
Add a Criticality lookup based on JobLevel (treat levels 4 and 5 as critical). Create a heat map of risk versus criticality with the firm’s intervention policy named for each cell, as sketched after this list:
- High risk and high criticality: targeted retention plan, escalation.
- High risk and low criticality: light-touch action.
- Low risk and high criticality: succession monitoring.
- Low risk and low criticality: no action.
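In Python, the segmentation is two flags and a lookup. The sketch continues from Step 2; the 75th-percentile risk cut is an assumption for the illustration, not a recommended threshold:

```python
# Criticality from JobLevel (levels 4-5 critical, per the exercise) and a
# high-risk flag at an assumed 75th-percentile cut.
hr["Critical"] = hr["JobLevel"].ge(4)
hr["HighRisk"] = hr["PredictedRisk"].ge(hr["PredictedRisk"].quantile(0.75))

policy = {  # (high risk, critical) -> intervention named in the list above
    (True, True):   "targeted retention plan, escalation",
    (True, False):  "light-touch action",
    (False, True):  "succession monitoring",
    (False, False): "no action",
}
hr["Action"] = [policy[(r, c)]
                for r, c in zip(hr["HighRisk"].tolist(), hr["Critical"].tolist())]

print(hr.groupby(["HighRisk", "Critical"]).size())  # segment counts for the heat map
```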
24.6.7 Step 7 — Promote to Power BI and build the bias-and-prediction page
Load the data into Power BI. Build the adverse-impact panel with subgroup ratios, the calibration plot with the diagonal reference, the ROC curve with AUC labelled, the subgroup robustness panel, the risk-by-criticality heat map, and the action-by-segment policy table.
24.6.8 Step 8 — Add the cycle-over-cycle calibration placeholder
Reserve a panel that pre-populates the current cycle’s predicted attrition rate and the realised rate when the next cycle’s data arrives. Wire the page so the panel grows across future cycles into the longest-running visual on the dashboard.
24.6.9 Step 9 — Publish
Publish the report and tag it as the bias-and-prediction evidence file. Confirm that the page is opened in every selection-programme and retention-programme review.
This page extends the recruitment funnel of Chapter 22 and the validity dashboard of Chapter 23 into the joint bias-and-prediction view. The page also feeds the optimisation calculations of Chapter 28, where the predicted-risk distribution informs differential retention investment.
Bias-and-Prediction.xlsx, Bias-and-Prediction.pbix, and ch24-bias-and-prediction-walkthrough.mp4 will be attached at this point in the published edition. The screen recording walks through Steps 1 to 9 with the Excel adverse-impact and regression workbench and the Power BI bias-and-prediction page shown side by side.
Summary
| Concept | Description |
|---|---|
| Why Bias and Prediction Belong Together | |
| Bias and prediction belong together | The same selection model is judged on bias and on prediction; both belong on the dashboard |
| Trade-offs surfaced explicitly | Responsible function surfaces the predictive-validity-versus-adverse-impact trade-off |
| Algorithmic models do not abolish bias | Algorithmic models often sharpen the bias dilemma rather than abolish it |
| Action discipline as the hard part | The hardest part of turnover prediction is what the firm does with the prediction |
| Calibration alongside discrimination | A model with high discrimination but poor calibration is not yet usable |
| Selection Bias | |
| Adverse impact | Pass-through rates compared across protected groups, often with the four-fifths rule |
| Predictive bias | Whether the model predicts outcomes equally well across groups, by slope and intercept |
| Procedural fairness | Whether candidates perceive the process as fair across groups |
| Diversity-validity dilemma | Most predictive methods produce some adverse impact; reducing it can reduce validity |
| Strategies for reducing adverse impact | Work samples, structured interviews, weighting, and banding can reduce impact while preserving validity |
| Predicting Performance | |
| Discrimination criterion | Does the model rank candidates correctly, captured by ROC and lift |
| Calibration criterion | Do predicted scores match realised outcomes, captured by calibration plot |
| Stability criterion | Does the model perform similarly across cycles |
| Subgroup robustness criterion | Does the model perform similarly across protected groups |
| Decision-utility criterion | Does the model improve hiring outcomes net of cost |
| Calibration plot | Predicted score on x-axis, realised outcome on y-axis, perfect-calibration diagonal |
| Predicted-versus-realised chart | Single most useful visual for a performance-prediction model |
| Predicting Turnover | |
| Risk scoring | Each employee receives a probability of voluntary exit with confidence |
| Risk-and-criticality segmentation | Employees grouped by risk and by role criticality |
| Driver analysis | Identifies the factors that move the risk score with effect sizes |
| Action design by segment | Specifies what action will be taken for each risk-and-criticality segment |
| Outcome tracking | Compares predicted exits with realised exits cycle by cycle |
| Cycle-over-cycle calibration | Calibration tracked over cycles is the strongest credibility signal |
| Action discipline in turnover prediction | Action policy explicit and action history auditable for each segment |
| Visualising Both Together | |
| Adverse-impact ratio panel | Subgroup pass-through ratios surfaced with confidence intervals |
| Subgroup robustness panel | Discrimination and calibration stratified by subgroup on the page |
| Action-by-segment panel | Intervention policy named for each risk-and-criticality segment |
| Cycle-over-cycle calibration panel | Predicted-versus-realised outcomes tracked across cycles |
| Operational Disciplines | |
| Self-fulfilling forecast risk | A prediction without a defensible action plan can produce its own outcome |
| Dashboard as the evidence file | The dashboard is the evidence file for regulators, managers, and the board |