
Prediction vs Inference: Asking the Right Question
Big idea: A model can forecast tomorrow’s stock price or explain why gym visits lower blood pressure—but rarely excels at both. Decide which answer matters first.
1 Why split hairs?
Imagine a friend asking, “Will it rain tomorrow?” versus “Why does it rain more in April?” The first wants a yes/no for a specific day. The second wants an explanation of patterns. Data science poses the same twin questions: prediction and inference.
Goal you have | Typical wording | Real-world stake |
---|---|---|
Prediction | “What happens next?” | Weather apps, stock traders, demand planners |
Inference | “Why/How does it happen?” | Medicine, public policy, fairness in lending |
Choosing wrongly is like packing for the beach when you’re headed to the mountains—uncomfortable at best, disastrous at worst.
2 What exactly is prediction?
Definition: Using patterns in past data to guess an outcome for new data points.
Focus: accuracy on unseen examples.
Accepts: black-box models if they score better.
Ignores: whether humans can parse the decision path.
Everyday examples
- A courier company forecasting parcel volume for next Monday.
- Netflix suggesting the next series you’ll binge.
- Credit-card network flagging a transaction as fraud in milliseconds.
Key metrics: Mean-squared error, accuracy, AUC, log-loss—anything that rewards correct guesses.
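To make the "rewards correct guesses" idea concrete, here is a minimal pure-Python sketch of two of those metrics on invented numbers (a toy parcel forecast and a toy fraud flag—neither comes from real data):

```python
# Toy illustration: prediction cares only about how close the guesses land,
# not about why the model made them. All numbers are made up.

def mean_squared_error(y_true, y_pred):
    """Average squared gap between guesses and outcomes (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of exact matches (classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Regression-style check: forecast vs actual parcel volumes
actual   = [100, 120, 90, 110]
forecast = [105, 115, 95, 100]
print(mean_squared_error(actual, forecast))   # 43.75

# Classification-style check: fraud flags
true_flags = [0, 0, 1, 0, 1]
pred_flags = [0, 0, 1, 1, 1]
print(accuracy(true_flags, pred_flags))       # 0.8
```

Note that neither function ever inspects the model—only its outputs. That is the prediction mindset in one line.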
3 What exactly is inference?
Definition: Discovering which predictors drive the outcome and quantifying their impact.
Focus: interpretability and statistical validity.
Accepts: slightly lower raw accuracy if it gains clarity.
Demands: estimates you can trust and explain (“holding other variables constant…”).
Everyday examples
- Epidemiologist measuring how smoking affects lung-cancer risk.
- Economist studying the wage gap across industries.
- HR team identifying which skills predict on-the-job success.
Key tools: Confidence intervals, p-values, effect sizes, causal diagrams.
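The inference toolkit can be sketched with a hand-rolled simple linear regression: estimate one effect, then attach a 95% confidence interval to it. The data are invented for illustration, and the t critical value for 6 degrees of freedom (≈ 2.447) is hard-coded rather than looked up from a table:

```python
import math

# Toy inference sketch: estimate the effect of one predictor on an outcome
# and report a 95% confidence interval. Data are invented for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8]                      # e.g. exposure (made up)
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]      # e.g. risk score (made up)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx                     # estimated effect per unit of x
intercept = y_bar - slope * x_bar

# Residual variance and the slope's standard error
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in residuals) / (n - 2)
se_slope = math.sqrt(s2 / sxx)

# 95% CI using the t critical value for n - 2 = 6 degrees of freedom
t_crit = 2.447
ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)
print(f"effect ≈ {slope:.3f} per unit, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
```

The deliverable here is not a forecast but a statement: "each extra unit of exposure raises the score by about 0.99, and we are 95% confident the true effect lies inside the interval"—exactly the "holding other variables constant" style of claim inference demands.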
4 Same data, different lens
Consider a dataset of 10,000 used-car listings—make, mileage, price.
Task version | Model you might choose | Why that fits the goal |
---|---|---|
Predict the selling price | Gradient-boosted trees | Top leaderboard accuracy, ignores why |
Explain price drivers | Multiple linear regression | Clear coefficients for mileage, brand |
Same dataset, two roads. Pick the signpost before driving.
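The two roads can be sketched on one tiny invented "used-car" table: a nearest-neighbour lookup that predicts well locally but yields no explanation, versus a least-squares slope that yields a reportable coefficient:

```python
# Same toy "used-car" data, two lenses. Mileage in thousands of km,
# price in thousands of dollars — figures invented for illustration.
cars = [(20, 18.0), (50, 14.5), (80, 11.0), (110, 8.0), (140, 5.5)]

# Lens 1 — prediction: a 1-nearest-neighbour guess. It can be accurate,
# but it offers no coefficient to report.
def predict_price(mileage):
    return min(cars, key=lambda c: abs(c[0] - mileage))[1]

# Lens 2 — inference: a least-squares slope, read as "price drops by
# about |slope| thousand dollars per thousand km".
def price_slope():
    n = len(cars)
    mx = sum(m for m, _ in cars) / n
    py = sum(p for _, p in cars) / n
    sxx = sum((m - mx) ** 2 for m, _ in cars)
    sxy = sum((m - mx) * (p - py) for m, p in cars)
    return sxy / sxx

print(predict_price(100))          # price of the closest listing: 8.0
print(round(price_slope(), 4))     # ≈ -0.105 per thousand km
```

Same five rows, two different deliverables: a number for a buyer, or an effect size for an analyst.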
5 How the choice shapes your workflow
Stage | If you need prediction | If you need inference |
---|---|---|
Collect data | Grab every signal you can; more ≈ better | Prioritise clean, unbiased, domain-relevant variables |
Pre-process | Aggressive feature engineering, one-hot everything | Keep transformations interpretable |
Choose model | Any algorithm that boosts accuracy (ensembles, DL) | Simple, transparent forms (linear, GLM, causal trees) |
Evaluate | Cross-validated error on hold-out set | Significance, confidence intervals, domain plausibility |
Deploy/Report | Monitor drift, retrain often | Communicate effect sizes, policy implications |
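The "evaluate" step on the prediction side can be sketched in a few lines: fit on a training split, then score only on a hold-out split the model never saw. The data generator and the one-parameter model are invented purely to show the split discipline:

```python
import random

# Sketch of hold-out evaluation: the model is fitted on one split and
# judged on another. Toy synthetic data; details are illustrative only.
random.seed(0)
data = [(x, 3.0 * x + random.uniform(-1, 1)) for x in range(40)]
random.shuffle(data)
train, hold_out = data[:30], data[30:]

# Fit a one-parameter model (y ≈ w * x) on the training split only
num = sum(x * y for x, y in train)
den = sum(x * x for x, y in train)
w = num / den

# Report error on the untouched hold-out split
mse = sum((y - w * x) ** 2 for x, y in hold_out) / len(hold_out)
print(f"w ≈ {w:.3f}, hold-out MSE ≈ {mse:.3f}")
```

Cross-validation repeats this split-fit-score loop several times and averages; the principle—never score on data the model trained on—is the same.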
6 Case study: loan approval
Scenario: A bank wants to automate loan decisions while staying compliant with fairness regulations.
- Prediction need: minimise default risk ⇒ high-accuracy model (e.g. XGBoost).
- Inference need: prove decisions aren’t biased ⇒ interpretable model or post-hoc explanation.
- Compromise: Use a powerful predictor plus SHAP values or train a simpler model within a narrow confidence band.
Outcome: slight dip in raw accuracy but massive gain in regulatory approval and customer trust.
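One simple post-hoc explanation technique in the same spirit as the SHAP values mentioned above is permutation importance: shuffle one input at a time and measure how much accuracy drops. The "black box", applicants, and decision rule below are all invented; the point is only the shuffling trick:

```python
import random

# Permutation-importance sketch (a simpler cousin of SHAP): shuffle one
# feature at a time; features whose shuffling hurts accuracy most matter
# most. Model, data, and approval rule are invented for illustration.
random.seed(1)

def model(income, age):
    """Stand-in 'black box': approve when income clears a threshold."""
    return 1 if income > 50 else 0

# Toy applicants as (income, age); labels follow the same rule as the
# model, so baseline accuracy is 1.0 by construction.
rows = [(random.uniform(20, 90), random.uniform(18, 70)) for _ in range(200)]
labels = [1 if inc > 50 else 0 for inc, _ in rows]

def acc(feature_rows):
    hits = sum(model(i, a) == l for (i, a), l in zip(feature_rows, labels))
    return hits / len(labels)

baseline = acc(rows)

# Shuffle income only, keep age as-is
incomes = [r[0] for r in rows]
random.shuffle(incomes)
acc_income = acc(list(zip(incomes, [r[1] for r in rows])))

# Shuffle age only, keep income as-is
ages = [r[1] for r in rows]
random.shuffle(ages)
acc_age = acc(list(zip([r[0] for r in rows], ages)))

print("importance(income) ≈", round(baseline - acc_income, 3))
print("importance(age)    ≈", round(baseline - acc_age, 3))
```

Here income shows a large importance and age shows none—which is exactly the kind of evidence a regulator would want to see (or would flag, if a protected attribute scored high).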
7 Common pitfalls & how to dodge them
- Pitfall: Using a deep neural net for a medical study, then over-interpreting hidden weights.
  Fix: pick a model designed for inference (e.g. logistic regression) or apply causal-inference techniques.
- Pitfall: Forcing a linear model to predict cryptocurrency prices.
  Fix: if sheer accuracy is king, accept the black box and track performance over time.
8 A quick checklist
- Stakeholders: Who will use the result, and how?
- Tolerate opacity? If “no,” lean inference.
- Cost of being wrong? If high, favour best-in-class prediction.
- Need causality or policy insight? Inference all the way.
Tape this list above your monitor—it saves weeks of back-tracking.
9 Where we’re heading
Next in the series | What you’ll discover |
---|---|
Estimating | How loss functions and optimisation teach a model |
Flexibility vs Interpretability | Techniques to explain complex models |
Key takeaways
- Prediction and inference answer different questions—pick one.
- Your choice determines data prep, model type, and success metric.
- Trying to maximise both accuracy and interpretability usually forces compromise—be explicit about which trade-off matters.
- Ask stakeholders early; save headaches later.
Next stop: Estimating —the maths and intuition behind fitting a model in the first place.