
Prediction vs Inference: Asking the Right Question


Big idea: A model can forecast tomorrow’s stock price or explain why gym visits lower blood pressure—but rarely excels at both. Decide which answer matters first.


1 Why split hairs?

Imagine a friend asking, “Will it rain tomorrow?” versus “Why does it rain more in April?” The first wants a yes/no for a specific day. The second wants an explanation of patterns. Data science poses the same twin questions: prediction and inference.

| Goal you have | Typical wording | Real-world stake |
| --- | --- | --- |
| Prediction | “What happens next?” | Weather apps, stock traders, demand planners |
| Inference | “Why/How does it happen?” | Medicine, public policy, fairness in lending |

Choosing wrongly is like packing for a beach when you’re headed to the mountains—uncomfortable at best, disastrous at worst.


2 What exactly is prediction?

Definition: Using patterns in past data to guess an outcome for new data points.

Focus: accuracy on unseen examples.
Accepts: black-box models if they score better.
Ignores: whether humans can parse the decision path.

Everyday examples

- A weather app giving tomorrow’s rain probability
- A trader forecasting a stock’s next move
- A demand planner estimating next month’s orders

Key metric: Mean-squared error, accuracy, AUC, log-loss—anything that rewards correct guesses.
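The metrics above all score the same thing: how close the guesses land on unseen data. A minimal sketch of two of them, on made-up numbers:

```python
def mse(y_true, y_pred):
    """Mean squared error: average squared gap between truth and guess."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    """Share of exact matches (classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical hold-out set: actual vs predicted daily rainfall (mm)
actual    = [0.0, 5.0, 2.0, 0.0]
predicted = [1.0, 4.0, 2.0, 0.0]
print(mse(actual, predicted))                 # 0.5

# Hypothetical rain / no-rain labels
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Notice what is *not* here: nothing about why the predictions were made. Prediction scoring is indifferent to the model’s internals.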


3 What exactly is inference?

Definition: Discovering which predictors drive the outcome and quantifying their impact.

Focus: interpretability and statistical validity.
Accepts: slightly lower raw accuracy if it gains clarity.
Demands: estimates you can trust and explain (“holding other variables constant…”).

Everyday examples

- A doctor asking whether exercise lowers blood pressure
- A policymaker asking which programme reduces unemployment
- A lender proving its approvals don’t discriminate

Key tools: Confidence intervals, p-values, effect sizes, causal diagrams.
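The tools above can be sketched on a toy example: a hand-rolled least-squares fit that reports the slope *with* its uncertainty. The data and the t critical value (3.182 for 3 degrees of freedom, looked up from a t-table) are stated assumptions; in practice a library such as statsmodels would produce all of this for you.

```python
import math

# Inference sketch: fit y = a + b*x by least squares and report a 95%
# confidence interval for the slope b. Data are invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

slope = sxy / sxx                  # estimated effect of one unit of x
intercept = my - slope * mx

# Residual variance -> standard error of the slope
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in resid) / (n - 2)
se = math.sqrt(s2 / sxx)

t_crit = 3.182                     # t(0.975, df = 3), from a t-table
print(f"slope = {slope:.2f} ± {t_crit * se:.2f}")
```

The deliverable is not a prediction but a sentence: “each unit of x raises y by about 1.97, give or take” — exactly the “effect size plus uncertainty” shape inference cares about.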


4 Same data, different lens

Consider a dataset of 10,000 used-car listings—make, mileage, price.

| Task version | Model you might choose | Why that fits the goal |
| --- | --- | --- |
| Predict the selling price | Gradient-boosted trees | Top leaderboard accuracy; ignores why |
| Explain price drivers | Multiple linear regression | Clear coefficients for mileage, brand |

[Figure: two road signs, one pointing to “Fast Route”, the other to “Scenic Route”]

Same dataset, two roads. Pick the signpost before driving.
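The two roads can be driven on invented numbers. Below, a 1-nearest-neighbour model stands in for the accuracy-first predictor (it gives answers but no coefficients), while a least-squares slope gives the interpretable mileage effect; the listings are made up for illustration.

```python
# Toy used-car listings: (mileage in thousands of miles, price in dollars).
listings = [(20, 18000), (40, 15000), (60, 12000), (80, 9000)]

# Prediction lens: a 1-nearest-neighbour "black box" -- answers, no explanation.
def predict_price(mileage):
    return min(listings, key=lambda mp: abs(mp[0] - mileage))[1]

# Inference lens: least-squares slope = dollars lost per 1,000 extra miles.
n = len(listings)
mx = sum(m for m, _ in listings) / n
mp_avg = sum(p for _, p in listings) / n
slope = (sum((m - mx) * (p - mp_avg) for m, p in listings)
         / sum((m - mx) ** 2 for m, _ in listings))

print(predict_price(45))  # 15000 -- a number, but no "why"
print(slope)              # -150.0: each extra 1,000 miles knocks ~$150 off
```

Same four rows, two different deliverables: a price for one car, or a sentence about what drives prices.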


5 How the choice shapes your workflow

| Stage | If you need prediction | If you need inference |
| --- | --- | --- |
| Collect data | Grab every signal you can; more ≈ better | Prioritise clean, unbiased, domain-relevant variables |
| Pre-process | Aggressive feature engineering, one-hot everything | Keep transformations interpretable |
| Choose model | Any algorithm that boosts accuracy (ensembles, DL) | Simple, transparent forms (linear, GLM, causal trees) |
| Evaluate | Cross-validated error on hold-out set | Significance, confidence intervals, domain plausibility |
| Deploy/Report | Monitor drift, retrain often | Communicate effect sizes, policy implications |
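The prediction column’s evaluation step boils down to one discipline: score the model only on rows it never trained on. A minimal hold-out sketch, on invented noise-free data (the “training” here just fits a line through two endpoints, a stand-in for a real fitting procedure):

```python
# Toy (feature, target) pairs following y = 3x + 1 exactly.
data = [(x, 3 * x + 1) for x in range(10)]

train, test = data[:8], data[8:]      # hold out the last 20%

# "Train": estimate slope/intercept from the training endpoints
# (a stand-in for a real fitting procedure).
slope = (train[-1][1] - train[0][1]) / (train[-1][0] - train[0][0])
intercept = train[0][1] - slope * train[0][0]

# Evaluate on rows the model never saw -- the number prediction cares about.
holdout_mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
print(holdout_mse)  # 0.0 on this noise-free toy data
```

Training error flatters every model; hold-out error is the honest one, which is why the prediction workflow guards it so carefully.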

6 Case study: loan approval

Scenario: A bank wants to automate loan decisions while staying compliant with fairness regulations.

  1. Prediction need: minimise default risk ⇒ high-accuracy model (e.g. XGBoost).
  2. Inference need: prove decisions aren’t biased ⇒ interpretable model or post-hoc explanation.
  3. Compromise: Use a powerful predictor plus SHAP values or train a simpler model within a narrow confidence band.

Outcome: slight dip in raw accuracy but massive gain in regulatory approval and customer trust.
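The post-hoc half of the compromise can be sketched without any special library. Below is a permutation-style importance check on a toy “black box”: break one feature’s link to the outcome and see how much the error grows. The scoring rule, the applicants, and the deterministic column reversal (instead of a random shuffle) are all invented for illustration; in practice the black box would be a trained model and you would use a tool like SHAP or scikit-learn’s permutation importance.

```python
def black_box_risk(income_k, debt_ratio):
    # Pretend this is an opaque, highly accurate trained model.
    return 0.9 if debt_ratio > 0.4 and income_k < 50 else 0.1

applicants = [  # (income in $k, debt ratio, defaulted?)
    (30, 0.5, 1), (80, 0.2, 0), (45, 0.6, 1), (90, 0.1, 0),
]

def mean_error(rows):
    return sum(abs(black_box_risk(i, d) - y) for i, d, y in rows) / len(rows)

baseline = mean_error(applicants)

def importance(col):
    # Break one feature's link to the outcome by reversing its column,
    # then measure how much the model's error grows.
    broken = [row[col] for row in applicants][::-1]
    rows = [(b if col == 0 else i, b if col == 1 else d, y)
            for b, (i, d, y) in zip(broken, applicants)]
    return mean_error(rows) - baseline

print("income importance:", importance(0))
print("debt ratio importance:", importance(1))
```

A large importance for a protected attribute (or a proxy for one) is exactly the kind of evidence a regulator would ask the bank to explain.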


7 Common pitfalls & how to dodge them

| Pitfall | How to dodge it |
| --- | --- |
| Reading a predictive model’s feature importances as causal proof | Treat them as associations; reach for inference tools before making causal claims |
| Shipping a black box where regulators demand explanations | Decide the question first; budget for interpretability, as in the loan case above |
| Throwing every available signal into an inference model | Prioritise clean, domain-relevant variables so coefficients stay meaningful |


8 A quick checklist

  1. Stakeholders: Who will use the result, and how?
  2. Tolerate opacity? If “no,” lean inference.
  3. Cost of being wrong? If high, favour best-in-class prediction.
  4. Need causality or policy insight? Inference all the way.

Tape this list above your monitor—it saves weeks of back-tracking.


9 Where we’re heading

| Next in the series | What you’ll discover |
| --- | --- |
| Estimating f | How loss functions and optimisation teach a model |
| Flexibility vs Interpretability | Techniques to explain complex models |

Key takeaways

- Prediction asks “what happens next?” and is judged by accuracy on unseen data; black boxes are fair game.
- Inference asks “why does it happen?” and trades a little raw accuracy for estimates you can trust and explain.
- The choice shapes every stage of the workflow, from data collection to deployment.
- Decide which question you are answering before you pick a model.

Next stop: Estimating f—the maths and intuition behind fitting a model in the first place.
