
Prediction vs Inference: Asking the Right Question
Big idea: A model can forecast tomorrow’s stock price or explain why gym visits lower blood pressure—but rarely excels at both. Decide which answer matters first.
1 Why split hairs?
Imagine a friend asking, “Will it rain tomorrow?” versus “Why does it rain more in April?” The first wants a yes/no for a specific day. The second wants an explanation of patterns. Data science poses the same twin questions: prediction and inference.
Goal you have | Typical wording | Real-world stake |
---|---|---|
Prediction | “What happens next?” | Weather apps, stock traders, demand planners |
Inference | “Why/How does it happen?” | Medicine, public policy, fairness in lending |
Choosing wrongly is like packing for the beach when you’re headed to the mountains—uncomfortable at best, disastrous at worst.
2 What exactly is prediction?
Definition: Using patterns in past data to guess an outcome for new data points.
Focus: accuracy on unseen examples.
Accepts: black-box models if they score better.
Ignores: whether humans can parse the decision path.
Everyday examples
- A courier company forecasting parcel volume for next Monday.
- Netflix suggesting the next series you’ll binge.
- Credit-card network flagging a transaction as fraud in milliseconds.
Key metrics: Mean-squared error, accuracy, AUC, log-loss—anything that rewards correct guesses.
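To make the "rewards correct guesses" idea concrete, here is a minimal pure-Python sketch of two of those metrics on invented numbers (a toy parcel forecast and a toy fraud flag—neither comes from real data):

```python
# Toy illustration: prediction cares only about how close the guesses land,
# not about why the model made them. All numbers are made up.

def mean_squared_error(y_true, y_pred):
    """Average squared gap between guesses and outcomes (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of exact matches (classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Regression-style check: forecast vs actual parcel volumes
actual   = [100, 120, 90, 110]
forecast = [105, 115, 95, 100]
print(mean_squared_error(actual, forecast))   # 43.75

# Classification-style check: fraud flags
true_flags = [0, 0, 1, 0, 1]
pred_flags = [0, 0, 1, 1, 1]
print(accuracy(true_flags, pred_flags))       # 0.8
```

Note that neither function ever inspects the model—only its outputs. That is the prediction mindset in one line.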
3 What exactly is inference?
Definition: Discovering which predictors drive the outcome and quantifying their impact.
Focus: interpretability and statistical validity.
Accepts: slightly lower raw accuracy if it gains clarity.
Demands: estimates you can trust and explain (“holding other variables constant…”).
Everyday examples
- Epidemiologist measuring how smoking affects lung-cancer risk.
- Economist studying the wage gap across industries.
- HR team identifying which skills predict on-the-job success.
Key tools: Confidence intervals, p-values, effect sizes, causal diagrams.
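The inference toolkit can be sketched with a hand-rolled simple linear regression: estimate one effect, then attach a 95% confidence interval to it. The data are invented for illustration, and the t critical value for 6 degrees of freedom (≈ 2.447) is hard-coded rather than looked up from a table:

```python
import math

# Toy inference sketch: estimate the effect of one predictor on an outcome
# and report a 95% confidence interval. Data are invented for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8]                      # e.g. exposure (made up)
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]      # e.g. risk score (made up)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx                     # estimated effect per unit of x
intercept = y_bar - slope * x_bar

# Residual variance and the slope's standard error
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in residuals) / (n - 2)
se_slope = math.sqrt(s2 / sxx)

# 95% CI using the t critical value for n - 2 = 6 degrees of freedom
t_crit = 2.447
ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)
print(f"effect ≈ {slope:.3f} per unit, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
```

The deliverable here is not a forecast but a statement: "each extra unit of exposure raises the score by about 0.99, and we are 95% confident the true effect lies inside the interval"—exactly the "holding other variables constant" style of claim inference demands.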
4 Same data, different lens
Consider a dataset of 10,000 used-car listings—make, mileage, price.
Task version | Model you might choose | Why that fits the goal |
---|---|---|
Predict the selling price | Gradient-boosted trees | Top leaderboard accuracy, ignores why |
Explain price drivers | Multiple linear regression | Clear coefficients for mileage, brand |
Same dataset, two roads. Pick the signpost before driving.
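The two roads can be sketched on one tiny invented "used-car" table: a nearest-neighbour lookup that predicts well locally but yields no explanation, versus a least-squares slope that yields a reportable coefficient:

```python
# Same toy "used-car" data, two lenses. Mileage in thousands of km,
# price in thousands of dollars — figures invented for illustration.
cars = [(20, 18.0), (50, 14.5), (80, 11.0), (110, 8.0), (140, 5.5)]

# Lens 1 — prediction: a 1-nearest-neighbour guess. It can be accurate,
# but it offers no coefficient to report.
def predict_price(mileage):
    return min(cars, key=lambda c: abs(c[0] - mileage))[1]

# Lens 2 — inference: a least-squares slope, read as "price drops by
# about |slope| thousand dollars per thousand km".
def price_slope():
    n = len(cars)
    mx = sum(m for m, _ in cars) / n
    py = sum(p for _, p in cars) / n
    sxx = sum((m - mx) ** 2 for m, _ in cars)
    sxy = sum((m - mx) * (p - py) for m, p in cars)
    return sxy / sxx

print(predict_price(100))          # price of the closest listing: 8.0
print(round(price_slope(), 4))     # ≈ -0.105 per thousand km
```

Same five rows, two different deliverables: a number for a buyer, or an effect size for an analyst.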
5 How the choice shapes your workflow
Stage | If you need prediction | If you need inference |
---|---|---|
Collect data | Grab every signal you can; more ≈ better | Prioritise clean, unbiased, domain-relevant variables |
Pre-process | Aggressive feature engineering, one-hot everything | Keep transformations interpretable |
Choose model | Any algorithm that boosts accuracy (ensembles, DL) | Simple, transparent forms (linear, GLM, causal trees) |
Evaluate | Cross-validated error on hold-out set | Significance, confidence intervals, domain plausibility |
Deploy/Report | Monitor drift, retrain often | Communicate effect sizes, policy implications |
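The "evaluate" step on the prediction side can be sketched in a few lines: fit on a training split, then score only on a hold-out split the model never saw. The data generator and the one-parameter model are invented purely to show the split discipline:

```python
import random

# Sketch of hold-out evaluation: the model is fitted on one split and
# judged on another. Toy synthetic data; details are illustrative only.
random.seed(0)
data = [(x, 3.0 * x + random.uniform(-1, 1)) for x in range(40)]
random.shuffle(data)
train, hold_out = data[:30], data[30:]

# Fit a one-parameter model (y ≈ w * x) on the training split only
num = sum(x * y for x, y in train)
den = sum(x * x for x, y in train)
w = num / den

# Report error on the untouched hold-out split
mse = sum((y - w * x) ** 2 for x, y in hold_out) / len(hold_out)
print(f"w ≈ {w:.3f}, hold-out MSE ≈ {mse:.3f}")
```

Cross-validation repeats this split-fit-score loop several times and averages; the principle—never score on data the model trained on—is the same.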
6 Case study: loan approval
Scenario: A bank wants to automate loan decisions while staying compliant with fairness regulations.
- Prediction need: minimise default risk ⇒ high-accuracy model (e.g. XGBoost).
- Inference need: prove decisions aren’t biased ⇒ interpretable model or post-hoc explanation.
- Compromise: Use a powerful predictor plus SHAP values or train a simpler model within a narrow confidence band.
Outcome: slight dip in raw accuracy but massive gain in regulatory approval and customer trust.
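One simple post-hoc explanation technique in the same spirit as the SHAP values mentioned above is permutation importance: shuffle one input at a time and measure how much accuracy drops. The "black box", applicants, and decision rule below are all invented; the point is only the shuffling trick:

```python
import random

# Permutation-importance sketch (a simpler cousin of SHAP): shuffle one
# feature at a time; features whose shuffling hurts accuracy most matter
# most. Model, data, and approval rule are invented for illustration.
random.seed(1)

def model(income, age):
    """Stand-in 'black box': approve when income clears a threshold."""
    return 1 if income > 50 else 0

# Toy applicants as (income, age); labels follow the same rule as the
# model, so baseline accuracy is 1.0 by construction.
rows = [(random.uniform(20, 90), random.uniform(18, 70)) for _ in range(200)]
labels = [1 if inc > 50 else 0 for inc, _ in rows]

def acc(feature_rows):
    hits = sum(model(i, a) == l for (i, a), l in zip(feature_rows, labels))
    return hits / len(labels)

baseline = acc(rows)

# Shuffle income only, keep age as-is
incomes = [r[0] for r in rows]
random.shuffle(incomes)
acc_income = acc(list(zip(incomes, [r[1] for r in rows])))

# Shuffle age only, keep income as-is
ages = [r[1] for r in rows]
random.shuffle(ages)
acc_age = acc(list(zip([r[0] for r in rows], ages)))

print("importance(income) ≈", round(baseline - acc_income, 3))
print("importance(age)    ≈", round(baseline - acc_age, 3))
```

Here income shows a large importance and age shows none—which is exactly the kind of evidence a regulator would want to see (or would flag, if a protected attribute scored high).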
7 Common pitfalls & how to dodge them
- Pitfall: Using a deep neural net for a medical study, then over-interpreting hidden weights.
  Fix: pick a model designed for inference (e.g. logistic regression) or apply causal-inference techniques.
- Pitfall: Forcing a linear model to predict cryptocurrency prices.
  Fix: if sheer accuracy is king, accept the black box and track performance over time.
8 A quick checklist
- Stakeholders: Who will use the result, and how?
- Tolerate opacity? If “no,” lean inference.
- Cost of being wrong? If high, favour best-in-class prediction.
- Need causality or policy insight? Inference all the way.
Tape this list above your monitor—it saves weeks of back-tracking.
9 Where we’re heading
Next in the series | What you’ll discover |
---|---|
Estimating | How loss functions and optimisation teach a model |
Flexibility vs Interpretability | Techniques to explain complex models |
Key takeaways
- Prediction and inference answer different questions—pick one.
- Your choice determines data prep, model type, and success metric.
- Trying to maximise both accuracy and interpretability usually forces compromise—be explicit about which trade-off matters.
- Ask stakeholders early; save headaches later.
Next stop: Estimating —the maths and intuition behind fitting a model in the first place.