What Is Statistical Learning?

“Statistical learning refers to a set of tools for understanding data.”
An Introduction to Statistical Learning (ISL), Ch. 1

When a spreadsheet starts looking like the Matrix and drawing a quick trend‑line in Excel no longer helps, we upgrade to statistical learning—a family of methods that learn patterns directly from data.


1 Why do we care?

The hunt for the true function f

At its heart, statistical learning is a search mission. We assume that behind the messy world there exists an invisible rule—call it f—that links the things we can measure (predictors) to the thing we care about (outcome).

| Term you’ll hear | Also called… | What it means in plain English | Tiny example |
|---|---|---|---|
| Predictor | Feature, input, independent variable | A measurable signal we can feed the model | Square‑meters of a house |
| Outcome | Target, label, dependent variable | The value we want to predict or explain | The house’s sale price |

Goal: build a model f̂ that mimics the unknown f. The closer f̂ is to f, the better we can predict or understand new data.

If one sentence must survive: Statistical learning is about turning historical examples into a rule that works on tomorrow’s examples.


2 The core idea, written gently

We formalise the story with the equation

Y = f(X) + ε    (1)

where:

- **Y** is the outcome we want to predict or explain,
- **X** collects the predictors we can measure,
- **f** is the unknown, systematic relationship between them, and
- **ε** is random noise with mean zero: the part no model can capture.

Think of f as the smooth road and ε as the unavoidable potholes. Our map f̂ should follow the road as closely as possible without over‑reacting to each pothole.
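To make equation (1) concrete, here is a tiny simulation. The true rule f(x) = 3x + 2 is a hypothetical choice, made up purely for illustration—in a real problem we never get to see f:

```python
import random

random.seed(0)  # reproducible "potholes"

def f(x):
    """The hidden road: in real problems this is unknown."""
    return 3.0 * x + 2.0

# Observations follow Y = f(X) + epsilon, with noise of mean zero.
xs = [i / 10 for i in range(100)]
ys = [f(x) + random.gauss(0, 1) for x in xs]

# Each observed y differs from f(x) only by the noise term epsilon,
# so the residuals should average out to roughly zero.
residuals = [y - f(x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)
print(f"average pothole depth: {mean_residual:.2f}")
```

Because ε averages to zero, the potholes cancel out over many observations—but no model, however clever, can predict any individual pothole.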


3 Prediction vs Inference — two very different jobs

  1. Prediction cares only about how close our guesses are. Why the guess was made can be a black box.
  2. Inference wants the story: which predictor moves the outcome, and by how much?

| Your question | You need… | Real‑life setting |
|---|---|---|
| “What will Bitcoin cost tomorrow?” | Prediction | Trading bot, short‑term risk management |
| “Which habits raise heart‑disease risk?” | Inference | Healthcare policy, personalised medicine |

Knowing which hat you’re wearing guides everything that follows—from algorithm choice to how you evaluate success.


4 How we learn: supervised, unsupervised, plus two cousins

| Paradigm | Data you hold | What the algorithm delivers | Beginner‑friendly picture |
|---|---|---|---|
| Supervised | Pairs (x, y) (inputs and correct answers) | A rule f̂ to map new x → y | Flash‑cards with solutions |
| Unsupervised | Only x (no answers) | Hidden structure, natural groupings | Sorting socks by colour |
| Semi‑supervised | A few answers + many unlabeled | Better f̂ than either alone | Having a partial answer key |
| Reinforcement | States, actions, rewards | A strategy that maximises future reward | Training a dog with treats |

For most first projects (house prices, spam detection) you’ll start in the supervised box.
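As a minimal sketch of the supervised “flash‑cards” idea, a 1‑nearest‑neighbour rule answers a new question by copying the label of the most similar solved card. The sizes and price bands below are invented for illustration:

```python
# Labeled training pairs (x, y): house size in square metres -> price band.
# The numbers are made up for illustration.
training = [(30, "low"), (45, "low"), (80, "mid"), (120, "high"), (150, "high")]

def predict(size):
    """1-nearest-neighbour: copy the label of the closest known example."""
    closest = min(training, key=lambda pair: abs(pair[0] - size))
    return closest[1]

print(predict(50))   # closest card is (45, "low")
print(predict(110))  # closest card is (120, "high")
```

Everything the rule “knows” comes from the labeled pairs—remove the answers and there is nothing left to copy, which is exactly what separates supervised from unsupervised learning.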


5 Parametric vs Non‑Parametric — choosing how flexible f̂ can be

| Approach | Quick intuition | Strengths | Watch‑outs |
|---|---|---|---|
| Parametric | Assume a simple formula (straight line, logistic curve); estimate a handful of numbers | Works with small data, easy to read off coefficients | Misses bends ⇒ high bias |
| Non‑parametric | Let data carve its own shape (decision trees, k‑nearest neighbours, splines) | Captures twists you never imagined | Needs more data, can wiggle too much ⇒ high variance |

Enter the Bias–Variance dance

Too rigid → under‑fit (high bias). Too flexible → over‑fit (high variance). Plotted against model flexibility, test error traces a U shape, and the sweet spot sits at the bottom of the valley.
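A small sketch of the trade‑off, using a curved truth (sin x, kept noiseless for clarity). A parametric straight line—just two numbers—misses the bend, while a non‑parametric k‑nearest‑neighbour average follows it:

```python
import math

xs = [i / 10 for i in range(21)]   # inputs 0.0 .. 2.0
ys = [math.sin(x) for x in xs]     # a curved truth, noiseless for clarity

# Parametric: least-squares straight line, summarised by two numbers.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

def line(x):
    return intercept + slope * x

# Non-parametric: average the 3 nearest neighbours, no formula assumed.
def knn(x, k=3):
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

# Where the curve bends (x = 2.0), the rigid line sits further from the truth.
truth = math.sin(2.0)
print(f"line error: {abs(line(2.0) - truth):.3f}")
print(f"knn  error: {abs(knn(2.0) - truth):.3f}")
```

With noisy data and a very small k, the k‑NN curve would instead chase every pothole—that is the high‑variance side of the valley.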


6 How do we know we’ve done well?

Split the data. Reserve a chunk the model never sees—called the test set.

Pick a metric. For regression problems the usual choice is mean squared error (MSE): the average squared gap between predictions and true values.

Lower is better. Always report the number from the test set—not the training set—so you’re graded on brand‑new homework.
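The split‑then‑score routine can be sketched in a few lines. The data here are simulated from a hidden line plus noise, an assumption made purely so the example is self‑contained:

```python
import random

random.seed(1)

# Simulated data: a hidden line (2x + 1) plus noise.
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.5)) for x in [i / 20 for i in range(100)]]
random.shuffle(data)
train, test = data[:80], data[80:]   # hold out 20 rows the model never sees

# Fit a least-squares line on the training set only.
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
intercept = my - slope * mx

def mse(rows):
    """Mean squared error of the fitted line on a set of (x, y) rows."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in rows) / len(rows)

# Report the test-set number: that is the grade on brand-new homework.
print(f"train MSE: {mse(train):.3f}")
print(f"test  MSE: {mse(test):.3f}")
```

Training error almost always flatters the model; the held‑out rows are the only honest judge.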


7 Where we’re heading

| Next in series | What you’ll learn |
|---|---|
| Prediction vs Inference | Picking the right goal for your project |
| Estimating f | Loss functions, gradient descent, cross‑validation |
| Flexibility vs Interpretability | Tricks to tame variance without losing meaning |

Key takeaways

  1. Aim: approximate the hidden rule f that links predictors to outcomes.
  2. Decide early—do you need accuracy or explanation?
  3. Labeled data → supervised learning; unlabeled → unsupervised.
  4. Every model trades bias against variance; data size and model complexity set the dial.
  5. Judge success on fresh data, not the data you trained on.

Ready to dive deeper? Next post digs into why sometimes “just give me the right number” is enough, and other times you need the full story behind it.
