
PyTorch for First‑Timers: Tensors, Autograd & a Tiny Neural Net
Prerequisites
• You know basic Python (functions, classes, for loops).
• You’ve met NumPy arrays (see our previous post).
That’s enough—you do not need prior deep‑learning experience.
1 Why PyTorch?
PyTorch is an open‑source library for tensor computation (like NumPy) and automatic differentiation (tracking gradients). Together they power modern neural‑network research and production.
- Pythonic & eager. Operations run immediately—no session graphs to compile.
- GPU acceleration. One line moves data from CPU to GPU for massive speed‑ups.
- Extensible. From simple linear models to cutting‑edge transformers.
Install with:
# CPU‑only (quickest to set up)
pip install torch torchvision torchaudio
# or choose a CUDA version at https://pytorch.org/get-started/locally/
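Once installed, a quick sanity check confirms the import works and tells you whether a GPU is visible (a minimal sketch; the printed version depends on what pip resolved):

import torch

print(torch.__version__)          # e.g. 2.x, whatever pip installed
print(torch.cuda.is_available())  # True only with a CUDA build and a visible GPU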
2 Meet tensors: PyTorch’s core data structure
import torch
# Scalar (0‑D)
a = torch.tensor(3.14)
# 1‑D vector
v = torch.tensor([1, 2, 3])
# 2‑D matrix
M = torch.tensor([[1., 2.], [3., 4.]])
print(M.shape) # torch.Size([2, 2])
print(M.dtype) # torch.float32 by default
Key differences from NumPy:
| | NumPy ndarray | PyTorch tensor |
|---|---|---|
| Gradients | Not built-in | requires_grad=True |
| GPU support | With CuPy or manual work | Native (.to('cuda')) |
| In-place ops | n/a | End with _ (e.g. add_) |
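To make the table concrete, here is a small sketch (CPU only) showing gradient tracking, the trailing-underscore convention for in-place ops, and the hand-off to NumPy:

t = torch.ones(3, requires_grad=True)  # gradients will be tracked for t
s = t.sum()
s.backward()
print(t.grad)     # tensor([1., 1., 1.])

u = torch.zeros(3)
u.add_(5)         # trailing underscore = in-place version of add
print(u)          # tensor([5., 5., 5.])

print(u.numpy())  # CPU tensors convert to NumPy (and share memory)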
Creating tensors quickly
torch.zeros((2, 3))
torch.ones(4)
torch.arange(0, 10, 2)
Tip: torch.rand and torch.randn give uniform and normal random tensors, respectively.
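For example (the shapes here are arbitrary):

torch.rand(2, 3)   # uniform samples in [0, 1)
torch.randn(2, 3)  # samples from a standard normal distribution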
3 Devices: CPU vs GPU in one line
device = 'cuda' if torch.cuda.is_available() else 'cpu'
M = M.to(device) # move tensor to GPU if you have one
All math runs where the data lives—keep model and data on the same device.
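In practice that means creating tensors on, or moving them to, the same device everywhere; a minimal sketch:

a = torch.randn(3, 3, device=device)  # created directly on the chosen device
b = torch.ones(3, 3).to(device)       # or moved after creation
c = a + b                             # fine: both operands live on one device
print(c.device)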
4 Autograd: automatic differentiation by example
x = torch.tensor(3.0, requires_grad=True)
y = x**2 + 2*x + 1 # simple function
y.backward() # dy/dx computed automatically
print(x.grad) # ➜ tensor(8.)
Set requires_grad=True and PyTorch builds a computational graph. Calling .backward() walks that graph and fills .grad for each leaf tensor.
Why care? Neural nets learn by gradient descent; autograd computes those gradients for you.
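To see that connection, here is gradient descent by hand, a minimal sketch that minimises (w - 5)² with no nn machinery (the learning rate 0.1 and 50 steps are arbitrary choices):

w = torch.tensor(0.0, requires_grad=True)

for _ in range(50):
    loss = (w - 5) ** 2
    loss.backward()           # fills w.grad with d(loss)/dw
    with torch.no_grad():     # update the weight outside the graph
        w -= 0.1 * w.grad
    w.grad.zero_()            # clear the gradient for the next step

print(w)                      # ≈ tensor(5.)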
5 nn.Module: building blocks for neural networks
A PyTorch model is a subclass of torch.nn.Module that defines:
- Layers in __init__
- Forward pass in forward()
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, in_features, hidden, out_features):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)   # input -> hidden layer
        self.relu = nn.ReLU()                       # non-linearity
        self.fc2 = nn.Linear(hidden, out_features)  # hidden -> output layer

    def forward(self, x):
        x = self.relu(self.fc1(x))
        return self.fc2(x)
Create a model:
model = SimpleNet(in_features=2, hidden=8, out_features=1)
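Calling the model on a batch runs forward() for you. A quick sketch with a random batch (the batch size 4 is arbitrary):

x = torch.randn(4, 2)   # 4 samples, 2 features each
out = model(x)          # invokes SimpleNet.forward under the hood
print(out.shape)        # torch.Size([4, 1])
print(sum(p.numel() for p in model.parameters()))  # 33 trainable parameters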
6 Training loop step‑by‑step (toy regression)
# Fake dataset: y = 3x₁ + 2x₂ + noise
n = 256
X = torch.randn(n, 2)
true_w = torch.tensor([[3.0], [2.0]])
y = X @ true_w + 0.1 * torch.randn(n, 1)

model = SimpleNet(2, 8, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    y_pred = model(X)            # forward pass
    loss = criterion(y_pred, y)  # measure the error
    optimizer.zero_grad()        # reset gradients
    loss.backward()              # compute gradients
    optimizer.step()             # update params
    if epoch % 40 == 0:
        print(f"Epoch {epoch:3d} → loss {loss.item():.4f}")
Output (trimmed):
Epoch 0 → loss 4.1325
Epoch 40 → loss 0.0123
Epoch 80 → loss 0.0051
...
The network quickly learns the underlying mapping y ≈ 3x₁ + 2x₂. That's the entire deep‑learning pipeline on one screen.
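You can sanity-check the fit on inputs the model never saw during training; a small sketch (the test points below are made up):

X_test = torch.tensor([[1.0, 1.0], [2.0, -1.0]])
with torch.no_grad():   # no gradients needed for evaluation
    preds = model(X_test)
print(preds)            # roughly [[5.], [4.]], since y ≈ 3x₁ + 2x₂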
7 Saving & loading models
# Save
torch.save(model.state_dict(), 'simplenet.pt')
# Load
loaded = SimpleNet(2, 8, 1)
loaded.load_state_dict(torch.load('simplenet.pt'))
loaded.eval() # switch to inference mode
.state_dict() is a Python dict mapping parameter names to tensors: it stores only the weights, which keeps checkpoints small and portable.
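If you are curious what actually gets saved, print the keys and shapes; for the SimpleNet above that looks like:

for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape))
# fc1.weight (8, 2)
# fc1.bias (8,)
# fc2.weight (1, 8)
# fc2.bias (1,)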
8 PyTorch cheat sheet 📄
| Action | Command |
|---|---|
| Import | import torch as t |
| Tensor from list | t.tensor([1, 2, 3]) |
| Zeros / ones | t.zeros(shape), t.ones(shape) |
| Random normal / uniform | t.randn(shape), t.rand(shape) |
| Change dtype / device | x.float(), x.to('cuda') |
| Requires grad | x.requires_grad_(True) |
| Backprop | loss.backward() |
| Optimisers | t.optim.SGD(params, lr), t.optim.Adam(...) |
| Define layer | nn.Linear(in, out), nn.ReLU() |
| Build model | Subclass nn.Module and write forward() |
| Zero grads | optimizer.zero_grad() |
| Update params | optimizer.step() |
| Save / load | t.save(model.state_dict(), 'm.pt') / model.load_state_dict(...) |
Print or bookmark—this covers 90 % of daily PyTorch commands.
9 Gotchas & pro tips
- Detach tensors before converting to NumPy: x.detach().cpu().numpy().
- Wrap inference in torch.no_grad() to skip gradient tracking.
- Remember model.train() vs. model.eval(): dropout & batch‑norm behave differently (see the sketch after this list).
- Don't mix devices: move data and model together with tensor.to(device) and model.to(device).
- In‑place ops end with _ (e.g. add_) and can break autograd if misused.
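A minimal inference sketch that combines these tips, reusing the SimpleNet model and the device variable from earlier:

model.to(device)   # model and inputs on the same device
model.eval()       # dropout / batch-norm switch to inference behaviour

with torch.no_grad():                # no graph is built inside this block
    out = model(torch.randn(4, 2, device=device))

out_np = out.detach().cpu().numpy()  # detach → cpu → numpy (detach is redundant under no_grad, but a safe habit)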
10 What’s next?
- DataLoader & datasets – handle millions of images without RAM pain.
- Convolutional layers – build a tiny image classifier.
- PyTorch Lightning – organise research‑grade code in minutes.
Stay curious—your next step might be training your first image recogniser or language model. PyTorch scales with you. Happy tensoring!