
PyTorch for First‑Timers: Tensors, Autograd & a Tiny Neural Net
Prerequisites
• You know basic Python (functions, classes, for loops).
• You’ve met NumPy arrays (see our previous post).
That’s enough—you do not need prior deep‑learning experience.
1 Why PyTorch?
PyTorch is an open‑source library for tensor computation (like NumPy) and automatic differentiation (tracking gradients). Together they power modern neural‑network research and production.
- Pythonic & eager. Operations run immediately—no session graphs to compile.
- GPU acceleration. One line moves data from CPU to GPU for massive speed‑ups.
- Extensible. From simple linear models to cutting‑edge transformers.
Install with:
# CPU‑only (quickest to set up)
pip install torch torchvision torchaudio
# or choose a CUDA version at https://pytorch.org/get-started/locally/
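Once installed, a quick sanity check confirms the import works and tells you whether a GPU is visible (a minimal sketch; the printed version depends on what pip resolved):

import torch

print(torch.__version__)          # e.g. 2.x, whatever pip installed
print(torch.cuda.is_available())  # True only with a CUDA build and a visible GPU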
2 Meet tensors: PyTorch’s core data structure
import torch
# Scalar (0‑D)
a = torch.tensor(3.14)
# 1‑D vector
v = torch.tensor([1, 2, 3])
# 2‑D matrix
M = torch.tensor([[1., 2.], [3., 4.]])
print(M.shape) # torch.Size([2, 2])
print(M.dtype) # torch.float32 by default
Key differences from NumPy:
| | NumPy ndarray | PyTorch tensor |
|---|---|---|
| Gradients | Not built-in | requires_grad=True |
| GPU support | With CuPy or manual work | Native (.to('cuda')) |
| In-place ops | n/a | End with _ (e.g. add_) |
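To make the table concrete, here is a small sketch (CPU only) showing gradient tracking, the trailing-underscore convention for in-place ops, and the hand-off to NumPy:

t = torch.ones(3, requires_grad=True)  # gradients will be tracked for t
s = t.sum()
s.backward()
print(t.grad)     # tensor([1., 1., 1.])

u = torch.zeros(3)
u.add_(5)         # trailing underscore = in-place version of add
print(u)          # tensor([5., 5., 5.])

print(u.numpy())  # CPU tensors convert to NumPy (and share memory)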
Creating tensors quickly
torch.zeros((2, 3))
torch.ones(4)
torch.arange(0, 10, 2)
Tip: torch.rand and torch.randn give uniform and normal random tensors, respectively.
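For example (the shapes here are arbitrary):

torch.rand(2, 3)   # uniform samples in [0, 1)
torch.randn(2, 3)  # samples from a standard normal distribution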
3 Devices: CPU vs GPU in one line
device = 'cuda' if torch.cuda.is_available() else 'cpu'
M = M.to(device) # move tensor to GPU if you have one
All math runs where the data lives—keep model and data on the same device.
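In practice that means creating tensors on, or moving them to, the same device everywhere; a minimal sketch:

a = torch.randn(3, 3, device=device)  # created directly on the chosen device
b = torch.ones(3, 3).to(device)       # or moved after creation
c = a + b                             # fine: both operands live on one device
print(c.device)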
4 Autograd: automatic differentiation by example
x = torch.tensor(3.0, requires_grad=True)
y = x**2 + 2*x + 1 # simple function
y.backward() # dy/dx computed automatically
print(x.grad) # ➜ tensor(8.)
Set requires_grad=True and PyTorch builds a computational graph. Calling .backward() walks that graph and fills .grad for each leaf tensor.
Why care? Neural nets learn by gradient descent; autograd computes those gradients for you.
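To see that connection, here is gradient descent by hand, a minimal sketch that minimises (w - 5)² with no nn machinery (the learning rate 0.1 and 50 steps are arbitrary choices):

w = torch.tensor(0.0, requires_grad=True)

for _ in range(50):
    loss = (w - 5) ** 2
    loss.backward()           # fills w.grad with d(loss)/dw
    with torch.no_grad():     # update the weight outside the graph
        w -= 0.1 * w.grad
    w.grad.zero_()            # clear the gradient for the next step

print(w)                      # ≈ tensor(5.)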
5 nn.Module: building blocks for neural networks
A PyTorch model is a subclass of torch.nn.Module that defines:
- Layers in __init__
- Forward pass in forward()
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, in_features, hidden, out_features):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)   # input -> hidden layer
        self.relu = nn.ReLU()                       # non-linearity
        self.fc2 = nn.Linear(hidden, out_features)  # hidden -> output layer

    def forward(self, x):
        x = self.relu(self.fc1(x))
        return self.fc2(x)
Create a model:
model = SimpleNet(in_features=2, hidden=8, out_features=1)
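Calling the model on a batch runs forward() for you. A quick sketch with a random batch (the batch size 4 is arbitrary):

x = torch.randn(4, 2)   # 4 samples, 2 features each
out = model(x)          # invokes SimpleNet.forward under the hood
print(out.shape)        # torch.Size([4, 1])
print(sum(p.numel() for p in model.parameters()))  # 33 trainable parameters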
6 Training loop step‑by‑step (toy regression)
# Fake dataset: y = 3x₁ + 2x₂ + noise
n = 256
X = torch.randn(n, 2)
true_w = torch.tensor([[3.0], [2.0]])
y = X @ true_w + 0.1 * torch.randn(n, 1)

model = SimpleNet(2, 8, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    y_pred = model(X)            # forward pass
    loss = criterion(y_pred, y)  # measure the error
    optimizer.zero_grad()        # reset gradients
    loss.backward()              # compute gradients
    optimizer.step()             # update params
    if epoch % 40 == 0:
        print(f"Epoch {epoch:3d} → loss {loss.item():.4f}")
Output (trimmed):
Epoch 0 → loss 4.1325
Epoch 40 → loss 0.0123
Epoch 80 → loss 0.0051
...
The network quickly learns the underlying mapping y ≈ 3x₁ + 2x₂. That's the entire deep‑learning pipeline on one screen.
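You can sanity-check the fit on inputs the model never saw during training; a small sketch (the test points below are made up):

X_test = torch.tensor([[1.0, 1.0], [2.0, -1.0]])
with torch.no_grad():   # no gradients needed for evaluation
    preds = model(X_test)
print(preds)            # roughly [[5.], [4.]], since y ≈ 3x₁ + 2x₂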
7 Saving & loading models
# Save
torch.save(model.state_dict(), 'simplenet.pt')
# Load
loaded = SimpleNet(2, 8, 1)
loaded.load_state_dict(torch.load('simplenet.pt'))
loaded.eval() # switch to inference mode
.state_dict() is a Python dict mapping parameter names to tensors: it stores only the weights, which keeps checkpoints small and portable.
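If you are curious what actually gets saved, print the keys and shapes; for the SimpleNet above that looks like:

for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape))
# fc1.weight (8, 2)
# fc1.bias (8,)
# fc2.weight (1, 8)
# fc2.bias (1,)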
8 PyTorch cheat sheet 📄
| Action | Command |
|---|---|
| Import | import torch as t |
| Tensor from list | t.tensor([1, 2, 3]) |
| Zeros / ones | t.zeros(shape), t.ones(shape) |
| Random normal / uniform | t.randn(shape), t.rand(shape) |
| Change dtype / device | x.float(), x.to('cuda') |
| Requires grad | x.requires_grad_(True) |
| Backprop | loss.backward() |
| Optimisers | t.optim.SGD(params, lr), t.optim.Adam(...) |
| Define layer | nn.Linear(in, out), nn.ReLU() |
| Build model | Subclass nn.Module and write forward() |
| Zero grads | optimizer.zero_grad() |
| Update params | optimizer.step() |
| Save / load | t.save(model.state_dict(), 'm.pt') / model.load_state_dict(...) |
Print or bookmark—this covers 90 % of daily PyTorch commands.
9 Gotchas & pro tips
- Detach tensors before converting to NumPy: x.detach().cpu().numpy().
- Wrap inference in torch.no_grad() to skip gradient tracking.
- Remember model.train() vs. model.eval(): dropout & batch‑norm behave differently (see the sketch after this list).
- Don't mix devices: move data and model together with tensor.to(device) and model.to(device).
- In‑place ops end with _ (e.g. add_) and can break autograd if misused.
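A minimal inference sketch that combines these tips, reusing the SimpleNet model and the device variable from earlier:

model.to(device)   # model and inputs on the same device
model.eval()       # dropout / batch-norm switch to inference behaviour

with torch.no_grad():                # no graph is built inside this block
    out = model(torch.randn(4, 2, device=device))

out_np = out.detach().cpu().numpy()  # detach → cpu → numpy (detach is redundant under no_grad, but a safe habit)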
10 What’s next?
- DataLoader & datasets – handle millions of images without RAM pain.
- Convolutional layers – build a tiny image classifier.
- PyTorch Lightning – organise research‑grade code in minutes.
Stay curious—your next step might be training your first image recogniser or language model. PyTorch scales with you. Happy tensoring!