PyTorch Neural Networks#

PyTorch is a Python package for defining and training neural networks. Neural networks and deep learning have been a hot topic for several years, and they underlie many state-of-the-art machine learning systems. There are many industrial applications (e.g. at your favorite or least favorite companies in Silicon Valley), but also many scientific applications, including

  • Processing data in particle detectors

  • Seismic imaging / medical imaging

  • Accelerating simulations of physical phenomena

A (deep) feed-forward neural network is the composition of functions \begin{equation} f_N(x; w_N, b_N) \circ f_{N-1}(x; w_{N-1}, b_{N-1}) \circ \dots \circ f_0(x; w_0, b_0) \end{equation} where each \(f_i(x; w_i, b_i)\) is a (non-linear) function with learnable parameters \(w_i, b_i\). There are many choices for what the exact function is. A common and simple one to describe is an (affine) linear transformation followed by a non-linearity: \begin{equation} f_i(x; w_i, b_i) = (w_i \cdot x + b_i)_+ \end{equation}

where \(w_i \cdot x\) is matrix-vector multiplication, and \((\cdot)_+\) is the ReLU operation (Rectified Linear Unit) \begin{equation} x_+ = \begin{cases} x & x > 0 \\ 0 & x \le 0 \end{cases} \end{equation}

If you take the composition of several functions like this, you have a multilayer perceptron (MLP).
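As a minimal sketch of the composition above, the layers can be written out with explicit weight matrices. The layer widths, the random parameters, and the helper names `relu` and `layer` are assumptions made for illustration only.

```python
import torch

def relu(z):
    # (z)_+ : keep positive entries, set the rest to zero
    return torch.clamp(z, min=0.0)

def layer(x, w, b):
    # f_i(x; w_i, b_i) = (w_i . x + b_i)_+
    return relu(w @ x + b)

torch.manual_seed(0)
sizes = [4, 8, 8, 3]  # hypothetical layer widths
params = [(torch.randn(n_out, n_in), torch.randn(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = torch.randn(4)
out = x
for w, b in params:  # apply f_0 first, then f_1, and so on
    out = layer(out, w, b)

print(out.shape)  # torch.Size([3])
```

Because every layer here ends in a ReLU, all entries of `out` are non-negative. In practice the last layer is often left without a non-linearity.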

Deep Learning Libraries#

There are many deep learning libraries available; the most common ones for Python are

  • TensorFlow, Keras

  • PyTorch

Working with TensorFlow directly requires going into a lot of detail about the construction of the computation graph, whereas Keras is a higher-level interface to TensorFlow. TensorFlow is very popular in industry and well suited to production code.

PyTorch can be used as a low-level interface, yet it is much more user-friendly than TensorFlow, and it also offers a higher-level interface. PyTorch is more popular in the research community.

Main features that any deep learning library should provide#

No matter what library or language you use, the main features provided by a deep learning library are

  1. Use the GPU to speed up computation

  2. Ability to do automatic differentiation

  3. Useful library functions for common architectures and optimization algorithms


We will look at all of the above in PyTorch. The best way to think about PyTorch is that it's NumPy + GPU + autograd.

You can install it with

conda install pytorch

Alternatively (and recommended), run this notebook in Google Colab, which provides an environment with all of the PyTorch dependencies plus a GPU free of charge.

import torch
import numpy as np
import matplotlib.pyplot as plt
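The "NumPy + GPU + autograd" description can be sketched in a few lines. The array values and variable names below are made up for illustration, and the code falls back to the CPU when no GPU is available.

```python
import numpy as np
import torch

# NumPy-like: tensors convert to and from NumPy arrays
a = torch.from_numpy(np.arange(6, dtype=np.float32).reshape(2, 3))
b = a.sum(dim=0)  # same idea as np.sum(..., axis=0)

# GPU: pick a device and move the tensor there (CPU if no GPU is present)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a_dev = a.to(device)

# autograd: record operations on w and differentiate through them
w = torch.ones(3, device=device, requires_grad=True)
s = (a_dev @ w).sum()
s.backward()

print(b)       # column sums of a
print(w.grad)  # d s / d w, which here equals the column sums of a
```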
Automatic Differentiation#

Automatic differentiation differs from numerical differentiation, which requires a choice of step size, and from symbolic differentiation, which builds a single expression for the derivative. Instead, it applies the chain rule repeatedly to the elementary operations that make up the computation.
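To make the contrast with numerical differentiation concrete, here is a small sketch that compares a forward finite difference with autograd. The test function `f` and the step sizes are arbitrary choices for illustration.

```python
import torch

def f(x):
    return torch.sin(x) * x**2

x = torch.tensor(1.5, dtype=torch.float64, requires_grad=True)
f(x).backward()  # autograd: chain rule through sin, the square, and the product

fd_errors = []
with torch.no_grad():
    exact = torch.cos(x) * x**2 + 2 * x * torch.sin(x)  # derivative worked out by hand
    for h in (1e-1, 1e-4, 1e-8, 1e-12):
        fd = (f(x + h) - f(x)) / h  # forward difference with step size h
        fd_errors.append(abs(fd - exact).item())
        print(f"h = {h:.0e}   finite-difference error = {fd_errors[-1]:.2e}")
    autograd_err = abs(x.grad - exact).item()
    print(f"autograd error = {autograd_err:.2e}")
```

The finite-difference error depends on the step size: truncation error dominates for large `h` and round-off error for very small `h`. The autograd result involves no step size at all.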

PyTorch uses dynamic computation graphs to compute the gradients of the parameters.
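One consequence of a dynamic graph is that ordinary Python control flow can decide which operations are recorded. The function below is a made-up example (the name `repeated_doubling` is hypothetical): the number of multiplications depends on the input value, and autograd differentiates whichever path was actually taken.

```python
import torch

def repeated_doubling(x):  # hypothetical example function
    y = x
    while y.abs() < 10:  # the loop length depends on the value of x
        y = y * 2
    return y

results = []
for start in (1.0, 3.0):
    x = torch.tensor(start, requires_grad=True)
    y = repeated_doubling(x)
    y.backward()
    results.append((y.item(), x.grad.item()))
    print(f"x = {start}: y = {y.item()}, dy/dx = {x.grad.item()}")
```

Starting from 1.0 the loop doubles four times (y = 16, dy/dx = 16); starting from 3.0 it doubles twice (y = 12, dy/dx = 4).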

x = torch.tensor([2.0])
m = torch.tensor([5.0], requires_grad = True)
c = torch.tensor([2.0], requires_grad = True)
y = m*x + c
tensor([12.], grad_fn=<AddBackward0>)

Define an error for your function

loss = torch.norm( y - 13)
tensor(1., grad_fn=<CopyBackwards>)

Calling .backward() on a tensor (here, loss) makes PyTorch compute the gradients of that tensor with respect to every tensor used to compute it that has the requires_grad flag set to True. The computed gradients are stored in the .grad attribute of those tensors.

loss.backward()
# take one gradient-descent step without recording it in the graph
with torch.no_grad():
    m -= 0.01 * m.grad
    c -= 0.3 * c.grad
m, c
(tensor([5.0200], requires_grad=True), tensor([2.3000], requires_grad=True))
m.grad, c.grad
(tensor([-2.]), tensor([-1.]))

# backward() accumulates into .grad, so reset the gradients before the next pass
m.grad.zero_()
c.grad.zero_()
m.grad, c.grad
(tensor([0.]), tensor([0.]))
y = m*x + c
tensor([12.3400], grad_fn=<AddBackward0>)
loss = torch.norm( y - 13)
tensor(0.6600, grad_fn=<CopyBackwards>)
loss.backward()
m.grad, c.grad
(tensor([-2.]), tensor([-1.]))
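One detail is worth isolating: backward() adds to whatever is already stored in .grad rather than overwriting it. A minimal sketch with a single parameter (the variable names `first`, `second`, and `third` are only for illustration):

```python
import torch

x = torch.tensor([2.0])
m = torch.tensor([5.0], requires_grad=True)

(m * x).sum().backward()
first = m.grad.clone()
print(first)   # tensor([2.])

(m * x).sum().backward()
second = m.grad.clone()
print(second)  # tensor([4.]): the second gradient was added to the first

m.grad.zero_()  # reset in place before the next backward pass
(m * x).sum().backward()
third = m.grad.clone()
print(third)   # tensor([2.])
```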

Making it more compact#

def model_fn(x,m,c):
    return m*x + c

def loss_fn(y,yt):
    return torch.norm(y-yt)
m = torch.tensor([5.0], requires_grad = True)
c = torch.tensor([2.0], requires_grad = True)
x = torch.tensor([2.0])
yt = torch.tensor([13.0])
y = model_fn(x,m,c)
loss = loss_fn(y,yt)
loss.backward()  # populate m.grad and c.grad
with torch.no_grad():
    m -= 0.05 * m.grad
    c -= 0.05 * c.grad

print( f" m = {m}\n c = {c}\n y = {y}\n loss = {loss}")
#note that 'loss' indicates the loss for the previous m,c values
 m = tensor([5.1000], requires_grad=True)
 c = tensor([2.0500], requires_grad=True)
 y = tensor([12.], grad_fn=<AddBackward0>)
 loss = 1.0

Here’s an explicit loop:

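x = torch.randn(5,100)
yt = torch.randn(1,100)
losses = []

for i in range(100):
    y = model_fn(x,m,c)
    loss = loss_fn(y,yt)
    # clear the old gradients first; backward() adds to whatever .grad already holds
    m.grad = None
    c.grad = None
    loss.backward()
    with torch.no_grad():
        m -= 0.05 * m.grad
        c -= 0.05 * c.grad

    losses.append(loss.item())
    print( f"loss = {loss}")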

Using Library functions#

The subpackage torch.nn provides an object-oriented library of functions that can be composed together.

model = torch.nn.Sequential(
    torch.nn.Linear(5, 5), # 5 x 5 matrix
    torch.nn.ReLU(),       # ReLU nonlinearity
    torch.nn.Linear(5, 5), # 5 x 5 matrix
)
list(model.parameters())
[Parameter containing:
 tensor([[ 0.3316,  0.1522,  0.2492, -0.2992,  0.4102],
         [-0.3598,  0.1222, -0.3902, -0.2934, -0.3457],
         [-0.2198,  0.2152,  0.3994, -0.3181, -0.0516],
         [-0.2023,  0.0867,  0.3717,  0.1664, -0.2102],
         [ 0.4042, -0.3993, -0.3191, -0.2141, -0.2772]], requires_grad=True),
 Parameter containing:
 tensor([-0.4013,  0.2950,  0.1151, -0.2628,  0.2941], requires_grad=True),
 Parameter containing:
 tensor([[-0.3463,  0.3572,  0.3492, -0.3008, -0.2829],
         [ 0.2310, -0.4222,  0.3614, -0.2791, -0.0441],
         [ 0.3478,  0.1490, -0.2911,  0.3047, -0.2649],
         [-0.1012,  0.0829, -0.4061,  0.2447,  0.3126],
         [ 0.2993,  0.0131,  0.1135, -0.3588,  0.2828]], requires_grad=True),
 Parameter containing:
 tensor([-0.2317, -0.3994,  0.1748,  0.2988, -0.1613], requires_grad=True)]
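The same kind of network can also be written as a torch.nn.Module subclass, which is the usual route once the forward pass needs more than a straight chain of layers. This is a sketch only; the class name `TwoLayerNet` and its attribute names are hypothetical.

```python
import torch

class TwoLayerNet(torch.nn.Module):  # hypothetical name
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        # submodules assigned as attributes are registered automatically,
        # so their weights and biases show up in .parameters()
        self.fc1 = torch.nn.Linear(n_in, n_hidden)
        self.fc2 = torch.nn.Linear(n_hidden, n_out)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

net = TwoLayerNet(5, 5, 5)
out = net(torch.randn(100, 5))
n_params = sum(p.numel() for p in net.parameters())
print(out.shape, n_params)  # torch.Size([100, 5]) 60
```

The parameter count is two 5 x 5 weight matrices plus two bias vectors of length 5, matching the four parameter tensors printed for the Sequential model.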
loss_fn = torch.nn.MSELoss(reduction='sum')

In this case, we’ll just fit the model to random data.

x = torch.randn(100,5)
yt = torch.randn(100,5)
losses = []

Optimizers in torch.optim implement a variety of optimization strategies. Almost all are based on gradient descent, since forming Hessians is prohibitively expensive for models with many parameters.
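To show what an optimizer step does, the sketch below performs one plain gradient-descent update by hand and the same update through torch.optim.SGD. SGD is used here instead of Adam only because its update rule is simple enough to reproduce in one line; the quadratic loss and the starting values are made up.

```python
import torch

lr = 0.1
w_manual = torch.tensor([1.0, -2.0], requires_grad=True)
w_optim = torch.tensor([1.0, -2.0], requires_grad=True)
sgd = torch.optim.SGD([w_optim], lr=lr)

def quadratic(w):  # toy loss with gradient 2 * w
    return (w ** 2).sum()

# manual update: w <- w - lr * grad
quadratic(w_manual).backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad
w_manual.grad.zero_()

# the same step through the optimizer
sgd.zero_grad()
quadratic(w_optim).backward()
sgd.step()

print(w_manual, w_optim)  # both hold [0.8, -1.6]
```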

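optimizer = torch.optim.Adam(model.parameters(), lr=0.03)
for i in range(100):
    y = model(x)
    loss = loss_fn(y,yt)
    optimizer.zero_grad()  # clear gradients from the previous iteration
    loss.backward()        # compute gradients of the loss
    optimizer.step()       # update the parameters
    losses.append(loss.item())
    print( f"loss = {loss}")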
MNIST Example#

First, you’ll want to install the torchvision package, which provides a variety of computer vision functionality for PyTorch.

The MNIST data set consists of a collection of handwritten digits (0-9). Our goal is to train a neural net that classifies the image of each digit as the correct digit.

conda install torchvision -c pytorch
import torchvision
from torchvision.datasets import MNIST
data = MNIST(".",download=True)
import numpy as np
img,y = data[np.random.randint(len(data))]  # pick a random (image, label) pair
MNIST Training#

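model = torch.nn.Sequential(
    torch.nn.Linear(784, 100),
    torch.nn.ReLU(),  # without a nonlinearity, stacked linear layers collapse into one linear map
    torch.nn.Linear(100, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)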
loss_fn = torch.nn.CrossEntropyLoss()
sample = np.random.choice(range(len(data.train_data)),1000)
x = data.train_data[sample].reshape(1000,-1).float()/255
yt = data.train_labels[sample]
(torch.Size([1000, 784]), torch.Size([1000]))
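CrossEntropyLoss expects raw, unnormalized scores (logits) of shape (batch, classes) together with integer class labels of shape (batch,), which is why the model ends in a plain Linear layer and yt holds digit indices. As a sketch with made-up scores, the loss equals the mean negative log-probability of the correct class:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # made-up scores: 2 samples, 3 classes
labels = torch.tensor([0, 2])             # correct class index for each sample

loss = torch.nn.CrossEntropyLoss()(logits, labels)

# the same quantity by hand: log-softmax, then pick the correct class per row
log_probs = F.log_softmax(logits, dim=1)
by_hand = -log_probs[torch.arange(2), labels].mean()

print(loss, by_hand)  # the two values agree
```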
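optimizer = torch.optim.Adam(model.parameters(), lr=0.03)
losses = []
for i in range(100):
    sample = np.random.choice(range(len(data.train_data)),1000)
    x = data.train_data[sample].reshape(1000,-1).float()/255
    yt = data.train_labels[sample]
    y = model(x)
    loss = loss_fn(y,yt)
    optimizer.zero_grad()  # clear gradients from the previous batch
    loss.backward()        # backpropagate the loss
    optimizer.step()       # update the weights
    losses.append(loss.item())
    #print( f"loss = {loss}")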
x_test = data.train_data[-1000:].reshape(1000,-1).float()/255
y_test = data.train_labels[-1000:]
with torch.no_grad():
    y_pred = model(x_test)
print("Accuracy = ", (y_pred.argmax(dim=1) == y_test).sum().float().item()/1000.0)
Accuracy =  0.978
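The accuracy computation can be checked on a tiny hand-made example. The helper name `accuracy` and the scores below are hypothetical and chosen so the result is easy to verify by eye.

```python
import torch

def accuracy(logits, labels):  # hypothetical helper
    preds = logits.argmax(dim=1)  # predicted class = index of the largest score
    return (preds == labels).float().mean().item()

logits = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 0.9],
                       [0.3, 0.2, 0.1]])
labels = torch.tensor([1, 0, 2, 2])

acc = accuracy(logits, labels)
print(acc)  # 0.75: the last sample is predicted as class 0 but labeled 2
```

Note that the evaluation above uses the last 1000 images of the training set, which may also have been drawn in the random training batches, so the reported figure is not a held-out accuracy. Loading the separate test split with MNIST(".", train=False, download=True) would be one way to get one.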


This notebook was adapted from a notebook from CME 193 at Stanford