How to use PyTorch

Preparation for use

The GPU version of PyTorch is installed on Genkai and can be used on node groups B and C.
To use it, load the following modules (note the module versions):
cuda/12.2.2, cudnn/8.9.7, nccl/2.22.3, pytorch-cuda/2.3.1-12.2.2
Use Python 3.11 (the python3.11 command).

PyTorch can also be run on the CPU on node group A.
In that case, load gcc/8 and pytorch/2.3.1.
As with the GPU version, use Python 3.11 (the python3.11 command).
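
A quick way to confirm that the loaded modules give you the expected interpreter and PyTorch build is a short check script such as the one below, run with python3.11 after the module load line (on node group B or C for the GPU build, on node group A for the CPU build). This is only a sketch; the function name check_environment is hypothetical. It imports torch only if the package can be found, so before the modules are loaded it simply reports torch as missing. When the job has been allocated a GPU and pytorch-cuda is loaded, cuda_available should be True; on node group A it will be False.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Collect the Python version and, if importable, PyTorch details (hypothetical helper)."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_ok": sys.version_info[:2] == (3, 11),  # this guide requires python3.11
        "torch": None,
        "cuda_available": None,
    }
    # Import torch only if it can be found, so the check also runs
    # before the PyTorch modules have been loaded.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```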

If you need additional Python modules, install them with pip3.11 install --user.
If you need a different version of PyTorch, consider installing it yourself or using a container.

(The cudnn and pytorch-cuda modules appear in the output of module avail after the cuda module has been loaded. Similarly, the pytorch module appears in the module avail output after the gcc module has been loaded.)
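
Packages installed with pip3.11 install --user go into the per-user site-packages directory of Python 3.11, typically ~/.local/lib/python3.11/site-packages on Linux. The sketch below uses only the standard site module to print that location, so you can check where the packages ended up; run it with python3.11. The helper name user_install_dir is hypothetical.

```python
import site
import sys


def user_install_dir() -> str:
    """Return the per-user site-packages directory targeted by `pip install --user`."""
    return site.getusersitepackages()


if __name__ == "__main__":
    print("user site enabled:", site.ENABLE_USER_SITE)
    print("user site-packages:", user_install_dir())
    # Check that this interpreter actually searches that directory on import.
    print("on sys.path:", user_install_dir() in sys.path)
```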


Usage Example: Sample Program

Let’s run the following sample program, which is published as one of the PyTorch examples.
The program switches between CPU and GPU execution through the torch.device line near the top: for the GPU version, comment out device = torch.device("cpu") and uncomment the cuda:0 line. Save the GPU version as pytorch_sample_gpu.py and the CPU version as pytorch_sample_cpu.py, and try running both.

# -*- coding: utf-8 -*-

import torch
import math


dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# Create random input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d


print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
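
The backward pass in the loop is computed by hand rather than with autograd. Written out, with \hat{y}_i standing for y_pred and the loss taken as the summed squared error, the gradients that grad_a to grad_d implement are:

```latex
\begin{aligned}
\hat{y}_i &= a + b x_i + c x_i^2 + d x_i^3, \qquad
L = \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 \quad (N = 2000) \\
\frac{\partial L}{\partial a} &= \sum_{i} 2(\hat{y}_i - y_i), \qquad
\frac{\partial L}{\partial b} = \sum_{i} 2(\hat{y}_i - y_i)\, x_i \\
\frac{\partial L}{\partial c} &= \sum_{i} 2(\hat{y}_i - y_i)\, x_i^2, \qquad
\frac{\partial L}{\partial d} = \sum_{i} 2(\hat{y}_i - y_i)\, x_i^3 \\
\theta &\leftarrow \theta - \eta\, \frac{\partial L}{\partial \theta},
\qquad \theta \in \{a, b, c, d\},\ \eta = 10^{-6}
\end{aligned}
```

The factor 2.0 in grad_y_pred is the derivative of the square, and learning_rate plays the role of \eta.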

Execution on GPU

The following example runs the program as a batch job on node group B, using the job script below.

$ cat job_pytorch_gpu.sh
#!/bin/bash
#PJM -L rscgrp=b-batch
#PJM -L gpu=1
#PJM -L elapse=10:00
#PJM -j
#PJM -S

module load cuda/12.2.2 cudnn/8.9.7 nccl/2.22.3 pytorch-cuda/2.3.1-12.2.2
python3.11 ./pytorch_sample_gpu.py
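
Keeping two copies of the program that differ only in the device line works, but a single file could also take the device as an argument, so that this GPU job runs, for example, python3.11 ./pytorch_sample.py --device cuda and the CPU job runs the same file without the option. The sketch below is an alternative, not part of the published sample; the file name pytorch_sample.py, the --device option, and the helpers parse_args and select_device are hypothetical. The result of torch.cuda.is_available() is passed in as a parameter so the helper can be tried without a GPU.

```python
import argparse


def parse_args(argv=None):
    """Parse the hypothetical --device option (cpu or cuda)."""
    parser = argparse.ArgumentParser(description="PyTorch sample (CPU or GPU)")
    parser.add_argument("--device", choices=["cpu", "cuda"], default="cpu")
    return parser.parse_args(argv)


def select_device(requested: str, cuda_available: bool) -> str:
    """Map the requested device to the string passed to torch.device()."""
    if requested == "cuda":
        if not cuda_available:
            # Fail early instead of silently falling back to the CPU.
            raise RuntimeError("--device cuda was given, but no GPU is visible")
        return "cuda:0"
    return "cpu"


# In the sample program, the two device lines would then become:
#   args = parse_args()
#   device = torch.device(select_device(args.device, torch.cuda.is_available()))
```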

Execution on CPU

Although PyTorch is usually run on GPUs, here is also an example of a batch job that runs on the CPU on node group A.

Prepare and run the following job script.

$ cat job_pytorch_cpu.sh
#!/bin/bash
#PJM -L rscgrp=a-batch
#PJM -L vnode-core=10
#PJM -L elapse=10:00
#PJM -j
#PJM -S

module load gcc/8 pytorch/2.3.1
python3.11 ./pytorch_sample_cpu.py
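
The job above requests 10 cores with vnode-core=10, while PyTorch picks its own CPU thread count. If you want the two to match, the sketch below sets PyTorch's intra-op thread count with torch.set_num_threads(). It assumes that the cores assigned to the job are reflected in the process's CPU affinity; that is an assumption about how the job is bound to cores, not something stated in this guide. The helper names allocated_cores and configure_torch_threads are hypothetical; you could call configure_torch_threads() near the top of pytorch_sample_cpu.py.

```python
import importlib.util
import os


def allocated_cores() -> int:
    """Cores this process may run on (Linux CPU affinity, else os.cpu_count())."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def configure_torch_threads() -> int:
    """Set PyTorch's intra-op threads to the allocated core count, if torch is importable."""
    n = allocated_cores()
    if importlib.util.find_spec("torch") is not None:
        import torch

        torch.set_num_threads(n)
    return n


if __name__ == "__main__":
    print("threads:", configure_torch_threads())
```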


Execution results

An example of the execution result is shown below.
The print call inside the for loop reports the iteration number and the loss every 100 iterations, and the final print call shows the fitted polynomial.
Because the initial weights a, b, c, and d are generated from random numbers, the values can differ considerably from run to run.

$ cat job_pytorch_gpu.sh.82610.out
99 3770.3193359375
199 2534.77490234375
299 1706.46923828125
399 1150.7528076171875
499 777.6245727539062
599 526.8876953125
699 358.253173828125
799 244.737548828125
899 168.25572204589844
999 116.67759704589844
1099 81.86070251464844
1199 58.3349609375
1299 42.4226188659668
1399 31.648521423339844
1499 24.345962524414062
1599 19.39096450805664
1699 16.025184631347656
1799 13.736391067504883
1899 12.178205490112305
1999 11.11623764038086
Result: y = -0.03617167845368385 + 0.8240451812744141 x + 0.006240217015147209 x^2 + -0.0886797159910202 x^3
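
The run-to-run variation comes from torch.randn in the weight initialization; in PyTorch, calling torch.manual_seed() with a fixed value before the weights are created makes the initial weights the same in every run. The sketch below illustrates the same idea without PyTorch by re-implementing the sample's gradient-descent loop in plain Python. It uses float64 and Python's random module instead of float32 and PyTorch's generator, so its numbers will not match the output above; the function name fit_cubic is hypothetical. Runs with the same seed give identical coefficients, and the loss decreases as in the log.

```python
import math
import random


def fit_cubic(steps=2000, n=2000, lr=1e-6, seed=0):
    """Plain-Python mirror of the sample: fit a + b x + c x^2 + d x^3 to sin(x)."""
    xs = [-math.pi + 2 * math.pi * i / (n - 1) for i in range(n)]  # like torch.linspace
    ys = [math.sin(x) for x in xs]
    rng = random.Random(seed)  # fixed seed -> identical initial weights every run
    a, b, c, d = (rng.gauss(0.0, 1.0) for _ in range(4))
    losses = []  # loss recorded at t = 99, 199, ... as in the sample
    for t in range(steps):
        # Residuals y_pred - y at every sample point.
        resid = [a + b * x + c * x ** 2 + d * x ** 3 - y for x, y in zip(xs, ys)]
        if t % 100 == 99:
            losses.append(sum(r * r for r in resid))
        g = [2.0 * r for r in resid]  # grad_y_pred
        a -= lr * sum(g)
        b -= lr * sum(gi * x for gi, x in zip(g, xs))
        c -= lr * sum(gi * x ** 2 for gi, x in zip(g, xs))
        d -= lr * sum(gi * x ** 3 for gi, x in zip(g, xs))
    return (a, b, c, d), losses


if __name__ == "__main__":
    coeffs, losses = fit_cubic(seed=42)
    print(losses[-1], coeffs)
```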