How to use TensorFlow
Preparation for use
TensorFlow with GPU support is installed on Genkai and can be used from node groups B and C. Load the following modules before using it. (Pay attention to the versions of the modules you load.)
cuda/12.2.2, cudnn/8.9.7, tensorflow-cuda/2.61.1-12.2.2
The Python version must be 3.11 (the python3.11 command).
TensorFlow can also be run on the CPU on node group A. (The CPU version is also available on node groups B and C.) In that case, load the following modules.
gcc/8, tensorflow/2.61.1
As with the GPU version, you need to use Python 3.11 (the python3.11 command).
If you need additional Python modules, install them with pip3.11 install --user. If you want to use a different version of TensorFlow, consider installing it yourself or using a container.
(After you load the cuda module, the cudnn and tensorflow-cuda modules appear in the output of module avail. Similarly, after you load the gcc module, the tensorflow module appears in the module avail output.)
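After loading the modules, it can be useful to confirm that the interpreter and TensorFlow are set up as described above. The following is a minimal sketch, not an official Genkai tool: the function names are hypothetical, and it assumes only the standard tf.config.list_physical_devices call. It checks that the interpreter is Python 3.11 and reports which GPUs TensorFlow can see.

```python
import sys

REQUIRED_PYTHON = (3, 11)  # the Python version required by the modules described above


def python_version_ok(version_info=sys.version_info, required=REQUIRED_PYTHON):
    """Return True if the interpreter's major.minor version matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # the tensorflow or tensorflow-cuda module is probably not loaded
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    print("Python 3.11:", python_version_ok())
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow could not be imported; check the loaded modules.")
    else:
        print("GPUs visible to TensorFlow:", gpus)
```

Saved under a name such as check_env.py (a hypothetical file name), it can be run with python3.11 ./check_env.py after module load. With the CPU modules, the GPU list is expected to be empty.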
2024.10.09 update
TensorFlow with advanced support for AVX and AMX is now available by loading the following module.
tensorflow-cpu/2.17.0
Usage example: target program
As an example of using TensorFlow, let’s run the basic workflow introduced on the TensorFlow website.
First, extract the necessary processing and save it as tensorflow.py.
$ cat tensorflow.py
import tensorflow as tf
# Load MNIST and scale the pixel values to the range [0, 1]
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Build a small fully connected classifier that outputs logits
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
# Logits and class probabilities of the untrained model for one sample
predictions = model(x_train[:1]).numpy()
predictions
tf.nn.softmax(predictions).numpy()
# Loss function that takes logits and integer labels
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()
# Train for 5 epochs, then evaluate on the test set
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
# Wrap the trained model so that it returns probabilities
probability_model = tf.keras.Sequential([
    model,
    tf.keras.layers.Softmax()
])
probability_model(x_test[:5])
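The program above converts logits to probabilities with tf.nn.softmax and computes the loss with SparseCategoricalCrossentropy(from_logits=True). To show what those two steps calculate, here is a small NumPy sketch. It is an illustration under stated assumptions rather than TensorFlow's implementation: the function names are hypothetical and the input values are made up.

```python
import numpy as np


def softmax(logits):
    """Convert logits to probabilities along the last axis (numerically stable)."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def sparse_categorical_crossentropy_from_logits(labels, logits):
    """Mean negative log-probability of the true class, computed from raw logits."""
    probs = softmax(logits)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))


# Illustrative values only: one sample, 10 classes, all logits equal.
logits = np.zeros((1, 10))
labels = np.array([3])

probs = softmax(logits)  # every class gets probability 0.1
loss = sparse_categorical_crossentropy_from_logits(labels, logits)  # ln(10), about 2.30
```

When all ten logits are equal, every class has probability 0.1 and the loss is ln(10). An untrained model usually gives nearly uniform probabilities, so its initial loss should be close to this value.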
Execution by GPU
The following is an example of batch job execution on node group B.
The following job script is used to execute the job.
$ cat job_tensorflow_gpu.sh
module load cuda/12.2.2 cudnn/8.9.7 tensorflow-cuda/2.61.1-12.2.2
python3.11 ./tensorflow.py
The following is an example of the execution result. Lines 7 and 11 show that the GPU is correctly recognized.
(Note that if something goes wrong, for example with module load, a message indicating that the GPU is not available is output and the program runs on the CPU.)
$ cat -n job_gpu.sh.82608.out
1 2024-06-29 17:22:45.968019: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2 2024-06-29 17:22:49.594355: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
3 To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
4 2024-06-29 17:22:56.162167: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
5 /home/app/tensorflow-cuda/2.61.1-12.2.2/lib/python3.11/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
6 super().__init__(**kwargs)
7 2024-06-29 17:23:25.133024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 92922 MB memory: -> device: 0, name: NVIDIA H100, pci bus id: 0000:1c:00.0, compute capability: 9.0
8 Epoch 1/5
9 WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
10 I0000 00:00:1719649409.144605 162 service.cc:145] XLA service 0x1457c4006e50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
11 I0000 00:00:1719649409.145165 162 service.cc:153] StreamExecutor device (0): NVIDIA H100, Compute Capability 9.0
12 2024-06-29 17:23:29.310888: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
13 2024-06-29 17:23:29.520149: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:465] Loaded cuDNN version 8907
14 I0000 00:00:1719649411.691730 162 device_compiler.h:188] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 599us/step - accuracy: 0.8555 - loss: 0.4873
16 Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 509us/step - accuracy: 0.9545 - loss: 0.1553
18 Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 501us/step - accuracy: 0.9672 - loss: 0.1108
20 Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 514us/step - accuracy: 0.9752 - loss: 0.0862
22 Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 511us/step - accuracy: 0.9771 - loss: 0.0715
24 313/313 - 1s - 3ms/step - accuracy: 0.9764 - loss: 0.0780
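Lines 5 and 6 of the output above are a Keras UserWarning about passing input_shape to the Flatten layer. The warning does not affect the result. If you would rather not see it, the following sketch follows the warning's suggestion and gives the input shape through an Input object instead. The build_model function is a hypothetical helper, and this variant was not part of the run shown above.

```python
def build_model():
    """Same layers as the example program, with the input shape given by an Input object."""
    import tensorflow as tf  # imported lazily; requires one of the TensorFlow modules to be loaded

    return tf.keras.models.Sequential([
        tf.keras.Input(shape=(28, 28)),  # replaces Flatten(input_shape=(28, 28))
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10),
    ])
```

The rest of the example program (compile, fit, evaluate) can be used unchanged with the model returned by build_model().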
Running with CPU
Although TensorFlow is usually run on a GPU, we also show an example of batch job execution using the CPU on node group A. (The same job can also be run on node groups B and C.)
The following job script is prepared and executed.
$ cat job_tensorflow_cpu.sh
module load gcc/8 tensorflow/2.61.1
python3.11 ./tensorflow.py
Below is an example of the execution result. Lines 2 and 3 indicate that the GPU is not available.
(As line 5 indicates, the tensorflow/2.61.1 build used here does not fully use the latest CPU features such as AVX and AMX.)
$ cat -n job_cpu.sh.82557.out
1 2024-06-28 13:50:13.207241: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2 2024-06-28 13:50:13.258494: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
3 2024-06-28 13:50:14.557537: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
4 2024-06-28 13:50:16.895459: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
5 To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
6 2024-06-28 13:50:18.875574: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
7 Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
9 /home/app/tensorflow/2.61.1/lib/python3.11/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
10 super().__init__(**kwargs)
11 Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 756us/step - accuracy: 0.8573 - loss: 0.4889
13 Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 695us/step - accuracy: 0.9562 - loss: 0.1487
15 Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 820us/step - accuracy: 0.9673 - loss: 0.1079
17 Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 869us/step - accuracy: 0.9737 - loss: 0.0850
19 Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 731us/step - accuracy: 0.9785 - loss: 0.0711
21 313/313 - 0s - 619us/step - accuracy: 0.9767 - loss: 0.0792
TensorFlow with advanced support for AVX and AMX is now available by loading the following module.
tensorflow-cpu/2.17.0
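To see which of the instruction sets named in line 5 of the log the CPU of a node actually reports, one option is to read /proc/cpuinfo. The sketch below assumes a Linux x86 node, where the flags appear in lowercase (for example avx512f and amx_tile). The function names and the list of flags are illustrative. It reports what the CPU supports, not what a particular TensorFlow build uses.

```python
import re

# Flag names as they appear in /proc/cpuinfo on Linux x86 (lowercase).
FLAGS_OF_INTEREST = ("avx2", "avx512f", "avx512_vnni", "amx_tile", "amx_int8", "amx_bf16")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    match = re.search(r"^flags\s*:\s*(.*)$", cpuinfo_text, flags=re.MULTILINE)
    return set(match.group(1).split()) if match else set()


def report(cpuinfo_text, wanted=FLAGS_OF_INTEREST):
    """Map each flag in `wanted` to True or False depending on whether the CPU reports it."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not a Linux system; every flag is then reported as absent
    for name, present in report(text).items():
        print(f"{name:12s} {'yes' if present else 'no'}")
```

Run inside a batch job, this shows the flags of the compute node rather than those of the login node.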