Genkai Hardware
Last Update: 2024/09/18

Overview
Outline
GENKAI is a system consisting of multiple node groups with different characteristics, login nodes, two types of shared storage, and peripheral devices, all connected by a high-speed network.
Compared to the previous system (ITO), the number of nodes has been roughly halved, but performance is greatly improved.
Overview image
Overall performance
Number of nodes | 1,064 nodes |
---|---|
Number of racks | 32 racks |
Theoretical performance | CPU FP64: 7.76 PFLOPS / GPU FP64: 5.63 PFLOPS / GPU FP16, BF16 TC (deep learning): 166.22 PFLOPS |
Total memory amount | Main memory: 566,912 GiB / Device memory: 15,568 GiB |
Interconnect | InfiniBand NDR 200/400 Gbps |
Shared storage capacity | HDD: 55.2 PB / SSD: 737.28 TB |
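These overall peaks are just the sums of the per-node-group peaks quoted in the sections below; a minimal cross-check in Python, using the values from the tables that follow:

```python
# Cross-check: overall peaks = sums of the per-node-group peaks below.
cpu_fp64 = 7.47 + 0.27725 + 0.01434   # PFLOPS, Node Groups A + B + C
gpu_fp64 = 5.09 + 0.536               # PFLOPS, Node Groups B + C
gpu_fp16 = 150.39 + 15.83             # PFLOPS (Tensor Core), B + C

print(f"CPU FP64      : {cpu_fp64:.2f} PFLOPS")   # 7.76
print(f"GPU FP64      : {gpu_fp64:.2f} PFLOPS")   # 5.63
print(f"GPU FP16/BF16 : {gpu_fp16:.2f} PFLOPS")   # 166.22
```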
Node Group A
Outline
Node Group A is a group of compute nodes equipped only with CPUs as the main computing devices.
High application performance can be expected from the latest-generation CPUs,
and the large number of nodes (1,024) makes the group suitable for large-scale distributed-memory parallel computing.
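Distributed-memory programs for Node Group A are typically written with MPI. A minimal sketch, assuming an MPI stack and the mpi4py package are available on the system (check the software documentation for what is actually installed):

```python
from mpi4py import MPI  # assumption: an MPI library and mpi4py are installed

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the job
size = comm.Get_size()   # total number of MPI processes

# With 120 cores per node, a job spanning many of the 1,024 nodes can run
# tens of thousands of ranks; each rank reports where it is running.
print(f"rank {rank} of {size} on {MPI.Get_processor_name()}")
```

Such a script is launched through the batch system with the site's MPI launcher (for example `mpiexec -n <ranks> python hello.py`; the exact command depends on the installed MPI).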
Hardware configuration
Model Number | FUJITSU Server PRIMERGY CX2550 M7 | |
---|---|---|
Compute node | CPU | Intel Xeon Platinum 8490H (Sapphire Rapids, 60 cores, 1.90 GHz - 3.50 GHz) × 2 sockets |
 | Memory | DDR5 4800 MHz, 512 GiB (32GiB×8×2 sockets) |
 | Theoretical performance | FP64 7,296 GFLOPS (3,648 GFLOPS×2 sockets) |
 | Memory bandwidth | 614.4 GB/s (4800MHz×8Byte×8 channels×2 sockets) |
Number of nodes and cores | 1,024 nodes, 122,880 cores | |
Total theoretical performance | FP64 7.47 PFLOPS (3,648GF×2 sockets×1,024 nodes) | |
Total memory amount | 512 TiB | |
Total memory bandwidth | 629 TB/s | |
User local storage | - | |
Interconnect | InfiniBand NDR (200Gbps) × 1 port/node | |
Cooling | water |
Hardware configuration of a compute node
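As a sanity check, the per-node figures above follow directly from the CPU specification, assuming the base clock and 32 FP64 FLOP/cycle per core (two AVX-512 FMA units):

```python
# Sanity check of the per-node figures for Node Group A.
cores, base_ghz, sockets = 60, 1.90, 2
fp64_per_cycle = 32   # assumption: 2 AVX-512 FMA units × 8 FP64 lanes × 2 ops (FMA)

peak_gflops = cores * base_ghz * fp64_per_cycle * sockets
print(f"peak: {peak_gflops:.0f} GFLOPS")   # 7296 GFLOPS, as in the table

# DDR5-4800: 4800 MT/s × 8 bytes × 8 channels per socket × 2 sockets
bw_gb_s = 4800 * 8 * 8 * sockets / 1000
print(f"bandwidth: {bw_gb_s:.1f} GB/s")    # 614.4 GB/s
```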
Node Group B
Outline
Node Group B is a group of compute nodes equipped with CPUs and GPUs as the main computing devices.
It is equipped with high-speed H100 GPUs and is suitable for applications in the fields of AI and data science as well as numerical computation and simulation.
The GPUs also support MIG (Multi-Instance GPU), which partitions a GPU into smaller independent instances. Consider using MIG for small-scale GPU computations or for practicing GPU usage.
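When MIG is enabled, each MIG instance appears as a separate device, and a CUDA program is pointed at one instance before any GPU context is created. A minimal sketch, assuming MIG has already been enabled and partitioned (the UUID below is a placeholder; list the real ones with `nvidia-smi -L`):

```python
import os

# CUDA_VISIBLE_DEVICES accepts MIG instance UUIDs; set it before any
# CUDA library (PyTorch, CuPy, ...) initializes a GPU context.
# The UUID is a placeholder; obtain real ones from `nvidia-smi -L`.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# From here on, GPU code sees only the selected MIG instance.
```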
Hardware configuration
Model Number | FUJITSU Server PRIMERGY GX2560 M7 | |
---|---|---|
Compute node | CPU | Intel Xeon Platinum 8490H (Sapphire Rapids, 60 cores, 1.90 GHz - 3.50 GHz) × 2 sockets |
 | GPU | NVIDIA H100 (Hopper)×4 sockets |
 | Memory | DDR5 4800 MHz, 1024 GiB (64GiB×8×2 sockets) / HBM2e, 376 GiB (94GiB×4 sockets) |
 | Theoretical performance (CPU) | FP64 7,296 GFLOPS (3,648 GFLOPS×2 sockets) |
 | Theoretical performance (GPU) | FP64: 134.0 TFLOPS (33.5 TFLOPS×4 sockets) / FP64 (TC): 267.6 TFLOPS (66.9 TFLOPS×4 sockets) / FP16, BF16 (TC): 3,957.6 TFLOPS (989.4 TFLOPS×4 sockets) |
 | Memory bandwidth | CPU 614.4 GB/s (4800MHz×8Byte×8 channels×2 sockets) / GPU 9,584 GB/s (2,396 GB/s×4 sockets) |
 | GPU-GPU connection | NVLink×18 links (25 GB/s per link per direction; 450 GB/s per direction, 900 GB/s bidirectional in total) |
 | CPU-GPU connection | PCIe Gen5 x16, 128 GB/s bidirectional in total |
Number of nodes and cores | 38 nodes, 4,560 CPU cores (60 cores×2 sockets×38 nodes) + 1,284,096 FP64 GPU cores (8,448 cores×4 sockets×38 nodes) | |
Total theoretical performance | CPU FP64: 277.25 TFLOPS (3,648GF×2 sockets×38 nodes) / GPU FP64: 5.09 PFLOPS (33.5TF×4 sockets×38 nodes) / GPU FP16, BF16 (TC): 150.39 PFLOPS (989.4TF×4 sockets×38 nodes) | |
Total memory amount | Host memory 38 TiB (64GiB×8×2 sockets×38 nodes) / Device memory 13.95 TiB (94GiB×4 sockets×38 nodes) | |
Total memory bandwidth | Host memory 23.35 TB/s (4800MHz×8Byte×8 channels×2 sockets×38 nodes) / Device memory 364.19 TB/s (2,396GB/s×4 sockets×38 nodes) | |
User local storage | NVMe SSD 12.8 TB/node | |
Interconnect | InfiniBand NDR (400Gbps)×2 ports/node | |
Cooling | water |
Hardware configuration of a compute node
Node Group C
Outline
Node Group C is a group of compute nodes with more GPUs and memory than Node Group B.
Although it has only 2 nodes, it may be effective for programs that are difficult to run or accelerate in other node groups.
Hardware configuration
Model Number | Supermicro GPU SuperServer SYS-821GE-TNHR | |
---|---|---|
Compute node | CPU | Intel Xeon Platinum 8480+ (Sapphire Rapids, 56 cores, 2.00 GHz - 3.80 GHz) × 2 sockets |
 | GPU | NVIDIA H100 (Hopper)×8 sockets |
 | Memory | DDR5 4400 MHz, 8 TiB (256GiB×16×2 sockets) / HBM3, 640 GiB (80GiB×8 sockets) |
 | Theoretical performance (CPU) | FP64 7,168 GFLOPS (3,584 GFLOPS×2 sockets) |
 | Theoretical performance (GPU) | FP64: 268.0 TFLOPS (33.5 TFLOPS×8 sockets) / FP64 (TC): 535.2 TFLOPS (66.9 TFLOPS×8 sockets) / FP16, BF16 (TC): 7,915.2 TFLOPS (989.4 TFLOPS×8 sockets) |
 | Memory bandwidth | CPU 563.2 GB/s (4400MHz×8Byte×8 channels×2 sockets) / GPU 26,816 GB/s (3,352 GB/s×8 sockets) |
 | GPU-GPU connection | NVLink×18 links (25 GB/s per link per direction; 450 GB/s per direction, 900 GB/s bidirectional in total) |
 | CPU-GPU connection | PCIe Gen5 x16, 128 GB/s bidirectional in total |
Number of nodes and cores | 2 nodes, 224 CPU cores (56 cores×2 sockets×2 nodes) + 135,168 FP64 GPU cores (8,448 cores×8 sockets×2 nodes) | |
Total theoretical performance | CPU FP64: 14.34 TFLOPS (3,584GF×2 sockets×2 nodes) / GPU FP64: 536 TFLOPS (33.5TF×8 sockets×2 nodes) / GPU FP16, BF16 (TC): 15.83 PFLOPS (989.4TF×8 sockets×2 nodes) | |
Total memory amount | Host memory 16 TiB (256GiB×16×2 sockets×2 nodes) / Device memory 1,280 GiB (80GiB×8 sockets×2 nodes) | |
Total memory bandwidth | Host memory 1.13 TB/s (4400MHz×8Byte×8 channels×2 sockets×2 nodes) / Device memory 53.63 TB/s (3,352GB/s×8 sockets×2 nodes) | |
User local storage | NVMe SSD 15.3 TB/node | |
Interconnect | InfiniBand NDR (400Gbps)×4 ports/node | |
Cooling | water |
Hardware configuration of a compute node
Login Nodes
Outline
Two nodes are provided as login nodes, each equipped with only a CPU as the main computing device.
When connecting via SSH, you are connected to one of the two nodes by DNS round robin.
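One practical consequence of the round robin is that repeated name lookups return the login nodes' addresses in varying order; the SSH client simply connects to the first address it receives. A small illustration (the hostname is a placeholder; use the actual login FQDN from the usage guide):

```python
import socket

# Placeholder hostname; substitute the actual login FQDN.
host = "genkai.example.ac.jp"

# DNS round robin may rotate the order of the two login nodes' addresses
# between lookups.
for info in socket.getaddrinfo(host, 22, proto=socket.IPPROTO_TCP):
    print(info[4][0])
```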
Hardware configuration
Model number | FUJITSU Server PRIMERGY CX2530 M7 | |
---|---|---|
Login node | CPU | Intel Xeon Platinum 8490H (Sapphire Rapids, 60 cores, 1.90 GHz - 3.50 GHz) × 2 sockets |
 | Memory | DDR5 4800 MHz, 1024 GiB (64GiB×8×2 sockets) |
Number of nodes and cores | 2 nodes, 240 cores | |
Total memory amount | 2048 GiB | |
User local storage | - | |
Interconnect | InfiniBand NDR (200Gbps) × 1 port/node | |
Cooling | air |
Shared Storage
Outline
Shared storage includes large storage consisting of HDDs and fast storage consisting of SSDs.
Data is protected by RAID but is not backed up.
Since data may be lost in an unforeseen accident or natural disaster, please back up important data yourself.
Hardware configuration
MDS/MDT (Large storage) | DDN ES400NVX2 × 2 | |
---|---|---|
MDS/MDT (per set) | Number of MDSs | 4 |
 | Number of MDTs | 4 |
 | Drives | 1.92TB NVMe SSD × 20 (+1 spare) |
 | RAID configuration | RAID6 (8D + 2P) |
 | Number of inodes | About 11 billion |
MDS/MDT (Fast storage) | DDN ES400NVX2 × 1 | |
MDS/MDT (per set) | Number of MDSs | 4 |
 | Number of MDTs | 8 |
 | Drives | 3.84TB NVMe SSD × 20 (+1 spare) |
 | RAID configuration | RAID6 (8D + 2P) |
 | Number of inodes | About 22.5 billion |
OSS/OST (Large storage) | DDN ES400NVX2 × 6 | |
---|---|---|
OSS/OST (per set) | Number of OSSs | 4 |
 | Number of OSTs | 32 |
 | HDD capacity | 9.21 PB |
 | RAID configuration | RAID6 (8D + 2P) |
 | Write performance | 60 GB/s |
 | Read performance | 70 GB/s |
OSS/OST (Fast storage) | DDN ES400NVX2 × 6 (shared with Large storage) | |
OSS/OST (per set) | Number of OSSs | 4 |
 | Number of OSTs | 8 |
 | SSD capacity | 122.88 TB |
 | RAID configuration | RAID6 (8D + 2P) |
 | Write performance | 60 GB/s |
 | Read performance | 80 GB/s |
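The per-set capacities are consistent with the overview table, since each storage type is served by six OSS/OST sets; a quick check:

```python
# Per-set capacity × 6 sets should reproduce the overview totals.
print(f"HDD: {9.21 * 6:.2f} PB")    # 55.26 PB  (overview: 55.2 PB, rounded)
print(f"SSD: {122.88 * 6:.2f} TB")  # 737.28 TB (overview: 737.28 TB)
```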
Metadata backup server | 1 |
---|---|
S3 access server | 2 |
NFS access server | 1 |
Monitoring server | 1 |
Network
Each node/device of Genkai is connected by InfiniBand and Ethernet.
Compute nodes, shared storage, and login nodes are connected by a fat-tree network with full bisection bandwidth over high-speed InfiniBand.
Inter-node data communication and storage access during computation take place over InfiniBand.
Access from the external network to the login nodes, and connections from each node out to the external network, go over Ethernet.