
Genkai Hardware

Last Update 2024/09/18

Overview

Outline

GENKAI is a system consisting of multiple node groups with different characteristics, login nodes, two types of shared storage, and a group of peripheral devices connected by a high-speed network.
Compared to the previous system (ITO), the number of nodes has been reduced to approximately half, but performance has been greatly improved.


[Figure: Overview image]


Overall performance

Amount of nodes: 1,064 nodes
Amount of racks: 32 racks
Theoretical performance:
  CPU FP64 7.76 PFLOPS
  GPU FP64 5.63 PFLOPS
  GPU FP16/BF16 TC (deep learning) 166.22 PFLOPS
Total memory amount:
  Main memory 566,912 GiB
  Device memory 15,568 GiB
Interconnect: InfiniBand NDR 200/400 Gbps
Amount of shared storage:
  HDD 55.2 PB
  SSD 737.28 TB
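
The system-wide figures are the sums of the per-node-group values detailed later on this page, for example (Group A + Group B + Group C, in PFLOPS):

    \[
    \text{CPU FP64: } 7.47 + 0.277 + 0.014 \approx 7.76\ \text{PFLOPS}, \qquad
    \text{GPU FP64: } 5.09 + 0.536 \approx 5.63\ \text{PFLOPS}.
    \]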

Node Group A

Outline

Node group A is a group of computation nodes equipped with only CPUs as main computing devices.
High application performance can be expected from the latest-generation CPUs,
and the large number of nodes (1,024) makes it suitable for large-scale distributed-memory parallel computing.
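
As a minimal sketch of the distributed-memory (MPI) programming style this node group is aimed at, the following C program sums one value per rank; the compilers, MPI library, and job scheduler actually available on Genkai are documented elsewhere and are not assumed here.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank contributes one value; MPI_Reduce combines them on rank 0. */
        long local = rank, total = 0;
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks 0..%d = %ld\n", size - 1, total);

        MPI_Finalize();
        return 0;
    }

With 120 cores per node (2 sockets × 60 cores), a job spanning all 1,024 nodes could run up to 122,880 such ranks.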


Hardware configuration

Model number: FUJITSU Server PRIMERGY CX2550 M7
Compute node:
  CPU: Intel Xeon Platinum 8490H (Sapphire Rapids, 60 cores, 1.90 GHz - 3.50 GHz) × 2 sockets
  Memory: DDR5 4800 MHz, 512 GiB (32 GiB × 8 × 2 sockets)
  Theoretical performance: FP64 7,296 GFLOPS (3,648 GFLOPS × 2 sockets)
  Memory bandwidth: 614.4 GB/s (4800 MHz × 8 Byte × 8 channels × 2 sockets)
Amount of nodes and cores: 1,024 nodes, 122,880 cores
Theoretical performance: FP64 7.47 PFLOPS (3,648 GFLOPS × 2 sockets × 1,024 nodes)
Total memory amount: 512 TiB
Total memory bandwidth: 629 TB/s
User local storage: -
Interconnect: InfiniBand NDR (200 Gbps) × 1 port/node
Cooling: water
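
The per-socket figures follow from the CPU specification: each Sapphire Rapids core has two AVX-512 FMA units (32 FP64 FLOP per cycle), and each socket drives 8 DDR5-4800 channels:

    \[
    60\ \text{cores} \times 1.90\ \text{GHz} \times 32\ \tfrac{\text{FLOP}}{\text{cycle}} = 3{,}648\ \text{GFLOPS}, \qquad
    4800\ \tfrac{\text{MT}}{\text{s}} \times 8\ \text{B} \times 8\ \text{ch} = 307.2\ \tfrac{\text{GB}}{\text{s}}\ (\times 2\ \text{sockets} = 614.4\ \tfrac{\text{GB}}{\text{s}}).
    \]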

Hardware Configuration in compute node

[Figure: Compute node of Node Group A]


Node Group B

Outline

Node Group B is a group of compute nodes equipped with CPUs and GPUs as the main computing devices.
It is equipped with high-speed H100 GPUs and is suitable for applications in the fields of AI and data science as well as numerical computation and simulation.

It also supports MIG (Multi-Instance GPU), which partitions a GPU into multiple smaller instances. Please consider using MIG for small-scale GPU calculations or for practicing GPU use.
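
How MIG instances are requested on Genkai is a matter for the job scheduler documentation; from the application side, a MIG slice simply appears as an ordinary CUDA device with reduced resources. A minimal sketch in C using the CUDA runtime API (compiled with nvcc) to see what a job was actually given:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        int n = 0;
        cudaError_t err = cudaGetDeviceCount(&n);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int i = 0; i < n; i++) {
            struct cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            /* Under MIG, name and totalGlobalMem reflect the slice, not the full H100. */
            printf("device %d: %s, %.1f GiB\n", i, p.name,
                   p.totalGlobalMem / (double)(1024UL * 1024 * 1024));
        }
        return 0;
    }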


Hardware configuration

Model number: FUJITSU Server PRIMERGY GX2560 M7
Compute node:
  CPU: Intel Xeon Platinum 8490H (Sapphire Rapids, 60 cores, 1.90 GHz - 3.50 GHz) × 2 sockets
  GPU: NVIDIA H100 (Hopper) × 4
  Memory: DDR5 4800 MHz, 1,024 GiB (64 GiB × 8 × 2 sockets); HBM2e, 376 GiB (94 GiB × 4 GPUs)
  Theoretical performance (CPU): FP64 7,296 GFLOPS (3,648 GFLOPS × 2 sockets)
  Theoretical performance (GPU):
    FP64: 134.0 TFLOPS (33.5 TFLOPS × 4 GPUs)
    FP64 (TC): 267.6 TFLOPS (66.9 TFLOPS × 4 GPUs)
    FP16/BF16 (TC): 3,957.6 TFLOPS (989.4 TFLOPS × 4 GPUs)
  Memory bandwidth:
    CPU 614.4 GB/s (4800 MHz × 8 Byte × 8 channels × 2 sockets)
    GPU 9,584 GB/s (2,396 GB/s × 4 GPUs)
  GPU-GPU connection: NVLink × 18 lanes (25 GB/s per lane per direction; 450 GB/s per direction; 900 GB/s bidirectional)
  CPU-GPU connection: PCIe Gen5 ×16, 128 GB/s bidirectional
Amount of nodes and cores: 38 nodes, 4,560 CPU cores (60 cores × 2 sockets × 38 nodes) + 1,284,096 FP64 GPU cores (8,448 cores × 4 GPUs × 38 nodes)
Theoretical performance:
  CPU FP64: 277.25 TFLOPS (3,648 GFLOPS × 2 sockets × 38 nodes)
  GPU FP64: 5.09 PFLOPS (33.5 TFLOPS × 4 GPUs × 38 nodes)
  GPU FP16/BF16 (TC): 150.39 PFLOPS (989.4 TFLOPS × 4 GPUs × 38 nodes)
Total memory amount:
  Host memory 38,912 GiB (64 GiB × 8 × 2 sockets × 38 nodes)
  Device memory 14,288 GiB (94 GiB × 4 GPUs × 38 nodes)
Total memory bandwidth:
  Host memory 23.35 TB/s (4800 MHz × 8 Byte × 8 channels × 2 sockets × 38 nodes)
  Device memory 364.19 TB/s (2,396 GB/s × 4 GPUs × 38 nodes)
User local storage: NVMe SSD 12.8 TB/node
Interconnect: InfiniBand NDR (400 Gbps) × 2 ports/node
Cooling: water
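
The NVLink bandwidth above applies to direct GPU-to-GPU transfers within a node. Whether peer access is available between two devices can be queried at run time; a small sketch with the CUDA runtime API:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        int n = 0;
        cudaGetDeviceCount(&n);
        /* Test every ordered device pair for peer (P2P) accessibility. */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                int ok = 0;
                cudaDeviceCanAccessPeer(&ok, i, j);
                printf("GPU %d -> GPU %d: peer access %s\n", i, j, ok ? "yes" : "no");
            }
        return 0;
    }

Where the answer is yes, enabling it with cudaDeviceEnablePeerAccess lets cudaMemcpyPeer move data directly over NVLink instead of staging through host memory.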

Hardware Configuration in compute node

[Figure: Compute node of Node Group B]


Node Group C

Outline

Node Group C is a group of compute nodes with more GPUs and memory than Node Group B.
Although it has only 2 nodes, it may be effective for programs that are difficult to run or accelerate on the other node groups.


Hardware configuration

Model number: Supermicro GPU SuperServer SYS-821GE-TNHR
Compute node:
  CPU: Intel Xeon Platinum 8480+ (Sapphire Rapids, 56 cores, 2.00 GHz - 3.80 GHz) × 2 sockets
  GPU: NVIDIA H100 (Hopper) × 8
  Memory: DDR5 4400 MHz, 8 TiB (256 GiB × 16 × 2 sockets); HBM3, 640 GiB (80 GiB × 8 GPUs)
  Theoretical performance (CPU): FP64 7,168 GFLOPS (3,584 GFLOPS × 2 sockets)
  Theoretical performance (GPU):
    FP64: 268.0 TFLOPS (33.5 TFLOPS × 8 GPUs)
    FP64 (TC): 535.2 TFLOPS (66.9 TFLOPS × 8 GPUs)
    FP16/BF16 (TC): 7,915.2 TFLOPS (989.4 TFLOPS × 8 GPUs)
  Memory bandwidth:
    CPU 563.2 GB/s (4400 MHz × 8 Byte × 8 channels × 2 sockets)
    GPU 26,816 GB/s (3,352 GB/s × 8 GPUs)
  GPU-GPU connection: NVLink × 18 lanes (25 GB/s per lane per direction; 450 GB/s per direction; 900 GB/s bidirectional)
  CPU-GPU connection: PCIe Gen5 ×16, 128 GB/s bidirectional
Amount of nodes and cores: 2 nodes, 224 CPU cores (56 cores × 2 sockets × 2 nodes) + 135,168 FP64 GPU cores (8,448 cores × 8 GPUs × 2 nodes)
Theoretical performance:
  CPU FP64: 14.34 TFLOPS (3,584 GFLOPS × 2 sockets × 2 nodes)
  GPU FP64: 536 TFLOPS (33.5 TFLOPS × 8 GPUs × 2 nodes)
  GPU FP16/BF16 (TC): 15.83 PFLOPS (989.4 TFLOPS × 8 GPUs × 2 nodes)
Total memory amount:
  Host memory 16 TiB (256 GiB × 16 × 2 sockets × 2 nodes)
  Device memory 1,280 GiB (80 GiB × 8 GPUs × 2 nodes)
Total memory bandwidth:
  Host memory 1.13 TB/s (4400 MHz × 8 Byte × 8 channels × 2 sockets × 2 nodes)
  Device memory 53.63 TB/s (3,352 GB/s × 8 GPUs × 2 nodes)
User local storage: NVMe SSD 15.3 TB/node
Interconnect: InfiniBand NDR (400 Gbps) × 4 ports/node
Cooling: water
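
As a cross-check, the host-memory total follows from the DDR5-4400 per-node bandwidth given above:

    \[
    4400\ \tfrac{\text{MT}}{\text{s}} \times 8\ \text{B} \times 8\ \text{ch} \times 2\ \text{sockets} = 563.2\ \tfrac{\text{GB}}{\text{s}}\ \text{per node}, \qquad
    563.2 \times 2\ \text{nodes} \approx 1.13\ \tfrac{\text{TB}}{\text{s}}.
    \]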

Hardware Configuration in compute node

[Figure: Compute node of Node Group C]



Login Nodes

Outline

Two nodes are provided as login nodes, each equipped with only a CPU as the main computing device.
When connecting via SSH, you are routed to one of the two nodes by DNS round robin.
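
DNS round robin simply means the login hostname resolves to the addresses of both nodes, with each lookup rotating their order. A sketch in C that lists the addresses behind a name; genkai.example is a placeholder, not the actual login hostname:

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void) {
        struct addrinfo hints, *res, *p;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_INET;        /* IPv4 A records */
        hints.ai_socktype = SOCK_STREAM;

        int rc = getaddrinfo("genkai.example", "ssh", &hints, &res);
        if (rc != 0) {
            fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
            return 1;
        }
        /* One name may map to several addresses; SSH clients pick the first. */
        for (p = res; p != NULL; p = p->ai_next) {
            char buf[INET_ADDRSTRLEN];
            struct sockaddr_in *sa = (struct sockaddr_in *)p->ai_addr;
            inet_ntop(AF_INET, &sa->sin_addr, buf, sizeof buf);
            printf("%s\n", buf);
        }
        freeaddrinfo(res);
        return 0;
    }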


Hardware configuration

Model number: FUJITSU Server PRIMERGY CX2530 M7
Login node:
  CPU: Intel Xeon Platinum 8490H (Sapphire Rapids, 60 cores, 1.90 GHz - 3.50 GHz) × 2 sockets
  Memory: DDR5 4800 MHz, 1,024 GiB (64 GiB × 8 × 2 sockets)
Amount of nodes and cores: 2 nodes, 240 cores
Total memory amount: 2,048 GiB
User local storage: -
Interconnect: InfiniBand NDR (200 Gbps) × 1 port/node
Cooling: air

Shared Storage

Outline

Shared storage includes large storage consisting of HDDs and fast storage consisting of SSDs.
Data is protected by RAID but is not backed up.
Since data may be lost in an unforeseen accident or natural disaster, be sure to back up important data yourself.


Hardware configuration

MDS/MDT (Large storage): DDN ES400NVX2 × 2
  Per set:
    Amount of MDS: 4
    Amount of MDT: 4
    Drive: 1.92 TB NVMe SSD × 20 (+1 spare)
    RAID configuration: RAID6 (8D + 2P)
    Amount of inodes: about 11 billion
MDS/MDT (Fast storage): DDN ES400NVX2 × 1
  Per set:
    Amount of MDS: 4
    Amount of MDT: 8
    Drive: 3.84 TB NVMe SSD × 20 (+1 spare)
    RAID configuration: RAID6 (8D + 2P)
    Amount of inodes: about 22.5 billion
OSS/OST (Large storage): DDN ES400NVX2 × 6 (enclosures shared with Fast storage)
  Per set:
    Amount of OSS: 4
    Amount of OST: 32
    HDD capacity: 9.21 TB
    RAID configuration: RAID6 (8D + 2P)
    Write performance: 60 GB/s
    Read performance: 70 GB/s
OSS/OST (Fast storage): DDN ES400NVX2 × 6 (enclosures shared with Large storage)
  Per set:
    Amount of OSS: 4
    Amount of OST: 8
    SSD capacity: 122.88 TB
    RAID configuration: RAID6 (8D + 2P)
    Write performance: 60 GB/s
    Read performance: 80 GB/s
Metadata backup server: 1
S3 access server: 2
NFS access server: 1
Monitoring server: 1

Network

Each node/device of Genkai is connected by InfiniBand and Ethernet.

Compute nodes, shared storage, and login nodes are connected by a fat-tree network with full bisection bandwidth using high-speed InfiniBand.
Inter-node data communication and storage access during computation are performed over InfiniBand.
Access from the external network to the login nodes, and connections from each node to the external network, use Ethernet.
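
Applications reach the InfiniBand fabric transparently through MPI. A rough point-to-point bandwidth check between two ranks, as a generic sketch rather than a tuned benchmark (run with at least 2 ranks, placed on different nodes to exercise the interconnect):

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define N (1 << 24)   /* 16 MiB message */
    #define REPS 100

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char *buf = malloc(N);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {        /* ping */
                MPI_Send(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) { /* pong */
                MPI_Recv(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)   /* 2 × N bytes move per iteration */
            printf("~%.2f GB/s\n", 2.0 * N * REPS / dt / 1e9);

        free(buf);
        MPI_Finalize();
        return 0;
    }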

[Figure: Network]