Job Usage¶

Last Updated: August 27, 2025

Unlike the login nodes, the computing nodes in Genkai node groups A, B, and C are managed by a job management system that allocates resources among requests from multiple users. To run a program on these nodes, you must therefore submit a request in the form of a job. This document explains how to use each node group through jobs.

Types of Jobs¶

In Genkai, you can execute the following two types of jobs:

Job Type	Usage
Batch Job	Batch execution by submitting a job with a pre-written script. Interactive use is not allowed.
Interactive Job	Interactive execution by logging into a computing node. Mainly intended for short-time debugging and pre/post-processing.

Batch jobs are further divided into the following three types:

Batch Job Type	Usage
Regular Job	Submits one script and executes one job.
Step Job	Submits multiple scripts as a single batch and executes them in a specified order.
Bulk Job	Submits one script and generates multiple regular jobs for execution.

Batch Job Flow¶

Generally, to run a program using a batch job, you follow these steps:

Job operation commands
Creating a job processing script
Submitting the job (pjsub command)
(If necessary) Checking the job status (pjstat command)
(If necessary) Deleting a running or waiting job (pjdel command)
Checking the results
Checking node usage

Job Operation Commands¶

The commands used for job operations are as follows:

Function	Command
Submit Job	`pjsub`
Check Job Status	`pjstat`
Delete Job	`pjdel`
Check Node Group Congestion	`pjshowrsc`

Job Submission (`pjsub` Command)¶

Jobs are submitted in one of the following forms, depending on the job type. Note that in either form, pjsub must be run from a directory within the large-capacity storage (/home) or high-speed storage (/fast).

Batch Job Format

$ pjsub options job_script_file

Interactive Job Format

$ pjsub --interact options

Creating a Batch Processing Script¶

Basic Options¶

Option Name	Description
-r	Specify the reservation ID issued by the Node Reservation Portal
-o filename	Output standard output to the file filename
-e filename	Output standard error to the file filename
-j	Output the job's standard error and standard output to the same file
--interact	Execute as an interactive job
--restart	Re-execute the job in case of a failure
--norestart	Do not re-execute the job in case of a failure (default)
--mail-list mailaddress	Specify the mail destination
-m	Specify mail notification
-m b	Mail notification at batch job start
-m e	Mail notification at batch job end
-m r	Mail notification at job re-execution
-X	Inherit environment variables at job submission to the job execution environment
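
As a minimal sketch of how these options can be embedded in a script, the following writes a job script whose #PJM lines set the output file, the error file, and an end-of-job mail notification. The script path, the file names result.out and result.err, and the address user@example.com are placeholders chosen for illustration, not values from this system.

```shell
#!/bin/sh
# Write a job script that uses several basic options as #PJM directives.
# The path, file names, and mail address are placeholders for illustration.
script="${TMPDIR:-/tmp}/job_basic.sh"

cat > "$script" <<'EOF'
#!/bin/sh
#PJM -L rscgrp=a-batch
#PJM -L vnode-core=1
#PJM -L elapse=0:10:00
#PJM -o result.out
#PJM -e result.err
#PJM --mail-list user@example.com
#PJM -m e

./a.out
EOF

# Show the directives that pjsub would read from the script.
grep '^#PJM' "$script"
```

The same options can also be passed on the pjsub command line instead of being written into the script.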

Batch Job Resource Options¶

The main options regarding resources needed for batch job processing are as follows. Specify the resources or upper limits following -L.

Option Name	Description
-L rscgrp=name	Resource group (queue) name to which the job is submitted (for details, see Resource Groups)
-L node	Specify the number of nodes (mandatory when using one or more whole nodes)
-L vnode-core	Specify the number of cores (mandatory when using less than one node in node group A)
-L gpu	Specify the number of GPUs (mandatory when using less than one node in node groups B and C)
-L elapse	Specify the maximum job execution time
-L proc-core	Specify the maximum core file size limit per process (default: 0, maximum: unlimited)
-L proc-data	Specify the maximum data segment size limit per process (default: unlimited)
-L proc-stack	Specify the maximum stack segment size limit per process. If set to unlimited, the actual value will be 2MiB due to RHEL specifications. (default: unlimited)
-L jobenv	Specify the job environment. If using Singularity, you must specify jobenv=singularity.
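
Several resources can be given in a single -L argument separated by commas, as the interactive job example on this page does. The sketch below joins individual requests into one such argument and prints the resulting command rather than running it. The helper name join_resources and the script name job.sh are hypothetical.

```shell
#!/bin/sh
# Join resource requests into a single comma-separated -L argument.
join_resources() {
    old_ifs=$IFS
    IFS=,
    joined="$*"          # "$*" joins the arguments with the first character of IFS
    IFS=$old_ifs
    printf '%s\n' "$joined"
}

resources=$(join_resources rscgrp=a-batch vnode-core=4 elapse=0:30:00)

# Print the command instead of running it, so the sketch works on any machine.
echo "pjsub -L $resources job.sh"
```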

Statistics Output Options¶

Option Name	Description
-s	Output statistical information of the submitted job (cannot be used with the -S option)
-S	Output statistical information including node information of the submitted job (cannot be used with the -s option)

Example Job Scripts¶

Sequential Job for Node Group A¶

Below is an example job script to execute a job. This example assumes executing a program compiled with Intel oneAPI.

Resource Specification	Details
Resource Group	a-batch
Number of CPU Cores	1
Elapsed Time	1 hour
Output Standard Error to Standard Output	Yes

#!/bin/sh
#PJM -L rscgrp=a-batch
#PJM -L vnode-core=1
#PJM -L elapse=1:00:00
#PJM -j

module load intel
./a.out

Thread Parallel Job for Node Group A¶

Below is an example job script to execute a job. This example assumes executing a program compiled with Intel oneAPI.

Resource Specification	Details
Resource Group	a-batch
Number of CPU Cores	30
Number of Threads	30
Elapsed Time	1 hour
Output Standard Error to Standard Output	Yes

#!/bin/sh
#PJM -L rscgrp=a-batch
#PJM -L vnode-core=30
#PJM -L elapse=1:00:00
#PJM -j

module load intel
export OMP_NUM_THREADS=30
./a.out

Hybrid Parallel Job for Node Group A¶

Below is an example job script to execute a job. This example assumes executing a program compiled and linked with Intel oneAPI and Intel MPI.

Resource Specification	Details
Resource Group	a-batch
Number of Nodes	4
Number of Processes per Node	10
Number of Threads per Process	12
Elapsed Time	1 hour
Output Standard Error to Standard Output	Yes

#!/bin/sh
#PJM -L rscgrp=a-batch
#PJM -L node=4
#PJM -L elapse=1:00:00
#PJM -j

module load intel
module load impi
export OMP_NUM_THREADS=12
mpiexec -np 40 -ppn 10 ./a.out
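
The mpiexec arguments in this script follow from the resource table by simple arithmetic, which the sketch below spells out. The variable names are illustrative. The resulting 120 cores per node appears to match the show_rsc sample later on this page (26640 cores across 222 shared nodes), but confirm the core count for your node group before reusing the numbers.

```shell
#!/bin/sh
# Derive the mpiexec arguments from the values in the resource table.
nodes=4              # matches "#PJM -L node=4"
procs_per_node=10    # MPI processes on each node
threads_per_proc=12  # OpenMP threads in each process

total_procs=$((nodes * procs_per_node))
cores_per_node=$((procs_per_node * threads_per_proc))

# Print the lines that would go into the job script.
echo "export OMP_NUM_THREADS=$threads_per_proc"
echo "mpiexec -np $total_procs -ppn $procs_per_node ./a.out"
echo "cores used per node: $cores_per_node"
```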

1 GPU Job for Node Group B¶

Below is an example job script to execute a job. This example assumes executing a program compiled with CUDA.

Resource Specification	Details
Resource Group	b-batch
Number of GPUs	1
Elapsed Time	1 hour
Output Standard Error to Standard Output	Yes

#!/bin/sh
#PJM -L rscgrp=b-batch
#PJM -L gpu=1
#PJM -L elapse=1:00:00
#PJM -j

module load cuda
./a.out

2 Node Job for Node Group B¶

Below is an example job script to execute a job.

Resource Specification	Details
Resource Group	b-batch
Number of Nodes	2
Number of GPUs	8
Number of Processes per Node	4
Elapsed Time	1 hour
Output Standard Error to Standard Output	Yes

#!/bin/sh
#PJM -L rscgrp=b-batch
#PJM -L node=2
#PJM -L elapse=1:00:00
#PJM -j

module load gcc cuda cudnn nccl hpcx
mpiexec -n 8 -map-by ppr:4:node python3 pytorch_mnist.py

Executing an Interactive Job¶

To execute an interactive job, specify the --interact option with the pjsub command.
Below is an example of using the resource group a-inter with 1 node for 1 hour in an interactive job.

$ pjsub --interact -L rscgrp=a-inter,node=1,elapse=01:00:00
[INFO] PJM 0000 pjsub Job 82653 submitted.
[INFO] PJM 0081 .connected.
[INFO] PJM 0082 pjsub Interactive job 82653 started.
[ku01234567@a0001 test]$

Submitting a Batch Job¶

Request the processing described in the batch processing script file using the pjsub command.

$ pjsub go.sh
[INFO] PJM 0000 pjsub Job 1234 submitted.

In this example, the processing described in a file named go.sh is being requested. The example shows that the job ID 1234 has been assigned.
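
When a wrapper script needs the job ID, for example to pass it to pjstat or pjdel later, it can be extracted from the submission message. The sketch below parses a sample message in the format shown above. It assumes the message is written to standard output and keeps this wording, both of which should be checked on the actual system.

```shell
#!/bin/sh
# Extract the job ID from a pjsub submission message.
# The sample message mirrors the format on this page; in practice it would be
# captured with something like: message=$(pjsub go.sh)
# (assumption: pjsub writes this message to standard output).
message='[INFO] PJM 0000 pjsub Job 1234 submitted.'

job_id=$(printf '%s\n' "$message" |
    sed -n 's/.*Job \([0-9][0-9]*\) submitted\..*/\1/p')

if [ -n "$job_id" ]; then
    echo "Submitted job ID: $job_id"
else
    echo "Could not find a job ID in: $message" >&2
fi
```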

Checking Job Status¶

Checking the Status of Running and Waiting Jobs¶

To check the status of submitted jobs, use the pjstat command as follows.

$ pjstat
JOB_ID     JOB_NAME   MD ST  USER     START_DATE      ELAPSE_LIM            NODE_REQUIRE    VNODE  CORE V_MEM
82659      test2.sh   NM RNA ku400001 (07/01 15:25)   0000:20:00            8               -      -    -

Here, JOB_ID represents the job number, and ST represents the current state of the job. The main job states are as follows:

Display	State
QUE	Waiting
RNA	Starting
RUN	Running
RNO	Ending

There is a limit on the number of cores that can be used simultaneously for job execution. If a job is submitted that exceeds this limit, it will be placed in a waiting state regardless of the resource group’s current load. For details on this limit, see Limit on the Number of Simultaneously Used Cores.

Viewing History¶

You can check the execution history from a specified number of days ago (7 days in this example) to the present using the following option:

$ pjstat -H day=7 -v

The job end status is displayed in the "PC" column (0: normal end, 1: canceled, etc.).
For the meaning of each code, see "man pjstat".
If a job exceeded its memory limit, "12" is shown in the "PC" column of the job history.
The history of past jobs is deleted after a certain period.
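
A small helper can make the "PC" values easier to read in post-processing scripts. The sketch below covers only the three codes mentioned on this page. The function name describe_pc is hypothetical, and every other code is deliberately left to man pjstat.

```shell
#!/bin/sh
# Describe a value from the PC column of "pjstat -H -v".
# Only the codes mentioned on this page are covered.
describe_pc() {
    case "$1" in
        0)  echo "normal end" ;;
        1)  echo "canceled" ;;
        12) echo "memory limit exceeded" ;;
        *)  echo "other (see man pjstat)" ;;
    esac
}

for code in 0 1 12 7; do
    echo "PC=$code: $(describe_pc "$code")"
done
```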

Statistical Information for Completed Batch Jobs¶

You can check the statistical information for any completed job by specifying its job ID (1234 in this example) using the following option:

$ pjstat -H -S 1234

Deleting a Batch Job¶

You can cancel (delete) running or waiting batch jobs using the pjdel command. Specify the job ID(s) after the pjdel command (multiple IDs can be specified). Canceling a running batch job will stop its execution.

$ pjdel 1234
[INFO] PJM 0100 pjdel Job 1234 canceled.

In this example, a request is made to delete the batch job with job ID 1234, and a message indicates that the job has been successfully deleted.

Checking the Results¶

Standard output is written to the file specified with the -o option of pjsub. If -o is not specified, it is written to a file named "[job_script_name].[job_ID].out". Standard error is handled as follows: if the -j option is specified, it is written to the same file as standard output; otherwise, if a file is specified with the -e option, it is written to that file; if neither -j nor -e is specified, it is written to "[job_script_name].[job_ID].err".
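
The naming rules above can be summarized as a small decision procedure. The sketch below is one reading of them, with a hypothetical helper output_files. It assumes that [job_script_name] is the script file name including its extension, which should be confirmed against an actual job.

```shell
#!/bin/sh
# Work out where stdout and stderr are written, following the rules above.
# Arguments: script name, job ID, -o file (or ""), -e file (or ""),
#            "yes" if -j was given.
output_files() {
    script=$1; job_id=$2; out_opt=$3; err_opt=$4; merge=$5

    out_file=${out_opt:-$script.$job_id.out}
    if [ "$merge" = "yes" ]; then
        err_file=$out_file                        # -j: stderr joins stdout
    else
        err_file=${err_opt:-$script.$job_id.err}
    fi
    echo "stdout=$out_file stderr=$err_file"
}

output_files go.sh 1234 "" "" no
output_files go.sh 1234 "" "" yes
output_files go.sh 1234 run.log run.err no
```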

Checking Resource Group Congestion¶

To check the congestion status of resource groups, use the pjshowrsc or show_rsc commands as follows.

$ pjshowrsc --rg
[ CLST: genkai-clst ]
[ RSCUNIT: rscunit_pg01 ]
RSCGRP           NODE
                 TOTAL   FREE  ALLOC
a-batch           1000    999      1
a-inter           1000    999      1
a-reserve           10     10      0
b-batch             34     34      0
b-batch-mig          3      3      0
b-inter             34     34      0
b-inter-mig          3      3      0
b-reserve            4      4      0
c-batch              2      2      0
c-inter              2      2      0

$ show_rsc
                                      node              core               gpu
rscgrp                   mode         free    total     free    total     free    total
a-batch/a-inter          simplex       800      800        -        -        -        -
a-batch/a-inter          shared        199      222    23990    26640        -        -
b-batch/b-inter          simplex        30       30        -        -        -        -
b-batch/b-inter          shared          4        4        -        -       16       16
b-batch-mig/b-inter-mig  simplex         1        1        -        -        -        -
b-batch-mig/b-inter-mig  shared          2        3        -        -       56       84
c-batch/c-inter          simplex         1        1        -        -        -        -
c-batch/c-inter          shared          1        1        -        -        8        8
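
To pick a resource group from a script, the pjshowrsc rows can be reduced to a free-node summary. The sketch below runs on a few sample rows copied from the output above. It assumes the four-column layout (RSCGRP, TOTAL, FREE, ALLOC) shown there.

```shell
#!/bin/sh
# Summarize free nodes per resource group from "pjshowrsc --rg" rows.
# The rows are samples from this page; in practice, pipe pjshowrsc --rg into awk.
sample='a-batch           1000    999      1
a-reserve           10     10      0
b-batch             34     34      0
c-batch              2      2      0'

report=$(printf '%s\n' "$sample" | awk '
    # Columns: RSCGRP TOTAL FREE ALLOC. Header lines are skipped because
    # their second field is not a number.
    NF == 4 && $2 ~ /^[0-9]+$/ {
        printf "%s: %d of %d nodes free (%.1f%%)\n", $1, $3, $2, 100 * $3 / $2
    }')

echo "$report"
```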

Executing Step Jobs¶

A step job is a job model that treats multiple batch jobs as a single unit and chains them by specifying their execution order and dependencies. A step job consists of multiple sub-jobs, and its sub-jobs are never executed simultaneously.

The submission format for step jobs is as follows.

$ pjsub --step [--sparam "sn=stepno[,Dependency_Expressions]"] jobscript

Step Job Dependency Expressions¶

Condition	Description
NONE	Indicates no dependencies
Exit status == value[,value,..]	Any value can be specified for value. In the case of "==" or "!=", multiple values can be specified using a comma (","). Example: ec==1,3,5 → True if the exit status is any of 1, 3, or 5. ec!=1,3,5 → True if the exit status is none of 1, 3, or 5.
Exit status != value[,value,..]
Exit status > value
Exit status >= value
Exit status < value
Exit status <= value

Deletion Types Specifiable in Step Job Dependency Expressions¶

Deletion Type	Description
one	Deletes only the specified job.
after	Deletes the specified job and jobs dependent on it recursively.
all	Deletes the specified job and all subsequent jobs.
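
The sketch below shows how a condition and a deletion type might be combined into a --sparam value for a two-step job. It only prints the commands. The sd= keyword, the condition:deletion-type form, and the jid= key that attaches a second submission to an existing step job come from my recollection of the Technical Computing Suite manual rather than from this page, so treat them as assumptions and verify them with man pjsub. The job ID 12345, the script names, and the helper step_param are hypothetical.

```shell
#!/bin/sh
# Print (not run) the submissions for a two-step job.
# ASSUMPTION: "sd=<condition>:<deletion type>" and "jid=<job ID>" are valid
# --sparam keys; check "man pjsub" before relying on them.
step_param() {
    # $1 = step number, $2 = condition (or NONE), $3 = deletion type
    if [ "$2" = "NONE" ]; then
        echo "sn=$1"
    else
        echo "sn=$1,sd=$2:$3"
    fi
}

first=$(step_param 1 NONE)
second=$(step_param 2 'ec!=0' all)

echo "pjsub --step --sparam \"$first\" step1.sh"
echo "pjsub --step --sparam \"jid=12345,$second\" step2.sh"
```

Under these assumptions, the second sub-job and all subsequent ones would be deleted when the first sub-job ends with a non-zero exit status.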

Executing Bulk Jobs¶

A bulk job executes multiple instances of the same batch job simultaneously. For example, to run a program with several different parameter sets and compare the results, a regular batch job would require submitting each run one by one. With a bulk job, you can submit all the patterns at once.

The submission format for bulk jobs is as follows:

$ pjsub --bulk --sparam start-end jobscript

When executing a bulk job, you can change the program's input and output files for each sub-job using the bulk number assigned to each sub-job. The bulk number can be referenced with the environment variable PJM_BULKNUM.

Example¶

8 sub-jobs read from input files input.1 to input.8 and output to output files output.1 to output.8 respectively.

[username@genkai0001 ~]$ ls input.*
input.1 input.2 input.3 input.4 input.5 input.6 input.7 input.8
[username@genkai0001 ~]$ vi bulk.sh
#!/bin/sh
#------ pjsub option --------#
#PJM -L rscgrp=a-batch
#PJM -L node=1
#PJM -L elapse=1:00:00
#PJM -j
#------- Program execution -------#
./a.out < input.$PJM_BULKNUM > output.$PJM_BULKNUM
[username@genkai0001 ~]$ pjsub --bulk --sparam 1-8 bulk.sh
[INFO] PJM 0000 pjsub Job 12345 submitted.
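
Before submitting, the input and output naming can be rehearsed locally. In the sketch below, a loop plays the role of the scheduler by setting PJM_BULKNUM for each sub-job, and tr stands in for ./a.out. Both are stand-ins for illustration and the sub-jobs run one after another here, unlike a real bulk job.

```shell
#!/bin/sh
# Rehearse the bulk-job I/O pattern locally in a scratch directory.
workdir=$(mktemp -d)
(
    cd "$workdir" || exit 1

    # Create the eight input files.
    for n in 1 2 3 4 5 6 7 8; do
        echo "case $n" > "input.$n"
    done

    # Emulate each sub-job: the scheduler would set PJM_BULKNUM itself.
    for PJM_BULKNUM in 1 2 3 4 5 6 7 8; do
        export PJM_BULKNUM
        tr 'a-z' 'A-Z' < "input.$PJM_BULKNUM" > "output.$PJM_BULKNUM"
    done

    ls output.*
)

cat "$workdir/output.3"
```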

For more details on using jobs in Genkai, please refer to the following:

How to Use the Job Management System (Technical Computing Suite)

Node Reservation System¶

Users can interactively reserve computing resources for Node Group A and Node Group B via a web interface for their desired time period. Within the user’s available time period, reservations can be made for any time slot starting from 30 minutes after the current time and extending up to 14 days in the future, in 30-minute increments. When submitting a reservation request, users can select the reservation time (in one-hour increments), the number of nodes to be used, and the number of GPUs. The available reservation times, number of nodes, and number of GPUs are as follows.

Item	Node Group A	Node Group B
Maximum Booking Duration	24 h	24 h
Number of Nodes	1–4	1 (fixed)
Number of GPUs	-	1–4
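
The limits in this table can be checked before opening the portal. The sketch below encodes them in a hypothetical helper, check_reservation. The one-hour minimum is an assumption inferred from the one-hour increments, and the portal itself remains the authority on what is accepted.

```shell
#!/bin/sh
# Check a reservation request against the limits in the table above.
# Arguments: node group (A or B), hours, nodes, GPUs (ignored for group A).
check_reservation() {
    group=$1; hours=$2; nodes=$3; gpus=${4:-0}

    # ASSUMPTION: the minimum duration is 1 hour (one-hour increments).
    if [ "$hours" -lt 1 ] || [ "$hours" -gt 24 ]; then
        echo "invalid: duration must be 1-24 hours"; return 1
    fi
    case "$group" in
        A) [ "$nodes" -ge 1 ] && [ "$nodes" -le 4 ] ||
               { echo "invalid: group A allows 1-4 nodes"; return 1; } ;;
        B) [ "$nodes" -eq 1 ] && [ "$gpus" -ge 1 ] && [ "$gpus" -le 4 ] ||
               { echo "invalid: group B allows 1 node and 1-4 GPUs"; return 1; } ;;
        *) echo "invalid: unknown node group $group"; return 1 ;;
    esac
    echo "ok"
}

check_reservation A 12 4
check_reservation B 24 1 2
check_reservation A 30 1 || true    # rejected: longer than 24 hours
check_reservation B 2 2 1 || true   # rejected: group B is fixed at 1 node
```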

How to Use the Node Reservation Portal¶

Logging In to the Portal¶

Access the Node Reservation Portal in a web browser.
Enter your portal account and password to log in.
Your portal account is the one you created on the Usage Application Portal, and its name begins with “ap”.
Next, enter your one-time password.

Since the Node Reservation Portal and the Application Portal use the same login credentials, please enter the one-time password obtained using the same procedure as when logging into the Application Portal.
(If not much time has passed since logging into either portal, the entire login process will be skipped.)

Making a Reservation¶

Select “New reservation” and enter the required information for the reservation.

Item	Description
Supercomputer account	Select a supercomputer account
Start at	Select the start date and time
Operating hour	Select the reservation time
Node Group	Select a node group
Resource	Select the amount of resources to reserve

Click “Reserve.” A confirmation dialog will appear, and the reservation will be completed.
You can view your reservation details under “History”.
You can submit jobs once the reservation period begins. When running pjsub, specify the “Reservation ID” number with the -r option.
(For details, please refer to the explanation of the basic options above.)

Creating and Running Reserved Jobs¶

Running Batch Jobs¶

Add -r [Reservation ID] to your standard batch job execution command.

Example:

$ pjsub -r [Reservation ID] job.sh

The following is an example of job.sh.
Do not specify a resource group with rscgrp in this script.