Job Usage¶
Last Updated: August 27, 2025
Unlike the login nodes, the computing nodes in Genkai node groups A, B, and C are managed by a job management system that allocates resources in response to requests from multiple users. To run a program on these nodes, you must therefore submit a usage request in the form of a job. This document explains how to use each node group through jobs.
Types of Jobs¶
In Genkai, you can execute the following two types of jobs:
| Job Type | Usage |
|---|---|
| Batch Job | Batch execution by submitting a job with a pre-written script. Interactive use is not allowed. |
| Interactive Job | Interactive execution by logging into a computing node. Mainly intended for short-time debugging and pre/post-processing. |
Additionally, batch jobs have the following three types:
| Batch Job Type | Usage |
|---|---|
| Regular Job | Submits one script and executes one job. |
| Step Job | Submits multiple scripts as a single batch and executes them in a specified order. |
| Bulk Job | Submits one script and generates multiple regular jobs for execution. |
Batch Job Flow¶
Generally, to run a program using a batch job, you follow these steps:
- Job operation commands
- Creating a job processing script
- Submitting the job (pjsub command)
- (If necessary) Checking the job status (pjstat command)
- (If necessary) Deleting a running or waiting job (pjdel command)
- Checking the results
- Checking node usage
Job Operation Commands¶
The commands used for job operations are as follows:
| Function | Command |
|---|---|
| Submit Job | pjsub |
| Check Job Status | pjstat |
| Delete Job | pjdel |
| Check Node Group Congestion | pjshowrsc, show_rsc |
Job Submission (pjsub Command)¶
Jobs are submitted in one of the following forms, depending on the job type. Note that in either form the job must be submitted from within the large-capacity storage (/home) or the high-speed storage (/fast).
- Batch Job Format
$ pjsub options job_script_file
- Interactive Job Format
$ pjsub --interact options
Creating a Batch Processing Script¶
Basic Options¶
| Option Name | Description |
|---|---|
| -r | Specify the reservation ID issued by the Node Reservation Portal |
| -o filename | Write standard output to the file filename |
| -e filename | Write standard error to the file filename |
| -j | Write the job's standard error to the same file as its standard output |
| --interact | Execute as an interactive job |
| --restart | Re-execute the job in case of a failure |
| --norestart | Do not re-execute the job in case of a failure (default) |
| --mail-list mailaddress | Specify the mail destination |
| -m | Specify mail notification |
| -m b | Mail notification at batch job start |
| -m e | Mail notification at batch job end |
| -m r | Mail notification at job re-execution |
| -X | Pass the environment variables set at job submission on to the job execution environment |
Batch Job Resource Options¶
The main options regarding resources needed for batch job processing are as follows. Specify the resources or upper limits following -L.
| Option Name | Description |
|---|---|
| -L rscgrp=name | Resource group (queue) name to which the job is submitted (for details, see Resource Groups) |
| -L node | Specify the number of nodes (mandatory when using more than one node) |
| -L vnode-core | Specify the number of cores (mandatory when using less than one node in node group A) |
| -L gpu | Specify the number of GPUs (mandatory when using less than one node in node groups B and C) |
| -L elapse | Specify the maximum job execution time |
| -L proc-core | Specify the maximum core file size limit per process (default: 0, maximum: unlimited) |
| -L proc-data | Specify the maximum data segment size limit per process (default: unlimited) |
| -L proc-stack | Specify the maximum stack segment size limit per process. If set to unlimited, the actual value will be 2MiB due to RHEL specifications. (default: unlimited) |
| -L jobenv | Specify the job environment. If using Singularity, you must specify jobenv=singularity. |
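The example scripts in this document write the elapse limit as hours:minutes:seconds (for example, elapse=1:00:00). As a convenience, a run time given in seconds can be converted into that form with a small helper. This is only a sketch: the function name to_elapse is hypothetical and is not part of the job management system.

```shell
#!/bin/sh
# Hypothetical helper (not a Genkai or pjsub command): convert a duration
# in seconds to the H:MM:SS form used with "-L elapse=" in this document.
to_elapse() {
    total=$1
    printf '%d:%02d:%02d\n' \
        $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

# Build a resource line for a 90-minute (5400-second) limit.
echo "#PJM -L elapse=$(to_elapse 5400)"
```

The printed line can be pasted into a job script alongside the other #PJM directives.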
Statistics Output Options¶
| Option Name | Description |
|---|---|
| -s | Output statistical information of the submitted job (cannot be used with the -S option) |
| -S | Output statistical information including node information of the submitted job (cannot be used with the -s option) |
Example Job Scripts¶
Sequential Job for Node Group A¶
Below is an example job script to execute a job. This example assumes executing a program compiled with Intel oneAPI.
| Resource Specification | Details |
|---|---|
| Resource Group | a-batch |
| Number of CPU Cores | 1 |
| Elapsed Time | 1 hour |
| Output Standard Error to Standard Output | Yes |
#!/bin/sh
#PJM -L rscgrp=a-batch
#PJM -L vnode-core=1
#PJM -L elapse=1:00:00
#PJM -j
module load intel
./a.out
Thread Parallel Job for Node Group A¶
Below is an example job script to execute a job. This example assumes executing a program compiled with Intel oneAPI.
| Resource Specification | Details |
|---|---|
| Resource Group | a-batch |
| Number of CPU Cores | 30 |
| Number of Threads | 30 |
| Elapsed Time | 1 hour |
| Output Standard Error to Standard Output | Yes |
#!/bin/sh
#PJM -L rscgrp=a-batch
#PJM -L vnode-core=30
#PJM -L elapse=1:00:00
#PJM -j
module load intel
export OMP_NUM_THREADS=30
./a.out
Hybrid Parallel Job for Node Group A¶
Below is an example job script to execute a job. This example assumes executing a program compiled and linked with Intel oneAPI and Intel MPI.
| Resource Specification | Details |
|---|---|
| Resource Group | a-batch |
| Number of Nodes | 4 |
| Number of Processes per Node | 10 |
| Number of Threads per Process | 12 |
| Elapsed Time | 1 hour |
| Output Standard Error to Standard Output | Yes |
#!/bin/sh
#PJM -L rscgrp=a-batch
#PJM -L node=4
#PJM -L elapse=1:00:00
#PJM -j
module load intel
module load impi
export OMP_NUM_THREADS=12
mpiexec -np 40 -ppn 10 ./a.out
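The numbers in the hybrid example are related by simple arithmetic: the total process count passed to -np is the number of nodes times the processes per node, and each node runs processes-per-node times threads-per-process threads. The sketch below recomputes them from the values in the table above; the variable names are illustrative only, and whether the per-node thread count fits on a node should be checked against the system specifications.

```shell
#!/bin/sh
# Values taken from the hybrid parallel example above.
nodes=4
procs_per_node=10
threads_per_proc=12

# Total MPI processes across all nodes (the value given to -np).
total_procs=$((nodes * procs_per_node))
# Threads running on each node (processes per node x threads per process).
threads_per_node=$((procs_per_node * threads_per_proc))

echo "export OMP_NUM_THREADS=$threads_per_proc"
echo "mpiexec -np $total_procs -ppn $procs_per_node ./a.out"
echo "threads per node: $threads_per_node"
```

Changing one of the three inputs and re-running the sketch gives consistent values for the mpiexec line and OMP_NUM_THREADS.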
1 GPU Job for Node Group B¶
Below is an example job script to execute a job. This example assumes executing a program compiled with CUDA.
| Resource Specification | Details |
|---|---|
| Resource Group | b-batch |
| Number of GPUs | 1 |
| Elapsed Time | 1 hour |
| Output Standard Error to Standard Output | Yes |
#!/bin/sh
#PJM -L rscgrp=b-batch
#PJM -L gpu=1
#PJM -L elapse=1:00:00
#PJM -j
module load cuda
./a.out
2 Node Job for Node Group B¶
Below is an example job script to execute a job.
| Resource Specification | Details |
|---|---|
| Resource Group | b-batch |
| Number of Nodes | 2 |
| Number of GPUs | 8 |
| Number of Processes per Node | 4 |
| Elapsed Time | 1 hour |
| Output Standard Error to Standard Output | Yes |
#!/bin/sh
#PJM -L rscgrp=b-batch
#PJM -L node=2
#PJM -L elapse=1:00:00
#PJM -j
module load gcc cuda cudnn nccl hpcx
mpiexec -n 8 -map-by ppr:4:node python3 pytorch_mnist.py
Executing an Interactive Job¶
To execute an interactive job, specify the --interact option with the pjsub command.
Below is an example of using the resource group a-inter with 1 node for 1 hour in an interactive job.
$ pjsub --interact -L rscgrp=a-inter,node=1,elapse=01:00:00
[INFO] PJM 0000 pjsub Job 82653 submitted.
[INFO] PJM 0081 .connected.
[INFO] PJM 0082 pjsub Interactive job 82653 started.
[ku01234567@a0001 test]$
Submitting a Batch Job¶
Request the processing described in the batch processing script file using the pjsub command.
$ pjsub go.sh
[INFO] PJM 0000 pjsub Job 1234 submitted.
In this example, the processing described in a file named go.sh is being requested. The example shows that the job ID 1234 has been assigned.
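When a job is submitted from another script, it is often useful to capture the job ID for later use with pjstat or pjdel. The following sketch extracts the ID from the confirmation message, assuming the message always has the "pjsub Job <ID> submitted." form shown in this document. The function name extract_job_id is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a pjsub confirmation line on stdin and print
# the numeric job ID. Assumes the "[INFO] PJM 0000 pjsub Job <ID> submitted."
# message format shown in this document.
extract_job_id() {
    sed -n 's/.*pjsub Job \([0-9][0-9]*\) submitted\..*/\1/p'
}

# Sample message; on Genkai the real output would be piped in instead,
# for example:  pjsub go.sh | extract_job_id
sample='[INFO] PJM 0000 pjsub Job 1234 submitted.'
job_id=$(printf '%s\n' "$sample" | extract_job_id)
echo "$job_id"
```

Lines that do not match the expected message produce no output, so an empty result can be treated as a failed submission.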
Checking Job Status¶
Checking the Status of Running and Waiting Jobs¶
To check the status of submitted jobs, use the pjstat command as follows.
$ pjstat
JOB_ID JOB_NAME MD ST USER START_DATE ELAPSE_LIM NODE_REQUIRE VNODE CORE V_MEM
82659 test2.sh NM RNA ku400001 (07/01 15:25) 0000:20:00 8 - - -
Here, JOB_ID represents the job number, and ST represents the current state of the job. The main job states are as follows:
| Display | State |
|---|---|
| QUE | Waiting |
| RNA | Starting |
| RUN | Running |
| RNO | Ending |
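To make the ST column easier to read, the pjstat output can be post-processed. The sketch below is one way to do this, assuming the column layout of the sample output above (JOB_ID in the first column, ST in the fourth) and job names without spaces. The function names are hypothetical, and states not listed in the table are reported as "Other".

```shell
#!/bin/sh
# Hypothetical helper: translate a state code from the table above.
describe_state() {
    case "$1" in
        QUE) echo "Waiting" ;;
        RNA) echo "Starting" ;;
        RUN) echo "Running" ;;
        RNO) echo "Ending" ;;
        *)   echo "Other" ;;   # states not covered by the table above
    esac
}

# Read pjstat-style output on stdin, skip the header line, and print
# "<job id> <state description>" for each job.
list_job_states() {
    awk 'NR > 1 { print $1, $4 }' | while read -r id st; do
        echo "$id $(describe_state "$st")"
    done
}

# Sample copied from the pjstat example; on Genkai use: pjstat | list_job_states
sample='JOB_ID JOB_NAME MD ST USER START_DATE ELAPSE_LIM NODE_REQUIRE VNODE CORE V_MEM
82659 test2.sh NM RNA ku400001 (07/01 15:25) 0000:20:00 8 - - -'
printf '%s\n' "$sample" | list_job_states
```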
There is a limit on the number of cores that can be used simultaneously for job execution. If a job is submitted that exceeds this limit, it will be placed in a waiting state regardless of the resource group’s current load. For details on this limit, see Limit on the Number of Simultaneously Used Cores.
Viewing History¶
You can check the execution history from a specified number of days ago (7 days in this example) to the present using the following option:
$ pjstat -H day=7 -v
- The job end status is displayed in the "PC" column (0: Normal end, 1: Canceled, etc.).
- For the meaning of each code number, see "man pjstat".
- If the job exceeded its memory limit, "12" is shown in the "PC" column of the job history.
- The history of past jobs is deleted after a certain period.
Statistical Information for Completed Batch Jobs¶
You can check the statistical information for any completed job by specifying its job ID (1234 in this example) using the following option:
$ pjstat -H -S 1234
Deleting a Batch Job¶
You can cancel (delete) running or waiting batch jobs using the pjdel command. Specify the job ID(s) after the pjdel command (multiple IDs can be specified). Canceling a running batch job will stop its execution.
$ pjdel 1234
[INFO] PJM 0100 pjdel Job 1234 canceled.
In this example, a request is made to delete the batch job with job ID 1234, and a message indicates that the job has been successfully deleted.
Checking the Results¶
If an output file is specified with the -o option of pjsub, standard output is written to that file; otherwise it is written to a file named "[job_script_name].[job_ID].out". If the -j option is specified, standard error is written to the same file as standard output. If -j is not specified and a file is given with the -e option, standard error is written to that file. If neither -j nor -e is specified, standard error is written to "[job_script_name].[job_ID].err".
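The default file names can be built from the script name and the job ID. The following sketch assumes that the script file name is used verbatim, including its extension, as the "[job_script_name]" part; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: build the default result file names described above.
# Assumption: the script file name (e.g. go.sh) is used as-is in the name.
default_stdout_file() {
    echo "$1.$2.out"
}
default_stderr_file() {
    echo "$1.$2.err"
}

# Names for the script go.sh submitted as job 1234, with no -o, -e, or -j.
default_stdout_file go.sh 1234
default_stderr_file go.sh 1234
```

With the -j option only the ".out" file is relevant, because standard error goes to the same file as standard output.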
Checking Resource Group Congestion¶
To check the congestion status of resource groups, use the pjshowrsc or show_rsc commands as follows.
$ pjshowrsc --rg
[ CLST: genkai-clst ]
[ RSCUNIT: rscunit_pg01 ]
RSCGRP NODE
TOTAL FREE ALLOC
a-batch 1000 999 1
a-inter 1000 999 1
a-reserve 10 10 0
b-batch 34 34 0
b-batch-mig 3 3 0
b-inter 34 34 0
b-inter-mig 3 3 0
b-reserve 4 4 0
c-batch 2 2 0
c-inter 2 2 0
$ show_rsc
node core gpu
rscgrp mode free total free total free total
a-batch/a-inter simplex 800 800 - - - -
a-batch/a-inter shared 199 222 23990 26640 - -
b-batch/b-inter simplex 30 30 - - - -
b-batch/b-inter shared 4 4 - - 16 16
b-batch-mig/b-inter-mig simplex 1 1 - - - -
b-batch-mig/b-inter-mig shared 2 3 - - 56 84
c-batch/c-inter simplex 1 1 - - - -
c-batch/c-inter shared 1 1 - - 8 8
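For scripted checks, the number of free nodes in one resource group can be picked out of the pjshowrsc --rg output. The sketch below assumes the column order of the sample above (RSCGRP, TOTAL, FREE, ALLOC); the function name free_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "pjshowrsc --rg" style output on stdin and print
# the FREE node count for the resource group named in $1.
# Assumes the column order RSCGRP TOTAL FREE ALLOC shown in this document.
free_nodes() {
    awk -v group="$1" '$1 == group { print $3 }'
}

# Sample lines copied from the output above; on Genkai the real command
# output would be piped in instead:  pjshowrsc --rg | free_nodes a-batch
sample='RSCGRP NODE
TOTAL FREE ALLOC
a-batch 1000 999 1
a-inter 1000 999 1
b-batch 34 34 0'
printf '%s\n' "$sample" | free_nodes a-batch
```

Header lines and other resource groups are ignored because only rows whose first field equals the requested name are printed.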
Executing Step Jobs¶
A step job is a job model that treats multiple batch jobs as a single entity and specifies the execution order and dependencies among them, which provides job chaining. A step job consists of multiple sub-jobs, and its sub-jobs are never executed simultaneously.
The submission format for step jobs is as follows.
$ pjsub --step [--sparam "sn=stepno[,Dependency_Expressions]"] jobscript
Step Job Dependency Expressions¶
| Condition | Description |
|---|---|
| NONE | Indicates no dependencies |
| Exit status == value[,value,..] | Any value can be specified for value. In the case of "==" or "!=", multiple values can be specified using a comma (","). |
| Exit status != value[,value,..] | |
| Exit status > value | |
| Exit status >= value | |
| Exit status < value | |
| Exit status <= value | |
Deletion Types Specifiable in Step Job Dependency Expressions¶
| Deletion Type | Description |
|---|---|
| one | Deletes only the specified job. |
| after | Deletes the specified job and jobs dependent on it recursively. |
| all | Deletes the specified job and all subsequent jobs. |
Executing Bulk Jobs¶
A bulk job runs the same batch script as multiple sub-jobs submitted at once. For example, to vary a job parameter and compare the results, regular batch jobs would have to be submitted one by one, whereas a bulk job submits all the patterns in a single operation.
The submission format for bulk jobs is as follows:
$ pjsub --bulk --sparam start-end jobscript
When executing a bulk job, you can change the program's input and output files for each sub-job using the bulk number assigned to each sub-job. The bulk number can be referenced with the environment variable PJM_BULKNUM.
Example¶
In this example, eight sub-jobs read from input files input.1 to input.8 and write to output files output.1 to output.8, respectively.
[username@genkai0001 ~]$ ls input.*
input.1 input.2 input.3 input.4 input.5 input.6 input.7 input.8
[username@genkai0001 ~]$ vi bulk.sh
#!/bin/sh
#------ pjsub option --------#
#PJM -L rscgrp=a-batch
#PJM -L node=1
#PJM -L elapse=1:00:00
#PJM -j
#------- Program execution -------#
./a.out < input.$PJM_BULKNUM > output.$PJM_BULKNUM
[username@genkai0001 ~]$ pjsub --bulk --sparam 1-8 bulk.sh
[INFO] PJM 0000 pjsub Job 12345 submitted.
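Before submitting a bulk job, it can help to preview how PJM_BULKNUM expands in the command line of each sub-job. The sketch below simulates this locally by setting the variable in a loop; it does not submit anything, and the function name preview_bulk is hypothetical.

```shell
#!/bin/sh
# Hypothetical dry run: print the command each sub-job would execute for
# bulk numbers start..end, mimicking the PJM_BULKNUM variable that the job
# system sets for each sub-job.
preview_bulk() {
    PJM_BULKNUM=$1
    end=$2
    while [ "$PJM_BULKNUM" -le "$end" ]; do
        echo "./a.out < input.$PJM_BULKNUM > output.$PJM_BULKNUM"
        PJM_BULKNUM=$((PJM_BULKNUM + 1))
    done
}

# Same range as "--sparam 1-8" in the example above.
preview_bulk 1 8
```

Each printed line corresponds to one sub-job, so a missing input file can be spotted before the job is queued.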
Related Information¶
For more details on using jobs in Genkai, please refer to the following:
Node Reservation System¶
Users can interactively reserve computing resources in Node Group A and Node Group B for a desired time period through a web interface. Within the user’s available time period, a reservation can start at any time slot from 30 minutes after the current time up to 14 days in the future, in 30-minute increments. When submitting a reservation request, users select the reservation time (in one-hour increments), the number of nodes to be used, and the number of GPUs. The available reservation times, numbers of nodes, and numbers of GPUs are as follows.
| | Node Group A | Node Group B |
|---|---|---|
| Maximum Booking Duration | 24 h | 24 h |
| Number of Nodes | 1–4 | 1 (fixed) |
| Number of GPUs | | 1–4 |
How to Use the Node Reservation Portal¶
Logging In to the Portal¶
- Access Node Reservation Portal in a web browser.
- Enter your portal account and password to log in.
  Your portal account is the one you created on the Usage Application Portal, and it is an account that begins with “ap”.
- Next, enter your one-time password.
  Since the Node Reservation Portal and the Application Portal use the same login credentials, please enter the one-time password obtained using the same procedure as when logging into the Application Portal.
  (If not much time has passed since logging into either portal, the entire login process will be skipped.)
Making a Reservation¶
- Select “New reservation” and enter the required information for the reservation.
| Item | Description |
|---|---|
| Supercomputer account | Select a supercomputer account |
| Start at | Select the start date and time |
| Operating hour | Select the reservation time |
| Node Group | Select a node group |
| Resource | Select the amount of resources to reserve |
- Click “Reserve.” A confirmation dialog will appear, and the reservation will be completed.
- You can view your reservation details under “History”.
- You can submit jobs once the reservation period begins. When running pjsub, specify the “Reservation ID” number with the -r option.
  (For details, please refer to the explanation of the basic options above.)
Creating and Running Reserved Jobs¶
Running Batch Jobs¶
Add -r [Reservation ID] to your standard batch job execution command.
Example:
$ pjsub -r [Reservation ID] job.sh
The following is an example of job.sh.
Do not include rscgrp to specify a resource group.
#!/bin/sh
# Do not specify rscgrp
#PJM -L vnode-core=60
#PJM -L elapse=1:00:00
#PJM -j
#PJM -S
export OMP_NUM_THREADS=30
./a.out
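Because an existing job script is often reused for a reserved job, a quick check for a leftover rscgrp directive can prevent mistakes. The following sketch looks only at #PJM lines inside the script, so options passed on the pjsub command line are not covered; the function name uses_rscgrp is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed (exit 0) if the given job script contains a
# "#PJM" directive line that sets rscgrp, which must be avoided with -r.
uses_rscgrp() {
    grep -Eq '^#PJM[[:space:]].*rscgrp=' "$1"
}

# Demonstration on a temporary copy of a script without rscgrp.
tmp=$(mktemp)
printf '%s\n' '#!/bin/sh' '# Do not specify rscgrp' \
    '#PJM -L vnode-core=60' '#PJM -L elapse=1:00:00' './a.out' > "$tmp"
if uses_rscgrp "$tmp"; then
    echo "remove the rscgrp line before submitting with -r"
else
    echo "no rscgrp directive found"
fi
rm -f "$tmp"
```

Plain comment lines that merely mention rscgrp are not flagged, since only lines starting with #PJM are inspected.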
Running Interactive Jobs¶
As with batch jobs, do not include the `rscgrp` option to specify a resource group.
Example)
Modifying a Reservation¶
1. Select “Modify reservation” to modify the reservation.
| Item | Description |
|---|---|
| Reservation ID | Select the “Reservation ID” to modify |
| Resource | Select the resource quantity to modify |