Job Usage

Last Updated: August 27, 2025


The computing nodes in the Genkai node groups A, B, and C are managed by a job management system that allocates resources in response to requests from multiple users, unlike the login nodes. Therefore, to run programs on these nodes, you must first submit a usage request in the form of a job. This document introduces how to use each node group through jobs.

Types of Jobs

In Genkai, you can execute the following two types of jobs:

| Job Type | Usage |
|---|---|
| Batch Job | Batch execution by submitting a job with a pre-written script. Interactive use is not allowed. |
| Interactive Job | Interactive execution by logging into a computing node. Mainly intended for short debugging sessions and pre/post-processing. |

Additionally, batch jobs have the following three types:

| Batch Job Type | Usage |
|---|---|
| Regular Job | Submits one script and executes one job. |
| Step Job | Submits multiple scripts as a single batch and executes them in a specified order. |
| Bulk Job | Submits one script and generates multiple regular jobs for execution. |

Batch Job Flow

Generally, to run a program using a batch job, you follow these steps:

  • Job operation commands
  • Creating a job processing script
  • Submitting the job (pjsub command)
  • (If necessary) Checking the job status (pjstat command)
  • (If necessary) Deleting a running or waiting job (pjdel command)
  • Checking the results
  • Checking node usage

Job Operation Commands

The commands used for job operations are as follows:

| Function | Command |
|---|---|
| Submit Job | pjsub |
| Check Job Status | pjstat |
| Delete Job | pjdel |
| Check Node Group Congestion | pjshowrsc |

Job Submission (pjsub Command)

Jobs are submitted in one of the following formats, depending on the job type. In either case, run pjsub from a directory within the large-capacity storage (/home) or the high-speed storage (/fast).

  • Batch Job Format
$ pjsub options job_script_file
  • Interactive Job Format
$ pjsub --interact options
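The storage restriction above can be checked before submitting. The following is a minimal sketch, not a Genkai tool: is_submit_dir is a hypothetical helper, and it assumes the two storage areas are visible under the literal path prefixes /home and /fast.

```shell
#!/bin/sh
# is_submit_dir DIR
# Hypothetical helper: succeeds if DIR is /home, /fast, or a directory below them.
is_submit_dir() {
    case "$1" in
        /home|/home/*|/fast|/fast/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the current directory is suitable for running pjsub.
if is_submit_dir "$PWD"; then
    echo "OK to submit from $PWD"
else
    echo "Move to /home or /fast before running pjsub"
fi
```

If the site exposes these areas through symbolic links, the pattern list would need to be adjusted accordingly.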

Creating a Batch Processing Script

Basic Options

| Option Name | Description |
|---|---|
| -r | Specify the reservation ID issued by the Node Reservation Portal |
| -o filename | Write standard output to the file filename |
| -e filename | Write standard error to the file filename |
| -j | Write the job's standard error and standard output to the same file |
| --interact | Execute as an interactive job |
| --restart | Re-execute the job in case of a failure |
| --norestart | Do not re-execute the job in case of a failure (default) |
| --mail-list mailaddress | Specify the mail destination |
| -m | Specify mail notification |
| -m b | Mail notification at batch job start |
| -m e | Mail notification at batch job end |
| -m r | Mail notification at job re-execution |
| -X | Pass the environment variables set at job submission on to the job execution environment |

Batch Job Resource Options

The main options regarding resources needed for batch job processing are as follows. Specify the resources or upper limits following -L.

| Option Name | Description |
|---|---|
| -L rscgrp=name | Resource group (queue) name to which the job is submitted (for details, see Resource Groups) |
| -L node | Specify the number of nodes (mandatory when using more than one node) |
| -L vnode-core | Specify the number of cores (mandatory when using less than one node in node group A) |
| -L gpu | Specify the number of GPUs (mandatory when using less than one node in node groups B and C) |
| -L elapse | Specify the maximum job execution time |
| -L proc-core | Specify the maximum core file size limit per process (default: 0, maximum: unlimited) |
| -L proc-data | Specify the maximum data segment size limit per process (default: unlimited) |
| -L proc-stack | Specify the maximum stack segment size limit per process. If set to unlimited, the actual value will be 2MiB due to RHEL specifications. (default: unlimited) |
| -L jobenv | Specify the job environment. If using Singularity, you must specify jobenv=singularity. |
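The elapse values used on this page are written as hours:minutes:seconds (for example 1:00:00). As a rough illustration, the sketch below converts such a value to seconds, which can help when comparing limits. elapse_to_seconds is a hypothetical helper and only handles the three-field form; other formats that pjsub may accept are not covered.

```shell
#!/bin/sh
# elapse_to_seconds H:MM:SS
# Hypothetical helper: prints the elapse limit in seconds.
elapse_to_seconds() {
    # awk reads fields such as "08" as decimal, which avoids shell octal pitfalls.
    printf '%s\n' "$1" | awk -F: 'NF == 3 { print $1 * 3600 + $2 * 60 + $3 }'
}

elapse_to_seconds 1:00:00    # prints 3600
elapse_to_seconds 01:00:00   # prints 3600
```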

Statistics Output Options

| Option Name | Description |
|---|---|
| -s | Output statistical information of the submitted job (cannot be used with the -S option) |
| -S | Output statistical information including node information of the submitted job (cannot be used with the -s option) |

Example Job Scripts

Sequential Job for Node Group A

Below is an example job script to execute a job. This example assumes executing a program compiled with Intel oneAPI.

| Resource Specification | Details |
|---|---|
| Resource Group | a-batch |
| Number of CPU Cores | 1 |
| Elapsed Time | 1 hour |
| Output Standard Error to Standard Output | Yes |

#!/bin/sh
#PJM -L rscgrp=a-batch
#PJM -L vnode-core=1
#PJM -L elapse=1:00:00
#PJM -j

module load intel
./a.out

Thread Parallel Job for Node Group A

Below is an example job script to execute a job. This example assumes executing a program compiled with Intel oneAPI.

| Resource Specification | Details |
|---|---|
| Resource Group | a-batch |
| Number of CPU Cores | 30 |
| Number of Threads | 30 |
| Elapsed Time | 1 hour |
| Output Standard Error to Standard Output | Yes |

#!/bin/sh
#PJM -L rscgrp=a-batch
#PJM -L vnode-core=30
#PJM -L elapse=1:00:00
#PJM -j

module load intel
export OMP_NUM_THREADS=30
./a.out

Hybrid Parallel Job for Node Group A

Below is an example job script to execute a job. This example assumes executing a program compiled and linked with Intel oneAPI and Intel MPI.

| Resource Specification | Details |
|---|---|
| Resource Group | a-batch |
| Number of Nodes | 4 |
| Number of Processes per Node | 10 |
| Number of Threads per Process | 12 |
| Elapsed Time | 1 hour |
| Output Standard Error to Standard Output | Yes |

#!/bin/sh
#PJM -L rscgrp=a-batch
#PJM -L node=4
#PJM -L elapse=1:00:00
#PJM -j

module load intel
module load impi
export OMP_NUM_THREADS=12
mpiexec -np 40 -ppn 10 ./a.out
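The mpiexec arguments in the hybrid example follow from the resource table: the total process count is nodes times processes per node. The sketch below reproduces that arithmetic. hybrid_layout is a hypothetical helper, and the cores-per-node figure assumes one core per OpenMP thread, which is an assumption rather than something stated on this page.

```shell
#!/bin/sh
# hybrid_layout NODES PROCS_PER_NODE THREADS_PER_PROC
# Hypothetical helper: prints the total MPI process count and the
# cores used on each node (assuming one core per thread).
hybrid_layout() {
    np=$(( $1 * $2 ))
    cores_per_node=$(( $2 * $3 ))
    echo "np=$np cores_per_node=$cores_per_node"
}

hybrid_layout 4 10 12   # prints np=40 cores_per_node=120
```

With 4 nodes and 10 processes per node this gives the -np 40 -ppn 10 pair used in the script.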

1 GPU Job for Node Group B

Below is an example job script to execute a job. This example assumes executing a program compiled with CUDA.

| Resource Specification | Details |
|---|---|
| Resource Group | b-batch |
| Number of GPUs | 1 |
| Elapsed Time | 1 hour |
| Output Standard Error to Standard Output | Yes |

#!/bin/sh
#PJM -L rscgrp=b-batch
#PJM -L gpu=1
#PJM -L elapse=1:00:00
#PJM -j

module load cuda
./a.out

2 Node Job for Node Group B

Below is an example job script to execute a job.

| Resource Specification | Details |
|---|---|
| Resource Group | b-batch |
| Number of Nodes | 2 |
| Number of GPUs | 8 |
| Number of Processes per Node | 4 |
| Elapsed Time | 1 hour |
| Output Standard Error to Standard Output | Yes |

#!/bin/sh
#PJM -L rscgrp=b-batch
#PJM -L node=2
#PJM -L elapse=1:00:00
#PJM -j

module load gcc cuda cudnn nccl hpcx
mpiexec -n 8 -map-by ppr:4:node python3 pytorch_mnist.py

Executing an Interactive Job

To execute an interactive job, specify the --interact option with the pjsub command.
Below is an example of using the resource group a-inter with 1 node for 1 hour in an interactive job.

$ pjsub --interact -L rscgrp=a-inter,node=1,elapse=01:00:00
[INFO] PJM 0000 pjsub Job 82653 submitted.
[INFO] PJM 0081 .connected.
[INFO] PJM 0082 pjsub Interactive job 82653 started.
[ku01234567@a0001 test]$

Submitting a Batch Job

Request the processing described in the batch processing script file using the pjsub command.

$ pjsub go.sh
[INFO] PJM 0000 pjsub Job 1234 submitted.

In this example, the processing described in a file named go.sh is being requested. The example shows that the job ID 1234 has been assigned.
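When scripting around pjsub, it can be convenient to capture the job ID from the submission message. The following sketch assumes the message has exactly the form shown above; extract_job_id is a hypothetical helper, and the pattern would need adjusting if the message format differs.

```shell
#!/bin/sh
# extract_job_id
# Hypothetical helper: reads pjsub output on stdin and prints the job ID
# from a line of the form "[INFO] PJM 0000 pjsub Job <ID> submitted."
extract_job_id() {
    sed -n 's/^\[INFO\] PJM 0000 pjsub Job \([0-9][0-9]*\) submitted\.$/\1/p'
}

# Demonstration with the sample message; on Genkai one might instead run:
#   jobid=$(pjsub go.sh | extract_job_id)
printf '%s\n' '[INFO] PJM 0000 pjsub Job 1234 submitted.' | extract_job_id
```

The captured ID can then be passed to pjstat or pjdel.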

Checking Job Status

Checking the Status of Running and Waiting Jobs

To check the status of submitted jobs, use the pjstat command as follows.

$ pjstat
JOB_ID     JOB_NAME   MD ST  USER     START_DATE      ELAPSE_LIM            NODE_REQUIRE    VNODE  CORE V_MEM
82659      test2.sh   NM RNA ku400001 (07/01 15:25)   0000:20:00            8               -      -    -

Here, JOB_ID represents the job number, and ST represents the current state of the job. The main job states are as follows:

| Display | State |
|---|---|
| QUE | Waiting |
| RNA | Starting |
| RUN | Running |
| RNO | Ending |
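As an illustration of reading the ST column, the sketch below annotates pjstat output with the state descriptions listed above. describe_states is a hypothetical helper; it assumes a single header line and that ST is the fourth column, as in the sample output, and labels any other state as "Other".

```shell
#!/bin/sh
# describe_states
# Hypothetical helper: reads pjstat output on stdin and prints
# "JOB_ID ST description" for each job line after the header.
describe_states() {
    awk 'NR > 1 && NF >= 4 {
        desc = "Other"
        if ($4 == "QUE") desc = "Waiting"
        else if ($4 == "RNA") desc = "Starting"
        else if ($4 == "RUN") desc = "Running"
        else if ($4 == "RNO") desc = "Ending"
        print $1, $4, desc
    }'
}

# Demonstration with the sample output; on Genkai: pjstat | describe_states
printf '%s\n' \
    'JOB_ID     JOB_NAME   MD ST  USER     START_DATE      ELAPSE_LIM' \
    '82659      test2.sh   NM RNA ku400001 (07/01 15:25)   0000:20:00' |
    describe_states
```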

There is a limit on the number of cores that can be used simultaneously for job execution. If a job is submitted that exceeds this limit, it will be placed in a waiting state regardless of the resource group’s current load. For details on this limit, see Limit on the Number of Simultaneously Used Cores.

Viewing History

You can check the execution history from a specified number of days ago (7 days in this example) to the present using the following option:

$ pjstat -H day=7 -v
  • The job end status is displayed in the "PC" column (0: Normal end, 1: Canceled, etc.).
  • For the meaning of each code, see "man pjstat".
  • If the job exceeded its memory limit, "12" is shown in the "PC" column of the job history.
  • The history of past jobs is deleted after a certain period.
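The PC codes mentioned above can be turned into readable text when post-processing job history. This is only a sketch covering the three codes named on this page; describe_pc is a hypothetical helper, and every other code is deferred to man pjstat.

```shell
#!/bin/sh
# describe_pc CODE
# Hypothetical helper: prints a description for the PC codes listed above.
describe_pc() {
    case "$1" in
        0)  echo "Normal end" ;;
        1)  echo "Canceled" ;;
        12) echo "Memory usage exceeded" ;;
        *)  echo "See man pjstat" ;;
    esac
}

describe_pc 12   # prints Memory usage exceeded
```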

Statistical Information for Completed Batch Jobs

You can check the statistical information for any completed job by specifying its job ID (1234 in this example) using the following option:

$ pjstat -H -S 1234

Deleting a Batch Job

You can cancel (delete) running or waiting batch jobs using the pjdel command. Specify the job ID(s) after the pjdel command (multiple IDs can be specified). Canceling a running batch job will stop its execution.

$ pjdel 1234
[INFO] PJM 0100 pjdel Job 1234 canceled.

In this example, a request is made to delete the batch job with job ID 1234, and a message indicates that the job has been successfully deleted.

Checking the Results

If the output file is specified with the -o option of pjsub, the results to standard output are written to the specified file. If not specified, the output is written to a file named "[job_script_name].[job_ID].out". On the other hand, if the -j option is specified, standard error output is written to the same file as standard output. If the -j option is not specified, and an output file is specified with the -e option, the standard error output is written to the specified file. If neither option is specified, the standard error output is written to "[job_script_name].[job_ID].err".
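The default naming rule can be sketched as follows. default_output_files is a hypothetical helper that applies the "[job_script_name].[job_ID]" pattern described above; it ignores the -o and -e options, and it assumes the script name is used exactly as passed to pjsub.

```shell
#!/bin/sh
# default_output_files SCRIPT JOB_ID [-j]
# Hypothetical helper: prints the default standard output file name and,
# unless -j is given, the default standard error file name.
default_output_files() {
    echo "$1.$2.out"
    if [ "${3:-}" != "-j" ]; then
        echo "$1.$2.err"
    fi
}

default_output_files go.sh 1234      # prints go.sh.1234.out and go.sh.1234.err
default_output_files go.sh 1234 -j   # prints go.sh.1234.out only
```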

Checking Resource Group Congestion

To check the congestion status of resource groups, use the pjshowrsc or show_rsc commands as follows.

$ pjshowrsc --rg
[ CLST: genkai-clst ]
[ RSCUNIT: rscunit_pg01 ]
RSCGRP           NODE
                 TOTAL   FREE  ALLOC
a-batch           1000    999      1
a-inter           1000    999      1
a-reserve           10     10      0
b-batch             34     34      0
b-batch-mig          3      3      0
b-inter             34     34      0
b-inter-mig          3      3      0
b-reserve            4      4      0
c-batch              2      2      0
c-inter              2      2      0
$ show_rsc
                                      node              core               gpu
rscgrp                   mode         free    total     free    total     free    total
a-batch/a-inter          simplex       800      800        -        -        -        -
a-batch/a-inter          shared        199      222    23990    26640        -        -
b-batch/b-inter          simplex        30       30        -        -        -        -
b-batch/b-inter          shared          4        4        -        -       16       16
b-batch-mig/b-inter-mig  simplex         1        1        -        -        -        -
b-batch-mig/b-inter-mig  shared          2        3        -        -       56       84
c-batch/c-inter          simplex         1        1        -        -        -        -
c-batch/c-inter          shared          1        1        -        -        8        8
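To pick out resource groups that currently have free nodes, the pjshowrsc --rg output shown above can be filtered. The sketch below assumes the column layout of that sample (group name, TOTAL, FREE, ALLOC); free_groups is a hypothetical helper.

```shell
#!/bin/sh
# free_groups
# Hypothetical helper: reads "pjshowrsc --rg" output on stdin and prints
# "group free_nodes" for every resource group with at least one free node.
free_groups() {
    # Data rows have four fields with numeric TOTAL and FREE columns;
    # the bracketed header lines fail the numeric test and are skipped.
    awk 'NF == 4 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $3 > 0 { print $1, $3 }'
}

# Demonstration with sample rows; on Genkai: pjshowrsc --rg | free_groups
printf '%s\n' \
    '[ CLST: genkai-clst ]' \
    'RSCGRP           NODE' \
    '                 TOTAL   FREE  ALLOC' \
    'a-batch           1000    999      1' \
    'c-batch              2      2      0' |
    free_groups
```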

Executing Step Jobs

A step job is a job model that treats multiple batch jobs as a single entity and specifies the execution order and dependencies among them, providing job chaining. A step job consists of multiple sub-jobs, and its sub-jobs are never executed simultaneously.

The submission format for step jobs is as follows.

$ pjsub --step [--sparam "sn=stepno[,Dependency_Expressions]"] jobscript

Step Job Dependency Expressions

| Condition | Description |
|---|---|
| NONE | Indicates no dependencies |
| Exit status == value[,value,..] | Multiple values can be specified, separated by commas (","). Example: ec==1,3,5 → True if the exit status is any of 1, 3, or 5. |
| Exit status != value[,value,..] | Multiple values can be specified, separated by commas (","). Example: ec!=1,3,5 → True if the exit status is none of 1, 3, or 5. |
| Exit status > value | Any value can be specified for value. |
| Exit status >= value | Any value can be specified for value. |
| Exit status < value | Any value can be specified for value. |
| Exit status <= value | Any value can be specified for value. |
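The list semantics of "==" and "!=" can be illustrated locally. The sketch below is not part of the job scheduler: ec_matches is a hypothetical helper that only mimics how ec==1,3,5 and ec!=1,3,5 are described above.

```shell
#!/bin/sh
# ec_matches OP VALUES STATUS
# Hypothetical helper: OP is "==" or "!=", VALUES is a comma-separated list,
# STATUS is an exit status. Prints "true" or "false".
ec_matches() {
    op=$1; values=$2; status=$3
    found=false
    old_ifs=$IFS
    IFS=,
    for v in $values; do
        if [ "$v" -eq "$status" ]; then found=true; fi
    done
    IFS=$old_ifs
    case "$op" in
        "==") echo "$found" ;;
        "!=") if [ "$found" = true ]; then echo false; else echo true; fi ;;
        *)    echo "unsupported operator: $op" >&2; return 2 ;;
    esac
}

ec_matches "==" 1,3,5 3   # prints true
ec_matches "!=" 1,3,5 3   # prints false
```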

Deletion Types Specifiable in Step Job Dependency Expressions

| Deletion Type | Description |
|---|---|
| one | Deletes only the specified job. |
| after | Deletes the specified job and jobs dependent on it recursively. |
| all | Deletes the specified job and all subsequent jobs. |

Executing Bulk Jobs

Bulk jobs are jobs that execute multiple identical batch jobs simultaneously. For example, if you want to change the job parameters and check each execution result, with a regular batch job, you would need to submit each job one by one. However, with a bulk job, you can submit multiple patterns at once.

The submission format for bulk jobs is as follows:

$ pjsub --bulk --sparam start-end jobscript

When executing a bulk job, you can change the program's input and output files for each sub-job using the bulk number assigned to each sub-job. The bulk number can be referenced with the environment variable PJM_BULKNUM.

Example

8 sub-jobs read from input files input.1 to input.8 and output to output files output.1 to output.8 respectively.

[username@genkai0001 ~]$ ls input.*
input.1 input.2 input.3 input.4 input.5 input.6 input.7 input.8
[username@genkai0001 ~]$ vi bulk.sh
#!/bin/sh
#------ pjsub option --------#
#PJM -L rscgrp=a-batch
#PJM -L node=1
#PJM -L elapse=1:00:00
#PJM -j
#------- Program execution -------#
./a.out < input.$PJM_BULKNUM > output.$PJM_BULKNUM
[username@genkai0001 ~]$ pjsub --bulk --sparam 1-8 bulk.sh
[INFO] PJM 0000 pjsub Job 12345 submitted.
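To preview which files each sub-job would touch, the substitution of PJM_BULKNUM can be simulated without submitting anything. This is a local sketch only; bulk_commands is a hypothetical helper that prints the command line from bulk.sh for each bulk number in the range.

```shell
#!/bin/sh
# bulk_commands START END
# Hypothetical helper: prints the command each sub-job would run,
# with the bulk number substituted for $PJM_BULKNUM.
bulk_commands() {
    n=$1
    while [ "$n" -le "$2" ]; do
        echo "./a.out < input.$n > output.$n"
        n=$(( n + 1 ))
    done
}

bulk_commands 1 8
```

Checking this list against the existing input files before submission helps catch a missing or misnamed input.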

For more details on using jobs in Genkai, please refer to the following:

Node Reservation System

Users can interactively reserve computing resources for Node Group A and Node Group B via a web interface for their desired time period. Within the user’s available time period, reservations can be made for any time slot starting from 30 minutes after the current time and extending up to 14 days in the future, in 30-minute increments. When submitting a reservation request, users can select the reservation time (in one-hour increments), the number of nodes to be used, and the number of GPUs. The available reservation times, number of nodes, and number of GPUs are as follows.

| Item | Node Group A | Node Group B |
|---|---|---|
| Maximum Booking Duration | 24 h | 24 h |
| Number of Nodes | 1–4 | 1 (fixed) |
| Number of GPUs | - | 1–4 |
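The limits above can be expressed as a simple pre-check before opening the portal. This sketch rests on two assumptions not stated explicitly on this page: that the minimum reservation is one hour, and that the GPU count applies only to Node Group B. check_reservation is a hypothetical helper, not a portal feature.

```shell
#!/bin/sh
# check_reservation GROUP HOURS NODES [GPUS]
# Hypothetical helper: prints "ok" if the request fits the limits above,
# otherwise prints the reason and returns 1.
check_reservation() {
    group=$1; hours=$2; nodes=$3; gpus=${4:-0}
    if [ "$hours" -lt 1 ] || [ "$hours" -gt 24 ]; then
        echo "duration must be 1-24 hours"; return 1
    fi
    case "$group" in
        A)
            if [ "$nodes" -lt 1 ] || [ "$nodes" -gt 4 ]; then
                echo "node group A allows 1-4 nodes"; return 1
            fi ;;
        B)
            if [ "$nodes" -ne 1 ]; then
                echo "node group B is fixed at 1 node"; return 1
            fi
            if [ "$gpus" -lt 1 ] || [ "$gpus" -gt 4 ]; then
                echo "node group B allows 1-4 GPUs"; return 1
            fi ;;
        *)
            echo "unknown node group: $group"; return 1 ;;
    esac
    echo "ok"
}

check_reservation A 2 4     # prints ok
check_reservation B 24 1 4  # prints ok
```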

How to Use the Node Reservation Portal

Logging In to the Portal

  1. Access Node Reservation Portal in a web browser.

  2. Enter your portal account and password to log in.
    Your portal account is the one you created on the Usage Application Portal, and its name begins with “ap”.

  3. Next, enter your one-time password.

Since the Node Reservation Portal and the Application Portal use the same login credentials, please enter the one-time password obtained using the same procedure as when logging into the Application Portal.
(If not much time has passed since logging into either portal, the entire login process will be skipped.)

Making a Reservation

  1. Select “New reservation” and enter the required information for the reservation.

| Item | Description |
|---|---|
| Supercomputer account | Select a supercomputer account |
| Start at | Select the start date and time |
| Operating hour | Select the reservation time |
| Node Group | Select a node group |
| Resource | Select the amount of resources to reserve |

  2. Click “Reserve.” A confirmation dialog will appear, and the reservation will be completed.

  3. You can view your reservation details under “History”.

  4. You can submit jobs once the reservation period begins. When running pjsub, specify the “Reservation ID” number with the -r option.
    (For details, please refer to the explanation of the basic options above.)

Creating and Running Reserved Jobs

Running Batch Jobs

Add -r [Reservation ID] to your standard batch job execution command.

Example:

$ pjsub -r [Reservation ID] job.sh

The following is an example of job.sh.
Do not include rscgrp to specify a resource group.

#!/bin/sh
# Do not specify rscgrp
#PJM -L vnode-core=60
#PJM -L elapse=1:00:00
#PJM -j
#PJM -S

export OMP_NUM_THREADS=30
./a.out

Running Interactive Jobs

As with batch jobs, do not include the rscgrp option to specify a resource group.

Example:

$ pjsub -r [Reservation ID] --interact -L node=1,elapse=01:00:00

Modifying a Reservation

  1. Select “Modify reservation” to modify the reservation.

| Item | Description |
|---|---|
| Reservation ID | Select the “Reservation ID” to modify |
| Resource | Select the resource quantity to modify |