
FUJITSU Compiler


FUJITSU Compiler is a compiler included in FUJITSU Software Technical Computing Suite (TCS). This page introduces how to compile Fortran / C / C++ programs and execute them interactively or as batch jobs.


Availability

System / Availability

  System        Staff and Students of Kyushu Univ.   Staff and Students of Academic Organization   Non-Academic
  Subsystem A   OK                                   OK                                            OK
  Subsystem B   OK                                   OK                                            OK

Commands

Commands for compilation and linkage are as follows:

command [option] file [...]

  Language                    Without MPI   With MPI
  Fortran                     frt           mpifrt
  C                           fcc           mpifcc
  C++                         FCC           mpiFCC
  XPFortran (data parallel)   xpfrt         -

Add -Kparallel to enable auto-parallelization, or -Kopenmp to enable OpenMP; both are disabled by default.


Compilation and Linkage

C/C++

Use the fcc / FCC commands for compiling and linking C/C++ programs. To compile parallel programs with MPI, use mpifcc / mpiFCC. The examples below use fcc and mpifcc; for C++ programs, use FCC and mpiFCC instead.

Example 1) Serial C program

$ fcc -Kfast sample.c

Example 2) C program with auto-parallelization

$ fcc -Kfast,parallel sample.c

Example 3) Thread-parallel C program with OpenMP

$ fcc -Kfast,openmp sample.c

Example 4) Process parallel C program with MPI

$ mpifcc -Kfast sample.c

Example 5) Hybrid parallel C program with OpenMP and MPI

$ mpifcc -Kfast,openmp sample.c

Fortran

Use the frt command for compiling and linking Fortran programs. To compile parallel programs with MPI, use the mpifrt command.

Example 1) Serial Fortran program

$ frt -Kfast sample.f90

Example 2) Fortran program with auto-parallelization

$ frt -Kfast,parallel sample.f90

Example 3) Thread parallel Fortran program with OpenMP

$ frt -Kfast,openmp sample.f90

Example 4) Process parallel Fortran program with MPI

$ mpifrt -Kfast sample.f90

Example 5) Hybrid parallel Fortran program with OpenMP and MPI

$ mpifrt -Kfast,openmp sample.f90

Frequently Used Options

Option Description
-c Creates object files only.
-o exe_file Specifies name of the executable file. "a.out" is used by default.
-O[0, 1, 2, 3] Specifies the level of optimization. Default level is set as follows:
Fortran:
If the -O option is specified without level, -O3 is set. If the -O option is not specified, -O2 is set.
C/C++:
If the -O option is specified without level, -O2 is set. If the -O option is not specified, -O0 is set.
-Kfast Applies the following optimization options. (Recommended)
Fortran:
-O3 -Keval,fp_relaxed,mfunc,ns,omitfp,sse{,fp_contract}
C/C++:
-O3 -Keval,fast_matmul,fp_relaxed,lib,mfunc,ns,omitfp,rdconv,sse{,fp_contract} -x-
-Kparallel Applies automatic parallelization. (Default: -Knoparallel)
-Kopenmp Enables OpenMP. (Default: -Knoopenmp)
-Kocl Recognizes Optimization Control Lines (OCL). (Default: -Knoocl)
-I directory Specifies directory to search for header files and module files.
-Fixed (Fortran only) Specifies that the source codes are written in fixed form.
-Free (Fortran only) Specifies that the source codes are written in free form.
-Fwide (Fortran only) Specifies that the source codes are written in fixed form with the maximum line length of 255.
-fw (Fortran only) Outputs messages of warnings and severe errors.
-fs (Fortran only) Outputs messages of severe errors only.
-w (C/C++ only) Outputs messages of severe errors only.
-Haefosux (Fortran only) Enables detailed check at compile time and runtime, such as correctness of the arguments of subprograms, shapes and subscripts of arrays, references of non-initialized variables, and so on.
-Koptmsg[=level] Outputs messages about status and guidance of optimizations.
-Koptmsg=1: Reports on the optimizations that can cause side-effects
-Koptmsg=2: In addition to the output of optmsg=1, outputs messages about optimizations such as auto-parallelization, SIMD, loop-unrolling.
By default, -Koptmsg=1 is chosen.
-Qt Reports detailed optimization information and statistics information.
-V Shows the version of the compiler.
-Xg (C only) Follows language specifications of GNU C compiler. C89 specification is chosen with this option. To enable both GNU C specification and C99 specification, add -noansi option.
-Nsta (C/C++ only) Reports statistics information.

"-Kfast,parallel" is recommended in most cases. The effect of optimization options depends on the program and data. For detailed information about these options, refer to the man pages: man fcc, man FCC, or man frt. Please note that, at this point, this compiler does not support AVX-512 instructions.


Environment Variables

Major environment variables used in the execution of Fortran / C / C++ programs are as follows:

  Variable            Description
  PARALLEL            Number of threads used in the execution of an auto-parallelized program.
                      Default: the number of available cores.
  OMP_NUM_THREADS     Number of threads used in the execution of an OpenMP program.
                      Default: the number of available cores.
  THREAD_STACK_SIZE   Stack size per thread, in KB.
                      Default: the value of "ulimit -s". If OMP_STACKSIZE is also specified, the larger value is used.

Mathematical Libraries

The mathematical libraries BLAS / LAPACK / ScaLAPACK and SSL II are available with the FUJITSU compilers.

BLAS / LAPACK / ScaLAPACK

  Library     Version   Description
  BLAS        -         Library for vector and matrix operations.
                        * All Level 3 routines and some important Level 2 routines are parallelized by threads.
  LAPACK      3.2.2     Library for linear algebra.
                        * Some important routines are parallelized by threads.
  ScaLAPACK   1.8       Library for linear algebra, parallelized with MPI.
                        * Additional routines of ScaLAPACK 1.8 are added.
BLAS / LAPACK / ScaLAPACK Options

  Library     Parallelization   Option       Comments
  BLAS        Serial            -SSL2
              Thread parallel   -SSL2BLAMP
  LAPACK      Serial            -SSL2
              Thread parallel   -SSL2BLAMP
  ScaLAPACK   MPI               -SCALAPACK   Add -SSL2 or -SSL2BLAMP, according to the BLAS/LAPACK library to use.

Example 1) Use serial BLAS/LAPACK

$ frt -Kfast -SSL2 sample.f90
$ fcc -Kfast -SSL2 sample.c

Example 2) Use thread-parallel BLAS/LAPACK

$ frt -Kfast,openmp -SSL2BLAMP sample.f90
$ fcc -Kfast,openmp -SSL2BLAMP sample.c

Example 3) Use ScaLAPACK (intra node, serial)

$ mpifrt -Kfast -SCALAPACK -SSL2 sample.f90
$ mpifcc -Kfast -SCALAPACK -SSL2 sample.c

Example 4) Use ScaLAPACK (intra node, thread-parallel)

$ mpifrt -Kfast,openmp -SCALAPACK -SSL2BLAMP sample.f90
$ mpifcc -Kfast,openmp -SCALAPACK -SSL2BLAMP sample.c

SSL II (Scientific Subroutine Library II)

  Library                  Description
  SSL II                   Thread-safe serial library for numeric operations:
                           linear algebra, eigenvalue/eigenvector problems, nonlinear equations, extremum problems,
                           interpolation/approximation, transforms, numerical differentiation/integration,
                           differential equations, special functions, pseudo-random numbers, etc.
  SSL II Multi-Thread      Supports some important functionalities that are expected to perform efficiently
                           on SMP machines: direct and iterative methods for systems of linear equations,
                           matrix inversion, eigenvalue problems, Fourier transforms, pseudo-random numbers, etc.
  C-SSL II                 Supports part of the functionality of the serial SSL II (for Fortran) in C programs. Thread-safe.
  C-SSL II Multi-Thread    Supports part of the functionality of the multithreaded SSL II (for Fortran) in C programs.
  SSL II/MPI               3-D Fourier transform routine parallelized with MPI.
  Fast Quadruple-Precision Basic Arithmetic Library
                           Represents quadruple-precision values in double-double format and performs fast arithmetic on them.
Options for SSL II

  Library             Parallelization   Option       Comments
  SSL II / C-SSL II   Serial            -SSL2        Can be linked with -SSL2 or -SSL2BLAMP. Choose one according
                      Multithread       -SSL2BLAMP   to the BLAS/LAPACK library to use.
  SSL II/MPI          MPI               -SSL2MPI     Also add -SSL2 or -SSL2BLAMP.

The serial and thread-parallel versions of routines in SSL II and C-SSL II can coexist in the same program.

Example 1) Use Serial SSL II

$ frt -Kfast -SSL2 sample.f90

Example 2) Use Thread-Parallel SSL II

$ frt -Kfast,openmp -SSL2BLAMP sample.f90

Example 3) Use Serial C-SSL II

$ fcc -Kfast -SSL2 sample.c

Example 4) Use SSL II/MPI

$ mpifrt -Kfast,openmp -SSL2MPI -SSL2 sample.f90

XPFortran

Use xpfrt command to compile XPFortran programs.

$ xpfrt -Kfast sample.f90

Batch Job

On subsystems A / B of ITO, programs should be executed as batch jobs.

Refer to the following page for details about batch jobs:

Batch Job

The following are examples of batch job scripts for executing programs compiled by the FUJITSU compilers.


Example 1) Serial Program

  • Number of Virtual Nodes (Number of Processes): 1 node (1 process)
  • Number of Cores per Virtual Node (Number of Threads): 1 core (1 thread)
  • Maximum Execution Time: 10 minutes
  • Store both standard output and standard error into the same file: Yes
  • Resource Group (Queue): ito-single

    #!/bin/bash
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-single"
    #PJM -L "vnode=1"
    #PJM -L "vnode-core=1"
    #PJM -L "elapse=10:00"
    #PJM -j
    #PJM -X

    ./a.out

Meaning of each line:

  • rscunit=ito-a : name of subsystem
  • rscgrp=ito-single : name of resource group
  • vnode=1 : number of virtual nodes
  • vnode-core=1 : number of cores per virtual node (specify the number of available cores of the resource group)
  • elapse=10:00 : maximum execution time
  • -j : store both standard output and standard error into the same file
  • -X : inherit environment variables at job submission to the job execution
  • ./a.out : execute the program
    

Example 2) Thread Parallel Program (Auto-Parallelized)

  • Number of Virtual Nodes (Number of Processes): 1 node (1 process)
  • Number of Cores per Virtual Node (Number of Threads): 36 cores (36 threads) (specify the number of available cores of the resource group)
  • Maximum Execution Time: 10 minutes
  • Store both standard output and standard error into the same file: Yes
  • Resource Group (Queue): ito-ss-dbg

    #!/bin/bash
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-ss-dbg"
    #PJM -L "vnode=1"
    #PJM -L "vnode-core=36"
    #PJM -L "elapse=10:00"
    #PJM -j
    #PJM -X

    export PARALLEL=36
    ./a.out

Meaning of each line:

  • rscunit=ito-a : name of subsystem
  • rscgrp=ito-ss-dbg : name of resource group
  • vnode=1 : number of virtual nodes
  • vnode-core=36 : number of cores per virtual node (specify the number of available cores of the resource group)
  • elapse=10:00 : maximum execution time
  • -j : store both standard output and standard error into the same file
  • -X : inherit environment variables at job submission to the job execution
  • export PARALLEL=36 : number of threads used for execution
  • ./a.out : execute the program
    

Example 3) Thread Parallel Program (OpenMP)

  • Number of Virtual Nodes (Number of Processes): 1 node (1 process)
  • Number of Cores per Virtual Node (Number of Threads): 36 cores (36 threads) (specify the number of available cores of the resource group)
  • Maximum Execution Time: 10 minutes
  • Store both standard output and standard error into the same file: Yes
  • Resource Group (Queue): ito-ss-dbg

    #!/bin/bash
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-ss-dbg"
    #PJM -L "vnode=1"
    #PJM -L "vnode-core=36"
    #PJM -L "elapse=10:00"
    #PJM -j
    #PJM -X

    export OMP_NUM_THREADS=36
    ./a.out

Meaning of each line:

  • rscunit=ito-a : name of subsystem
  • rscgrp=ito-ss-dbg : name of resource group
  • vnode=1 : number of virtual nodes
  • vnode-core=36 : number of cores per virtual node (specify the number of available cores of the resource group)
  • elapse=10:00 : maximum execution time
  • -j : store both standard output and standard error into the same file
  • -X : inherit environment variables at job submission to the job execution
  • export OMP_NUM_THREADS=36 : number of threads used for execution
  • ./a.out : execute the program
    

Example 4) Process Parallel Program (MPI)

  • Number of Virtual Nodes (Number of Processes): 144 nodes (144 processes) (larger than or equal to the number of available cores of the resource group)
  • Number of Cores per Virtual Node (Number of Threads): 1 core (1 thread)
  • Maximum Execution Time: 10 minutes
  • Store both standard output and standard error into the same file: Yes
  • Resource Group (Queue): ito-s-dbg

    #!/bin/bash
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-s-dbg"
    #PJM -L "vnode=144"
    #PJM -L "vnode-core=1"
    #PJM -L "elapse=10:00"
    #PJM -j
    #PJM -X

    mpiexec -n 144 ./a.out

Meaning of each line:

  • rscunit=ito-a : name of subsystem
  • rscgrp=ito-s-dbg : name of resource group
  • vnode=144 : number of virtual nodes (larger than or equal to the number of available cores of the resource group)
  • vnode-core=1 : number of cores per virtual node
  • elapse=10:00 : maximum execution time
  • -j : store both standard output and standard error into the same file
  • -X : inherit environment variables at job submission to the job execution
  • mpiexec -n 144 ./a.out : execute the program with the specified number of processes
    

Example 5) Hybrid Parallel Program (MPI + OpenMP)

  • Number of Virtual Nodes (Number of Processes): 24 nodes (24 processes)
  • Number of Cores per Virtual Node (Number of Threads): 6 cores (6 threads) ((number of virtual nodes) x (number of cores) should be larger than or equal to the number of available cores of the resource group)
  • Maximum Execution Time: 10 minutes
  • Store both standard output and standard error into the same file: Yes
  • Resource Group (Queue): ito-s-dbg

    #!/bin/bash
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-s-dbg"
    #PJM -L "vnode=24"
    #PJM -L "vnode-core=6"
    #PJM -L "elapse=10:00"
    #PJM -j
    #PJM -X

    export OMP_NUM_THREADS=6
    mpiexec -n 24 ./a.out

Meaning of each line:

  • rscunit=ito-a : name of subsystem
  • rscgrp=ito-s-dbg : name of resource group
  • vnode=24 : number of virtual nodes
  • vnode-core=6 : number of cores per virtual node (vnode x vnode-core should be larger than or equal to the number of available cores of the resource group)
  • elapse=10:00 : maximum execution time
  • -j : store both standard output and standard error into the same file
  • -X : inherit environment variables at job submission to the job execution
  • export OMP_NUM_THREADS=6 : number of threads per process used for execution
  • mpiexec -n 24 ./a.out : execute the program with the specified number of processes
    

Batch Job Options for MPI

The following options specify the behavior of batch jobs that use Fujitsu MPI. Each option name is given after --mpi. Refer to "Job Operation Software End-user's Guide" for details.

  Name                                Description
  --mpi proc=num                      Maximum number of processes to invoke statically.
                                      Default: the number of vnodes.
  --mpi rank-map-bynode[=rankmap]     Maps ranks in a round-robin manner, assigning one node after another to the ranks.
                                      (Cannot be specified together with rank-map-bychip.)
  --mpi rank-map-bychip[:rankmap]     Maps "proc / vnode-core" processes to one node, then moves to the next node.
                                      (Cannot be specified together with rank-map-bynode.)
  --mpi rank-map-hostfile=filename    Maps processes to nodes according to the file specified by filename.