
FUJITSU Compiler


FUJITSU Compiler is a compiler included in FUJITSU Software Technical Computing Suite (TCS). This page introduces how to compile Fortran / C / C++ programs and execute them interactively or as batch jobs.


Availability

System / Availability

  System        Staff and Students of Kyushu Univ.   Staff and Students of Academic Organization   Non-Academic
  Subsystem A   OK                                   OK                                            OK
  Subsystem B   OK                                   OK                                            OK

Commands

Commands for compilation and linkage are as follows:

command [option] file [...]

  Language                    Without MPI   With MPI
  Fortran                     frt           mpifrt
  C                           fcc           mpifcc
  C++                         FCC           mpiFCC
  XPFortran (data parallel)   xpfrt         -

Add -Kparallel to enable auto-parallelization, or -Kopenmp to enable OpenMP; both are disabled by default.


Compilation and Linkage

C/C++

Use the fcc / FCC commands for compiling and linking C/C++ programs. To compile parallel programs with MPI, use mpifcc / mpiFCC. The examples below use fcc and mpifcc; for C++ programs, use FCC and mpiFCC instead.

Example 1) Serial C program

$ fcc -Kfast sample.c

Example 2) C program with auto-parallelization

$ fcc -Kfast,parallel sample.c

Example 3) Thread-parallel C program with OpenMP

$ fcc -Kfast,openmp sample.c

Example 4) Process parallel C program with MPI

$ mpifcc -Kfast sample.c

Example 5) Hybrid parallel C program with OpenMP and MPI

$ mpifcc -Kfast,openmp sample.c

Fortran

Use the frt command for compiling and linking Fortran programs. To compile parallel programs with MPI, use the mpifrt command.

Example 1) Serial Fortran program

$ frt -Kfast sample.f90

Example 2) Fortran program with auto-parallelization

$ frt -Kfast,parallel sample.f90

Example 3) Thread parallel Fortran program with OpenMP

$ frt -Kfast,openmp sample.f90

Example 4) Process parallel Fortran program with MPI

$ mpifrt -Kfast sample.f90

Example 5) Hybrid parallel Fortran program with OpenMP and MPI

$ mpifrt -Kfast,openmp sample.f90

Frequently Used Options

Option Description
-c Creates object files only.
-o exe_file Specifies name of the executable file. "a.out" is used by default.
-O[0, 1, 2, 3] Specifies the level of optimization. Default level is set as follows:
Fortran:
If the -O option is specified without level, -O3 is set. If the -O option is not specified, -O2 is set.
C/C++:
If the -O option is specified without level, -O2 is set. If the -O option is not specified, -O0 is set.
-Kfast Applies the following optimization options. (Recommended)
Fortran:
-O3 -Keval,fp_relaxed,mfunc,ns,omitfp,sse{,fp_contract}
C/C++:
-O3 -Keval,fast_matmul,fp_relaxed,lib,mfunc,ns,omitfp,rdconv,sse{,fp_contract} -x-
-Kparallel Applies automatic parallelization. (Default: -Knoparallel)
-Kopenmp Enables OpenMP. (Default: -Knoopenmp)
-Kocl Recognizes Optimization Control Lines (OCL). (Default: -Knoocl)
-I directory Specifies directory to search for header files and module files.
-Fixed (Fortran only) Specifies that the source codes are written in fixed form.
-Free (Fortran only) Specifies that the source codes are written in free form.
-Fwide (Fortran only) Specifies that the source codes are written in fixed form with the maximum line length of 255.
-fw (Fortran only) Outputs messages of warnings and severe errors.
-fs (Fortran only) Outputs messages of severe errors only.
-w (C/C++ only) Outputs messages of severe errors only.
-Haefosux (Fortran only) Enables detailed check at compile time and runtime, such as correctness of the arguments of subprograms, shapes and subscripts of arrays, references of non-initialized variables, and so on.
-Koptmsg[=level] Outputs messages about status and guidance of optimizations.
-Koptmsg=1: Reports on the optimizations that can cause side-effects
-Koptmsg=2: In addition to the output of optmsg=1, outputs messages about optimizations such as auto-parallelization, SIMD, loop-unrolling.
By default, -Koptmsg=1 is chosen.
-Qt Reports detailed optimization information and statistics information.
-V Shows the version of the compiler.
-Xg (C only) Follows language specifications of GNU C compiler. C89 specification is chosen with this option. To enable both GNU C specification and C99 specification, add -noansi option.
-Nsta (C/C++ only) Reports statistics information.

"-Kfast,parallel" is recommended in most cases. The effect of optimization options depends on the program and data. For detailed information about these options, refer to the man pages: man fcc, man FCC, or man frt. Please note that, at this point, this compiler does not support AVX-512 instructions.


Environment Variables

Major environment variables used in the execution of Fortran / C / C++ programs are as follows:

  Variable            Description
  PARALLEL            Number of threads used in the execution of an auto-parallelized program.
                      Default: the number of available cores.
  OMP_NUM_THREADS     Number of threads used in the execution of an OpenMP program.
                      Default: the number of available cores.
  THREAD_STACK_SIZE   Stack size per thread, in KB.
                      Default: the value of "ulimit -s". If OMP_STACKSIZE is also specified, the larger value is used.

Mathematical Libraries

The mathematical libraries BLAS / LAPACK / ScaLAPACK and SSL II are available with the FUJITSU compilers.

BLAS / LAPACK / ScaLAPACK

  Library     Version   Description
  BLAS        -         Library for vector and matrix operations.
                        * All Level 3 routines and some important Level 2 routines are parallelized by threads.
  LAPACK      3.2.2     Library for linear algebra.
                        * Some important routines are parallelized by threads.
  ScaLAPACK   1.8       Library for linear algebra, parallelized with MPI.
                        * Additional routines of ScaLAPACK 1.8 are added.
BLAS / LAPACK / ScaLAPACK Options

  Library     Parallelization   Option       Comments
  BLAS        Serial            -SSL2
              Thread parallel   -SSL2BLAMP
  LAPACK      Serial            -SSL2
              Thread parallel   -SSL2BLAMP
  ScaLAPACK   MPI               -SCALAPACK   Add -SSL2 or -SSL2BLAMP, according to the BLAS/LAPACK library to use.

Example 1) Use serial BLAS/LAPACK

$ frt -Kfast -SSL2 sample.f90
$ fcc -Kfast -SSL2 sample.c

Example 2) Use thread-parallel BLAS/LAPACK

$ frt -Kfast,openmp -SSL2BLAMP sample.f90
$ fcc -Kfast,openmp -SSL2BLAMP sample.c

Example 3) Use ScaLAPACK (intra node, serial)

$ mpifrt -Kfast -SCALAPACK -SSL2 sample.f90
$ mpifcc -Kfast -SCALAPACK -SSL2 sample.c

Example 4) Use ScaLAPACK (intra node, thread-parallel)

$ mpifrt -Kfast,openmp -SCALAPACK -SSL2BLAMP sample.f90
$ mpifcc -Kfast,openmp -SCALAPACK -SSL2BLAMP sample.c

SSL II (Scientific Subroutine Library II)

  Library                  Description
  SSL II                   Thread-safe serial library for numeric operations:
                           linear algebra, eigenvalue/eigenvector problems, nonlinear equations, extremum problems,
                           interpolation/approximation, transforms, numerical differentiation/integration,
                           differential equations, special functions, pseudo-random numbers, etc.
  SSL II Multi-Thread      Supports some important functionalities that are expected to perform efficiently
                           on SMP machines: direct and iterative methods for systems of linear equations,
                           matrix inversion, eigenvalue problems, Fourier transforms, pseudo-random numbers, etc.
  C-SSL II                 Supports part of the functionality of the serial SSL II (for Fortran) in C programs. Thread-safe.
  C-SSL II Multi-Thread    Supports part of the functionality of the multithreaded SSL II (for Fortran) in C programs.
  SSL II/MPI               3-D Fourier transform routine parallelized with MPI.
  Fast Quadruple-Precision Basic Arithmetic Library
                           Represents quadruple-precision values in double-double format and performs fast arithmetic on them.
Options for SSL II

  Library             Parallelization   Option       Comments
  SSL II / C-SSL II   Serial            -SSL2        Can be linked with -SSL2 or -SSL2BLAMP. Choose one according
                      Multithread       -SSL2BLAMP   to the BLAS/LAPACK library to use.
  SSL II/MPI          MPI               -SSL2MPI     Also add -SSL2 or -SSL2BLAMP.

The serial and thread-parallel versions of routines in SSL II and C-SSL II can coexist in the same program.

Example 1) Use Serial SSL II

$ frt -Kfast -SSL2 sample.f90

Example 2) Use Thread-Parallel SSL II

$ frt -Kfast,openmp -SSL2BLAMP sample.f90

Example 3) Use Serial C-SSL II

$ fcc -Kfast -SSL2 sample.c

Example 4) Use SSL II/MPI

$ mpifrt -Kfast,openmp -SSL2MPI -SSL2 sample.f90

XPFortran

Use xpfrt command to compile XPFortran programs.

$ xpfrt -Kfast sample.f90

Batch Job

On subsystems A / B of ITO, programs should be executed as batch jobs.

Refer to the following page for details about batch jobs:

Batch Job

The following are examples of batch job scripts for executing programs compiled by the FUJITSU compilers.


Example 1) Serial Program

  • Number of Virtual Nodes (Number of Processes): 1 node (1 process)
  • Number of Cores per Virtual Node (Number of Threads): 1 core (1 thread)
  • Maximum Execution Time: 10 minutes
  • Store both standard output and standard error into the same file: Yes
  • Resource Group (Queue): ito-single

    #!/bin/bash
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-single"
    #PJM -L "vnode=1"
    #PJM -L "vnode-core=1"
    #PJM -L "elapse=10:00"
    #PJM -j
    #PJM -X

    ./a.out

Meaning of each line:

  • rscunit=ito-a : name of subsystem
  • rscgrp=ito-single : name of resource group
  • vnode=1 : number of virtual nodes
  • vnode-core=1 : number of cores per virtual node (specify the number of available cores of the resource group)
  • elapse=10:00 : maximum execution time
  • -j : store both standard output and standard error into the same file
  • -X : inherit environment variables at job submission to the job execution
  • ./a.out : execute the program
    

Example 2) Thread Parallel Program (Auto-Parallelized)

  • Number of Virtual Nodes (Number of Processes): 1 node (1 process)
  • Number of Cores per Virtual Node (Number of Threads): 36 cores (36 threads) (specify the number of available cores of the resource group)
  • Maximum Execution Time: 10 minutes
  • Store both standard output and standard error into the same file: Yes
  • Resource Group (Queue): ito-ss-dbg

    #!/bin/bash
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-ss-dbg"
    #PJM -L "vnode=1"
    #PJM -L "vnode-core=36"
    #PJM -L "elapse=10:00"
    #PJM -j
    #PJM -X

    export PARALLEL=36
    ./a.out

Meaning of each line:

  • rscunit=ito-a : name of subsystem
  • rscgrp=ito-ss-dbg : name of resource group
  • vnode=1 : number of virtual nodes
  • vnode-core=36 : number of cores per virtual node (specify the number of available cores of the resource group)
  • elapse=10:00 : maximum execution time
  • -j : store both standard output and standard error into the same file
  • -X : inherit environment variables at job submission to the job execution
  • export PARALLEL=36 : number of threads used for execution
  • ./a.out : execute the program
    

Example 3) Thread Parallel Program (OpenMP)

  • Number of Virtual Nodes (Number of Processes): 1 node (1 process)
  • Number of Cores per Virtual Node (Number of Threads): 36 cores (36 threads) (specify the number of available cores of the resource group)
  • Maximum Execution Time: 10 minutes
  • Store both standard output and standard error into the same file: Yes
  • Resource Group (Queue): ito-ss-dbg

    #!/bin/bash
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-ss-dbg"
    #PJM -L "vnode=1"
    #PJM -L "vnode-core=36"
    #PJM -L "elapse=10:00"
    #PJM -j
    #PJM -X

    export OMP_NUM_THREADS=36
    ./a.out

Meaning of each line:

  • rscunit=ito-a : name of subsystem
  • rscgrp=ito-ss-dbg : name of resource group
  • vnode=1 : number of virtual nodes
  • vnode-core=36 : number of cores per virtual node (specify the number of available cores of the resource group)
  • elapse=10:00 : maximum execution time
  • -j : store both standard output and standard error into the same file
  • -X : inherit environment variables at job submission to the job execution
  • export OMP_NUM_THREADS=36 : number of threads used for execution
  • ./a.out : execute the program
    

Example 4) Process Parallel Program (MPI)

  • Number of Virtual Nodes (Number of Processes): 144 nodes (144 processes) (larger than or equal to the number of available cores of the resource group)
  • Number of Cores per Virtual Node (Number of Threads): 1 core (1 thread)
  • Maximum Execution Time: 10 minutes
  • Store both standard output and standard error into the same file: Yes
  • Resource Group (Queue): ito-s-dbg

    #!/bin/bash
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-s-dbg"
    #PJM -L "vnode=144"
    #PJM -L "vnode-core=1"
    #PJM -L "elapse=10:00"
    #PJM -j
    #PJM -X

    mpiexec -n 144 ./a.out

Meaning of each line:

  • rscunit=ito-a : name of subsystem
  • rscgrp=ito-s-dbg : name of resource group
  • vnode=144 : number of virtual nodes (larger than or equal to the number of available cores of the resource group)
  • vnode-core=1 : number of cores per virtual node
  • elapse=10:00 : maximum execution time
  • -j : store both standard output and standard error into the same file
  • -X : inherit environment variables at job submission to the job execution
  • mpiexec -n 144 ./a.out : execute the program with the specified number of processes
    

Example 5) Hybrid Parallel Program (MPI + OpenMP)

  • Number of Virtual Nodes (Number of Processes): 24 nodes (24 processes)
  • Number of Cores per Virtual Node (Number of Threads): 6 cores (6 threads) ((number of virtual nodes) x (number of cores) should be larger than or equal to the number of available cores of the resource group)
  • Maximum Execution Time: 10 minutes
  • Store both standard output and standard error into the same file: Yes
  • Resource Group (Queue): ito-s-dbg

    #!/bin/bash
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-s-dbg"
    #PJM -L "vnode=24"
    #PJM -L "vnode-core=6"
    #PJM -L "elapse=10:00"
    #PJM -j
    #PJM -X

    export OMP_NUM_THREADS=6
    mpiexec -n 24 ./a.out

Meaning of each line:

  • rscunit=ito-a : name of subsystem
  • rscgrp=ito-s-dbg : name of resource group
  • vnode=24 : number of virtual nodes
  • vnode-core=6 : number of cores per virtual node (vnode x vnode-core should be larger than or equal to the number of available cores of the resource group)
  • elapse=10:00 : maximum execution time
  • -j : store both standard output and standard error into the same file
  • -X : inherit environment variables at job submission to the job execution
  • export OMP_NUM_THREADS=6 : number of threads per process used for execution
  • mpiexec -n 24 ./a.out : execute the program with the specified number of processes
    

Batch Job Options for MPI

The following options specify the behavior of batch jobs that use Fujitsu MPI. Each option name is given after --mpi. Refer to "Job Operation Software End-user's Guide" for details.

  Name                                Description
  --mpi proc=num                      Maximum number of processes to invoke statically.
                                      Default: the number of vnodes.
  --mpi rank-map-bynode[=rankmap]     Maps ranks in a round-robin manner, assigning one node after another to the ranks.
                                      (Cannot be specified together with rank-map-bychip.)
  --mpi rank-map-bychip[:rankmap]     Maps "proc / vnode-core" processes to one node, then moves to the next node.
                                      (Cannot be specified together with rank-map-bynode.)
  --mpi rank-map-hostfile=filename    Maps processes to nodes according to the file specified by filename.