Thursday, 30 October 2014

Installing GCC 4.9.1 from source on Fuji

To be installed: GMP 6.0.0, MPFR 3.1.2, MPC 1.0.2, GCC 4.9.1

The source code can be obtained from one of the GCC mirrors.
Example:
ftp://gcc.gnu.org/pub/gcc/infrastructure/

GMP is needed by MPFR, MPFR and GMP are needed by MPC, and all three are needed by GCC, so build them in that order.

GMP

$ tar -xjvf gmp-6.0.0a.tar.bz2

$ rsync -avr gmp-6.0.0/ /apps/GNU/GMP/6.0.0/

$ cd /apps/GNU/GMP/6.0.0

$ ./configure --disable-shared --enable-static --prefix=/apps/GNU/GMP/6.0.0

$ make && make check && make install

MPFR

$ tar -xzvf mpfr-3.1.2.tar.gz

$ rsync -avr mpfr-3.1.2/ /apps/GNU/MPFR/3.1.2-new/

$ cd /apps/GNU/MPFR/3.1.2-new

$ ./configure --disable-shared --enable-static --prefix=/apps/GNU/MPFR/3.1.2-new --with-gmp=/apps/GNU/GMP/6.0.0 

$ make && make check && make install

MPC

$ tar -xzvf mpc-1.0.2.tar.gz

$ rsync -avr mpc-1.0.2/ /apps/GNU/MPC/1.0.2/

$ cd /apps/GNU/MPC/1.0.2

$ ./configure --disable-shared --enable-static --prefix=/apps/GNU/MPC/1.0.2 --with-gmp=/apps/GNU/GMP/6.0.0 --with-mpfr=/apps/GNU/MPFR/3.1.2-new

$ make && make check && make install
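The three library builds above follow one pattern: static only, installed into a versioned prefix, with each dependency passed as --with-<name>=<prefix>. A minimal sketch of that pattern as a helper is shown here. The function name static_configure_flags is hypothetical (it is not part of GMP, MPFR or MPC), and the sketch assumes the paths contain no spaces.

```shell
# Print the configure flags for a static-only build installed into $1.
# Optional further arguments are name=prefix pairs for dependencies,
# e.g. gmp=/apps/GNU/GMP/6.0.0 becomes --with-gmp=/apps/GNU/GMP/6.0.0.
static_configure_flags() {
    prefix=$1
    shift
    flags="--disable-shared --enable-static --prefix=$prefix"
    for dep in "$@"; do
        # ${dep%%=*} is the part before the first '=', ${dep#*=} the part after it.
        flags="$flags --with-${dep%%=*}=${dep#*=}"
    done
    printf '%s\n' "$flags"
}

# Example (run in the directory that contains MPC's configure script):
#   ./configure $(static_configure_flags /apps/GNU/MPC/1.0.2 \
#       gmp=/apps/GNU/GMP/6.0.0 mpfr=/apps/GNU/MPFR/3.1.2-new)
#   make && make check && make install
```

The command substitution is left unquoted on purpose so that the shell splits the flags into separate arguments.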

GCC

$ tar -xzvf gcc-4.9.1.tar.gz

$ cd /apps/GNU/GCC/4.9.1

$ <path-to-gcc-source>/gcc-4.9.1/configure --prefix=/apps/GNU/GCC/4.9.1 --with-gmp=/apps/GNU/GMP/6.0.0 --with-mpfr=/apps/GNU/MPFR/3.1.2-new --with-mpc=/apps/GNU/MPC/1.0.2 --disable-multilib

$ make #This will take a long time

$ make install

After a successful build, the installer prints one important message:

Libraries have been installed in:
   /apps/GNU/GCC/4.9.1/lib/../lib64

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the `LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the `LD_RUN_PATH' environment variable
     during linking
   - use the `-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
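One of the options listed in that message is to add LIBDIR to LD_LIBRARY_PATH. A minimal sketch of doing that without creating duplicate entries follows. The helper name prepend_unique and the variable GCC_LIBDIR are hypothetical; the directory is the one reported in the message above.

```shell
# Prepend directory $1 to the colon-separated list $2 unless it is
# already present, and print the resulting list.
prepend_unique() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;                # already listed
        *)          printf '%s\n' "$dir${list:+:$list}" ;;  # add at the front
    esac
}

# Assumption: GCC was installed under /apps/GNU/GCC/4.9.1 as described above.
GCC_LIBDIR=/apps/GNU/GCC/4.9.1/lib64
LD_LIBRARY_PATH=$(prepend_unique "$GCC_LIBDIR" "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
```

Putting these lines in ~/.bashrc is safe to repeat, because sourcing the file a second time leaves the variable unchanged.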

Wednesday, 29 October 2014

OpenFOAM 2.3.0 in Fuji

System: Fuji (Upgraded - CentOS 6.5)
Note: I used OpenMPI for this test installation. The default MPI on Fuji is Intel MPI.

Download:
OpenFOAM-2.3.0.tgz
ThirdParty-2.3.0.tgz

Setting up GCC 4.9.1

GCC 4.9.1 is available from /apps/GNU/GCC/4.9.1

$ export PATH=/apps/GNU/GCC/4.9.1/bin:$PATH

$ export LD_LIBRARY_PATH=/apps/GNU/GCC/4.9.1/lib64:/apps/GNU/GCC/4.9.1/lib:/apps/GNU/MPC/1.0.2/lib:/apps/GNU/GMP/6.0.0/lib:/apps/GNU/MPFR/3.1.2/lib:$LD_LIBRARY_PATH

Setting up OpenFOAM Installation

$ mkdir ~/scratch/OpenFOAM

Download the OpenFOAM and ThirdParty tarballs into this directory and extract them there:

$ tar -xzvf OpenFOAM-2.3.0.tgz
$ tar -xzvf ThirdParty-2.3.0.tgz

$ export FOAM_INST_DIR=~/scratch/OpenFOAM
$ foamDotFile=$FOAM_INST_DIR/OpenFOAM-2.3.0/etc/bashrc
$ [ -f $foamDotFile ] && . $foamDotFile


$ cd $FOAM_INST_DIR
$ mkdir obj
$ cd obj
$ ../OpenFOAM-2.3.0/Allwmake

Testing

Notice that all the executables, e.g. icoFoam, are installed in $FOAM_INST_DIR/bin.


On the remote machine, add the following to .bashrc:

export FOAM_INST_DIR=$HOME/scratch/OpenFOAM
source $FOAM_INST_DIR/OpenFOAM-2.3.0/etc/bashrc



To run a parallel OpenFOAM test on 2 nodes:

$ mkdir -p $FOAM_RUN

$ cp -r $FOAM_TUTORIALS $FOAM_RUN

$ cd $FOAM_RUN/tutorials/incompressible/icoFoam

$ cp -r cavity cavityParallel

Copy $WM_PROJECT_DIR/applications/utilities/parallelProcessing/decomposePar/decomposeParDict to cavityParallel/system

Edit the decomposeParDict:
numberOfSubdomains  2;
method simple;
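With method simple, decomposePar also reads a simpleCoeffs block whose n entries must multiply to numberOfSubdomains. A minimal sketch that writes such a dictionary from the shell is shown here. The function name write_decompose_par_dict is hypothetical, and the split along x only is an assumption chosen for the cavity case, not something taken from the original setup.

```shell
# Write a minimal decomposeParDict to file $1 for $2 subdomains,
# using the "simple" method and splitting the domain along x only.
write_decompose_par_dict() {
    dict=$1
    nsub=$2
    cat > "$dict" <<EOF
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains $nsub;

method          simple;

simpleCoeffs
{
    n               ($nsub 1 1);   // nx * ny * nz must equal numberOfSubdomains
    delta           0.001;
}
EOF
}

# Example, run from the icoFoam tutorial directory:
#   write_decompose_par_dict cavityParallel/system/decomposeParDict 2
```

This overwrites the target file, so it replaces the copied template rather than editing it.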

$ cd cavityParallel

$ blockMesh

$ cd ..

$ decomposePar -case cavityParallel

$ cd cavityParallel

$ echo -e 'fuji381\nfuji382' > hosts

$ mpirun --hostfile hosts -np 2 icoFoam -parallel

Tuesday, 28 October 2014

Installing OFED on Linux (CentOS 6.5)

As the title suggests, I will show how to install the OFED stack on CentOS 6.5.

Prerequisites

kernel-devel
rpm-build
libtool
gcc-c++
bison
flex
glib2-devel
glib2
tcl-devel
zlib-devel

Tips:
  • To prevent build errors, make sure your gcc version matches the one used to build your kernel.
  • It is recommended to use the latest kernel from the repo.

Download OFED software from https://www.openfabrics.org/index.php

Extract the archive and run install.pl (use --help to see the options).

Reboot after the installation.

Some points:

Typically, the locked-memory limit has to be set to unlimited to run HPC MPI jobs across nodes.

Add the following to /etc/security/limits.conf:

* soft memlock unlimited
* hard memlock unlimited

Log out and log back in; you should then see:
$ ulimit -l
unlimited
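The same check can be scripted, for example to run it on every node before starting a job. A minimal sketch follows; the function name memlock_status is hypothetical, and the KiB unit assumes bash's ulimit -l output.

```shell
# Classify a locked-memory limit as reported by `ulimit -l`.
memlock_status() {
    case $1 in
        unlimited) echo "ok" ;;
        *)         echo "too low: $1 KiB" ;;
    esac
}

# Report the limit of the current shell.
memlock_status "$(ulimit -l)"

# Example for a remote node (node1 is a placeholder hostname):
#   memlock_status "$(ssh node1 'ulimit -l')"
```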

Monday, 27 October 2014

Intel MPI How To Use and Debug

Running /bin/hostname

MPIRUN directory:
/opt/intel/impi/<version>/intel64/bin

Source mpivars.sh

Create a machinefile:
$ cat mach.txt
node1
node2

Test run:
$ mpirun -r ssh -f mach.txt -ppn 1 -np 2 /bin/hostname

mpirun is a utility that runs mpdboot and then mpiexec, so options for mpdboot come first, followed by options for mpiexec. '-machinefile' is an option for mpiexec.
With mpirun, there is no need to run mpdboot separately (that is needed only for mpiexec) or to create mpd.hosts.

If instead you would like to use mpiexec, you would have to do the following.

Create mpd.hosts in your working directory.

Example:
$ cat mpd.hosts
node1
node2

Start the MPD ring:
$ mpdboot

Try another one: cpi.c
$ mpiicc cpi.c
$ mpirun -f mach.txt -ppn 4 -np 8 ./a.out

Debugging

Note: Sometimes iptables blocks MPI traffic across nodes. You might want to flush or edit the iptables rules.

#Pass DEBUG environment variables
export I_MPI_DEBUG=5

#Check mpd is up
$ mpdtrace

#To specifically use IB HCA port 2 instead of default port 1
export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-2

Note: DAPL versions of the nodes must match. Older versions of Intel MPI do not support DAPL v2.0. When installing the OS, make sure the necessary Infiniband drivers (e.g. DAPL 1.2 if using old Intel MPI) are installed.

Monday, 13 October 2014

Configuring Internet Connection For Compute Nodes Of A Cluster

Idea: In a cluster, usually only the head node has an outgoing internet connection. This post details how to give the compute nodes an outgoing internet connection by using the head node as the router.

Assume enp129s0f0 is on the internal network and enp129s0f1 is on the external network.

1. Tell kernel to allow ip forwarding:

On the head node 
$ echo 1 > /proc/sys/net/ipv4/ip_forward

2. Configure iptables to forward packets from the internal network.

On the head node
$ sudo iptables -t nat -A POSTROUTING -o enp129s0f1 -j MASQUERADE
$ sudo iptables -A FORWARD -i enp129s0f0 -o enp129s0f1 -j ACCEPT
$ sudo iptables -A FORWARD -i enp129s0f1 -o enp129s0f0 -m state --state RELATED,ESTABLISHED -j ACCEPT
$ sudo iptables -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
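The value written to /proc/sys/net/ipv4/ip_forward in step 1 is lost at reboot. A minimal sketch of making it persistent by setting the key in a sysctl configuration file is shown here. The function name persist_sysctl is hypothetical, and the use of /etc/sysctl.conf in the example is an assumption about where this system keeps its settings.

```shell
# Set "key = value" in the sysctl-style file $1, replacing an existing
# line for that key or appending a new one. Dots in the key are treated
# as regex wildcards, which is acceptable for a sketch like this.
persist_sysctl() {
    conf=$1
    key=$2
    value=$3
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$conf" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}

# Example, as root on the head node:
#   persist_sysctl /etc/sysctl.conf net.ipv4.ip_forward 1
#   sysctl -p    # reload the settings from /etc/sysctl.conf
```

The iptables rules above are also held only in memory, so they need to be saved separately if they should survive a reboot.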


Sunday, 12 October 2014

Debugging SSH without-password

Sometimes, we want to set up passwordless login across nodes. In this post, I will use the root user as an example.

Common Practice

Easiest way:
$ ssh-copy-id remotehostname

Sometimes, we may encounter:
/usr/bin/ssh-copy-id: ERROR: No identities found

In that case, do:
$ ssh-copy-id -i ~/.ssh/id_rsa.pub remotehostname

If known_hosts has an offending key:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
   .
   .
   .
Offending key in /root/.ssh/known_hosts: 6
   .
   .
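The warning names the file and line of the stale entry (line 6 of /root/.ssh/known_hosts in the output above). If the host key change is expected, for example after reinstalling the node, that line can be removed. A minimal sketch follows; the function name remove_known_host_line is hypothetical, and it keeps a backup of the file next to the original.

```shell
# Delete line number $2 from the known_hosts file $1,
# keeping the previous contents in "$1.bak".
remove_known_host_line() {
    file=$1
    line=$2
    cp -p "$file" "$file.bak" || return 1
    sed "${line}d" "$file.bak" > "$file"
}

# Example for the warning above:
#   remove_known_host_line /root/.ssh/known_hosts 6
#
# Alternative that removes all keys of a host by name:
#   ssh-keygen -R remotehostname
```

Only do this when you know why the key changed; an unexpected change can indicate a real man-in-the-middle attack.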

Thursday, 9 October 2014

Configuring a Compute Node (of a Supercomputer)

This post details how a compute node is set up after an upgrade (e.g. OS update).
This instruction is based on the real setup of one HPC system that I administered.

Migrate User Accounts

Write a script to transfer user accounts. This step is optional; its purpose is to create the user home directories cleanly.
Requires: /etc/shadow and /etc/passwd from the login node.

Next, copy over /etc/passwd, /etc/shadow, /etc/group, and /etc/hosts from the master node. Remember to make copies of the existing files on the compute node first.

Make the user home directories available under /home01:
$ cd /
$ ln -s home home01

Set up internet

$ echo 'GATEWAY=10.10.8.243' >> /etc/sysconfig/network

$ echo -e 'nameserver 10.10.8.236\nnameserver 202.83.248.3\nnameserver 123.136.66.68' >> /etc/resolv.conf

$ service network restart

Make a copy of the existing iptables rules, then flush them. Comment out the rules in /etc/sysconfig/iptables.

Wednesday, 8 October 2014

Testing Intel OFED Installation.

Intel TrueScale QDR Qlogic with Intel OFED.


First, you have to install OFED from Intel.

Testing OpenMPI (with Intel compilers).

cd /usr/mpi/intel/openmpi-<version>-qlc/tests/osu_benchmarks

source /usr/mpi/intel/openmpi-<version>-qlc/bin/mpivars.sh

mpirun -host node1,node2 -np 2 ./osu_latency
#0-byte message should be less than 2 us.

mpirun -host node1,node2 -np 2 ./osu_bw
#Larger messages should hit above 3000MB/s.

Note: When running MPI apps, do not use InfiniBand Verbs (IBV), OpenIB, DAPL, etc. Use PSM instead.
If the latency for a 0-byte message is around 5 us, then the Verbs interface is being used instead of PSM.
If this happens, try exiting the ssh session and logging in again.
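The latency rule of thumb above can be checked automatically. A minimal sketch is shown here that reads osu_latency output on stdin and compares the 0-byte row against the 2 us threshold mentioned earlier. The function name check_psm_latency is hypothetical, and the sketch assumes the usual two-column output (message size, latency in us) with '#' comment lines.

```shell
# Read osu_latency output on stdin and classify the 0-byte latency:
# "ok <latency>" below 2 us, "slow <latency>" otherwise.
check_psm_latency() {
    awk '
        !/^#/ && $1 == "0" {
            if ($2 + 0 < 2.0) print "ok " $2
            else              print "slow " $2
            found = 1
            exit
        }
        END { if (!found) print "no 0-byte row" }
    '
}

# Example:
#   mpirun -host node1,node2 -np 2 ./osu_latency | check_psm_latency
```

A "slow" result suggests that Verbs is in use rather than PSM, as described in the note above.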