Wednesday 29 April 2015

Build FEniCS 1.5.0 on ARCHER

New build of DOLFIN-1.5.0 on Cray XC30

First steps

A few people have asked me how to do this, so here is a complete run-down of building DOLFIN on a Cray XC30. It’s not really that difficult.
The first point is to make sure you use gcc. It is probably possible to build with Intel, but the Cray C++ compiler is a non-starter here. So swap the default Cray programming environment for the GNU one:

module swap PrgEnv-cray PrgEnv-gnu
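
To confirm that the swap took effect, one option is to look at the banner printed by CC, the Cray wrapper for the C++ compiler, which calls whichever compiler the loaded PrgEnv module selects. This is only a minimal sketch: the function name is_gcc_banner is hypothetical, and the patterns it matches are an assumption based on typical GCC version banners.

```shell
# Succeed if a compiler version banner looks like it came from GCC.
# The patterns are an assumption based on typical "g++ (GCC) x.y.z" banners.
is_gcc_banner() {
    case "$1" in
        *"(GCC)"*|gcc*|g++*) return 0 ;;
        *) return 1 ;;
    esac
}

# CC is the Cray wrapper for the C++ compiler; skip the check if it is absent.
if command -v CC >/dev/null 2>&1; then
    banner=$(CC --version 2>/dev/null | head -n 1)
    if is_gcc_banner "$banner"; then
        echo "C++ wrapper is using GCC: $banner"
    else
        echo "C++ wrapper is NOT using GCC: $banner" >&2
    fi
fi
```

Running module list and looking for PrgEnv-gnu is a quicker check by eye.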

Making modules for swig, cmake etc.

For a completely clean build, I’m going to download and install a few dependencies myself. On HPC machines these are often missing, or else out of date and installed in the wrong place.
I like to keep my source files in one folder, say src, install into another, say packages, and define modules in a third, modules. Below, I use ... to stand for some path or other; yours will be different.

For example, let’s install pcre (needed by swig):

cd src
wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.37.tar.bz2
tar xf pcre-8.37.tar.bz2
cd pcre-8.37
./configure --prefix=/work/.../packages/pcre-8.37
make
make install

I’ll also make a module file in /work/.../modules/pcre/8.37 like this:

#%Module -*- tcl -*-
##
## modulefile
##
proc ModulesHelp { } {

  puts stderr "\tAdds pcre 8.37 to your environment.\n"
}

module-whatis "adds pcre 8.37 to your environment"

set               root                 /work/.../packages/pcre-8.37
prepend-path      PATH                 $root/bin
prepend-path      CPATH                $root/include
prepend-path      LIBRARY_PATH         $root/lib
prepend-path      LD_LIBRARY_PATH      $root/lib
prepend-path      MANPATH              $root/share/man

After that, I can just do

module use /work/.../modules
module load pcre/8.37

and pcre will be on my PATH.
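
Since every package in this post gets the same kind of module file, it can be generated rather than written by hand. The following is a minimal sketch that follows the layout of the pcre module file above; the helper name write_modulefile and its argument order are hypothetical.

```shell
# write_modulefile NAME VERSION PREFIX MODULES_DIR
# Hypothetical helper: writes MODULES_DIR/NAME/VERSION in the same
# layout as the hand-written pcre module file.
write_modulefile() {
    name=$1; version=$2; prefix=$3; moddir=$4
    mkdir -p "$moddir/$name" || return 1
    # Unquoted heredoc: $name, $version and $prefix are expanded now,
    # while \$root is written out as a literal $root for Tcl to expand.
    cat > "$moddir/$name/$version" <<EOF
#%Module -*- tcl -*-
##
## modulefile
##
proc ModulesHelp { } {

  puts stderr "\tAdds $name $version to your environment.\n"
}

module-whatis "adds $name $version to your environment"

set               root                 $prefix
prepend-path      PATH                 \$root/bin
prepend-path      CPATH                \$root/include
prepend-path      LIBRARY_PATH         \$root/lib
prepend-path      LD_LIBRARY_PATH      \$root/lib
prepend-path      MANPATH              \$root/share/man
EOF
}
```

For example, write_modulefile swig 3.0.5 /work/.../packages/swig-3.0.5 /work/.../modules would create a module that can be loaded as swig/3.0.5. Packages with an unusual layout still need their module file adjusted by hand.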

I am just going to repeat that process for swig, cmake, boost and eigen, which are all essential for building and running DOLFIN. Most of these are easy to build and install using cmake or configure, but I’ll pause briefly on boost, as it can be a bit more painful.
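
For the three easy ones, the builds differ only in which tool drives them. The sketch below prints the commands instead of running them; the run helper, the DRY_RUN variable, the function names and the source directory names are all assumptions for illustration, and the versions are simply the ones I ended up with.

```shell
# Install root; "..." stands for your own path, as elsewhere in this post.
PACKAGES=${PACKAGES:-/work/.../packages}

# Hypothetical helper: run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# swig uses configure, like pcre; load the pcre module first so that
# configure can find it. Source directory names here are assumptions.
build_swig() {
    ( run cd "swig-$1" &&
      run ./configure --prefix="$PACKAGES/swig-$1" &&
      run make && run make install )
}

# cmake ships its own bootstrap script in place of configure.
build_cmake() {
    ( run cd "cmake-$1" &&
      run ./bootstrap --prefix="$PACKAGES/cmake-$1" &&
      run make && run make install )
}

# eigen is header-only, so cmake only has to copy the headers into place.
build_eigen() {
    ( run cd "eigen-$1" && run mkdir -p build && run cd build &&
      run cmake -DCMAKE_INSTALL_PREFIX="$PACKAGES/eigen-$1" .. &&
      run make install )
}

# Print the commands without running them.
DRY_RUN=1
build_swig 3.0.5
build_cmake 3.2.2
build_eigen 3.2.4
```

Unsetting DRY_RUN runs the same commands for real, from inside the src folder.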

Building boost for DOLFIN

Usually, ./bootstrap.sh works fine and correctly creates a project-config.jam for gcc.
However, by default boost builds every library, which takes forever, so it is a good idea to limit the set of libraries. I usually edit project-config.jam until it looks like this:

# Boost.Build Configuration
# Automatically generated by bootstrap.sh

import option ;
import feature ;

# Compiler configuration. This definition will be used unless
# you already have defined some toolsets in your user-config.jam
# file.
if ! gcc in [ feature.values <toolset> ]
{
    using gcc : 4.9 : : <compileflags>-std=c++11 ; 
}

project : default-build <toolset>gcc ;

# List of --with-<library> and --without-<library>
# options. If left empty, all libraries will be built.
# Options specified on the command line completely
# override this variable.
libraries = --with-filesystem --with-program_options --with-timer --with-chrono --with-system --with-thread --with-iostreams --with-serialization ;

# These settings are equivivalent to corresponding command-line
# options.
option.set prefix : /work/.../packages/boost-1.55.0 ;
option.set exec-prefix : /work/.../packages/boost-1.55.0 ;
option.set libdir : /work/.../packages/boost-1.55.0/lib ;
option.set includedir : /work/.../packages/boost-1.55.0/include ;

# Stop on first error
option.set keep-going : false ;

Now you can run ./b2 followed by ./b2 install, and it should only take a few minutes(!)
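
As an alternative to editing project-config.jam by hand, bootstrap.sh can be given the prefix and the library list directly through its --prefix and --with-libraries options. This is not what I did above, just a sketch of the same selection expressed on the command line; the helper names are hypothetical, and the script only prints the command rather than running it.

```shell
# The libraries selected in the project-config.jam above.
BOOST_LIBS="filesystem program_options timer chrono system thread iostreams serialization"

# Turn a space-separated list into the comma-separated form
# that --with-libraries expects.
join_commas() {
    echo "$1" | tr ' ' ','
}

# boost_bootstrap_cmd PREFIX: print the bootstrap command for that prefix.
boost_bootstrap_cmd() {
    printf './bootstrap.sh --with-toolset=gcc --prefix=%s --with-libraries=%s\n' \
        "$1" "$(join_commas "$BOOST_LIBS")"
}

boost_bootstrap_cmd /work/.../packages/boost-1.55.0
```

The -std=c++11 flag from the file above would still need to be supplied, for instance as cxxflags=-std=c++11 on the b2 command line.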

Python packages

So much for the preliminaries. Now let’s install the Python packages. I tend to lump these together and install them into one directory, say fenics-1.5.0. The same command is repeated for ffc, fiat, instant and ufl. Other dependencies, such as sympy and ply, can be installed in the same way.

cd ufl-1.5.0
python setup.py install --prefix=/work/.../packages/fenics-1.5.0

and so on for the others. Then create a module file that sets PATH, PYTHONPATH etc. for them.
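
The repetition can be wrapped in a loop, and the same script can work out what the fenics module file has to set. This is a sketch under a few assumptions: the source folders are named like ufl-1.5.0, the Python version is 2.7, and a --prefix install puts pure-Python packages under lib/pythonX.Y/site-packages (worth checking on your system). The names FENICS_PREFIX, site_packages and install_fenics_python are hypothetical.

```shell
# Common install prefix; "..." stands for your own path.
FENICS_PREFIX=${FENICS_PREFIX:-/work/.../packages/fenics-1.5.0}

# site_packages PREFIX PYVERSION: where a --prefix install is assumed
# to put pure-Python packages.
site_packages() {
    printf '%s/lib/python%s/site-packages\n' "$1" "$2"
}

# Install each package into the common prefix, stopping at the first failure.
# Defined here but not called; run it from the src folder.
install_fenics_python() {
    for pkg in fiat ufl instant ffc; do
        ( cd "$pkg-1.5.0" &&
          python setup.py install --prefix="$FENICS_PREFIX" ) || return 1
    done
}

# The two paths the fenics module file needs to prepend.
echo "PATH       $FENICS_PREFIX/bin"
echo "PYTHONPATH $(site_packages "$FENICS_PREFIX" 2.7)"
```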

Ready to build DOLFIN

> module avail 

--------------------------- /work/.../modules/ ---------------------------
boost/1.57.0 cmake/3.2.2  eigen/3.2.4  fenics/1.5.0 pcre/8.37    swig/3.0.5

Now I will load all these modules, and try to build DOLFIN.

cd src
tar xf dolfin-1.5.0.tar.bz2
cd dolfin-1.5.0
mkdir build
cd build
cmake ..
make

That works! However, DOLFIN can be built with various optional packages.

Some are more optional than others. On an HPC system, we need some quality scalable solvers, which are provided by PETSc. Without PETSc, DOLFIN on HPC doesn’t make much sense.
PETSc is available as a system package on Cray. PARMETIS and SCOTCH are also really useful, as is HDF5. They are all available from Cray:

module load cray-petsc/3.5.3.0
module load cray-tpsl/1.4.4
export SCOTCH_DIR=$CRAY_TPSL_PREFIX_DIR
export PARMETIS_DIR=$CRAY_TPSL_PREFIX_DIR
module load cray-hdf5-parallel/1.8.13
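
Before running cmake, it can be worth checking that these variables really are set, since an empty SCOTCH_DIR or PARMETIS_DIR just means the package is quietly not found. A small sketch, with a hypothetical helper name:

```shell
# missing_vars NAME...: print the name of each variable that is unset or empty.
missing_vars() {
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Report anything that the module loads and exports above should have set.
for name in $(missing_vars CRAY_TPSL_PREFIX_DIR SCOTCH_DIR PARMETIS_DIR); do
    echo "warning: $name is not set" >&2
done
```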

DOLFIN’s cmake configuration tries to test these libraries, but the tests fail on the login nodes (because of MPI linking), so it is necessary to pass some extra flags to cmake:

cmake -DDOLFIN_SKIP_BUILD_TESTS=true -DDOLFIN_AUTO_DETECT_MPI=false -DCMAKE_INSTALL_PREFIX=/work/.../packages/dolfin-1.5.0 ..

make

make install

I also make a module for dolfin, which I can load with the module command.
Now, let’s try it:

xxxxx@eslogin006:~> python
Python 2.7.6 (default, Mar 10 2014, 14:13:45) 
[GCC 4.8.1 20130531 (Cray Inc.)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from dolfin import *
>>> mesh = UnitSquareMesh(2,2)
[Wed Apr 29 14:05:32 2015] [unknown] Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(547): 
MPID_Init(203).......: channel initialization failed
MPID_Init(579).......:  PMI2 init failed: 1 
Aborted

Well, this is just normal behaviour on the login nodes. Any attempt to use MPI will cause a crash, so the code has to run on the compute nodes, through the batch system.
A typical batch script might look like this:

#PBS -l select=1
#PBS -N example.py
#PBS -l walltime=0:5:0

# Switch to current working directory
cd $PBS_O_WORKDIR

module use /work/.../modules
module load fenics/1.5.0
module load dolfin/1.5.0

cd /work/.../example

# Run the parallel program
aprun -n 12 -N 12 -S 6 -d 1 python example.py

and that does work!
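
For larger runs, the select value has to match the aprun arguments: -n is the total number of processes and -N the number per node, so select must be at least n divided by N, rounded up. A small sketch of that arithmetic follows; the 24 cores per node is the ARCHER figure, while the helper name and the 96-process example are made up for illustration.

```shell
# Cores per compute node; 24 on ARCHER, an assumption anywhere else.
CORES_PER_NODE=${CORES_PER_NODE:-24}

# nodes_needed TOTAL_RANKS RANKS_PER_NODE: round up to whole nodes.
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Print a select line and the matching aprun line for a hypothetical run.
ranks=96
per_node=$CORES_PER_NODE
echo "#PBS -l select=$(nodes_needed "$ranks" "$per_node")"
echo "aprun -n $ranks -N $per_node python example.py"
```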

