JDFTx  1.7.0
Compiling on supercomputers

Compiling on NERSC Perlmutter

Compiling JDFTx to use the NVIDIA A100 GPUs on NERSC Perlmutter requires specific combinations of gcc and cuda to work correctly. Additional flags below are required to link non-threaded Cray libsci and avoid a crash within libsci during cleanup. The following commands may be used to invoke cmake (assuming bash shell):

module load PrgEnv-gnu
module load cmake cray-fftw
module load cudatoolkit/21.3_11.2
module load gcc/9.3.0
export CRAY_ACCEL_TARGET=nvidia80  # needed for CUDA-aware MPI support

CC=cc CXX=CC cmake \
        -D EnableProfiling=yes \
        -D FFTW3_PATH=${FFTW_ROOT} \
        -D GSL_PATH=${HOME}/Code/perlmutter/GSL \
        -D CBLAS_LIBRARY=${HOME}/Code/perlmutter/GSL/lib/libgslcblas.so \
        -D EnableCUDA=yes \
        -D EnableCuSolver=yes \
        -D CudaAwareMPI=yes \
        -D PinnedHostMemory=yes \
        -D CUDA_NVCC_FLAGS="-Wno-deprecated-gpu-targets -allow-unsupported-compiler -std=c++14" \
        -D CUDA_ARCH=compute_80 \
        -D CUDA_CODE=sm_80 \
        -D EXTRA_CXX_FLAGS="-DDONT_FINALIZE_MPI -L/opt/cray/pe/lib64 -lsci_gnu_82" \

If you put the above in a script file, make sure to source it (rather than run it) because the CRAY_ACCEL_TARGET export needs to be set during make as well. Note that the GSL module on Perlmutter (at the time of writing this) is not compatible with the compiler versions needed for cuda. Consequently, you need to compile the GNU Scientific Library within some path inidicated as ${GSL_ROOT} above. Fetch the latest GSL source code, extract it and within the source directory:

#Compile GSL with same compiler versions as needed for JDFTx
module load PrgEnv-gnu
module load gcc/9.3.0
./configure CC=cc CXX=CC FC=ftn --enable-shared --prefix=${GSL_ROOT}
make -j32
make install

Do this first and then use the same ${GSL_ROOT} in the jdftx cmake command above.

When running jobs on perlmutter, make sure the same modules are loaded (four module load lines in the compile script above) before launching a job. Here's an example job script using two nodes with four GPUs each:

#SBATCH -A account_g      # replace with valid account
#SBATCH -t 10             # replace with suitable time limit
#SBATCH -C gpu
#SBATCH -q regular
#SBATCH -n 8
#SBATCH --ntasks-per-node=4
#SBATCH -c 32
#SBATCH --gpus-per-task=1

export SLURM_CPU_BIND="cores"
export JDFTX_MEMPOOL_SIZE=8192      # adjust as needed (in MB)
export MPICH_GPU_SUPPORT_ENABLED=1  # needed for CUDA-aware MPI support

srun /path/to/jdftx_gpu -i inputfile.in

Please report any issues with building or running JDFTx on Perlmutter. In particular, the module versions may need some changes after each Cray update. Hopefully, some of the additional tweaks made above should become unnecessary as the Cray libraries stabilize.

Compiling on NERSC Edison/Cori

Use the gnu compiler and MKL to compile JDFTx on NERSC Edison/Cori. The following commands may be used to invoke cmake (assuming bash shell):

#Select the right compiler and load necessary modules
module swap PrgEnv-intel PrgEnv-gnu
module load gcc cmake gsl cray-fftw
module unload darshan
export CRAYPE_LINK_TYPE="dynamic"

#From inside your build directory
#(assuming relative paths as in the generic instructions above)
CC="cc -dynamic -lmpich" CXX="CC -dynamic -lmpich" cmake \
    -D EnableProfiling=yes \
    -D EnableMKL=yes \
    -D ForceFFTW=yes \
    -D ThreadedBLAS=no \
    -D GSL_PATH=${GSL_DIR} \
make -j12

The optional ThreadedBLAS=no line above uses single-threaded MKL with threads managed by JDFTx instead. This slightly reduces performance (around 5%) compared to using MKL threads, but MKL threads frequently lead to a crash when trying to create pthreads elsewhere in JDFTx on NERSC.

Compiling on TACC

Use the GNU compilers and MKL for the easiest compilation on TACC Stampede. The following commands may be used to invoke cmake (assuming bash shell):

#Select gcc as the compiler:
module load gcc/4.7.1
module load mkl gsl cuda cmake fftw3

CC=gcc CXX=g++ cmake \
   -D EnableCUDA=yes \
   -D EnableMKL=yes \
   -D ForceFFTW=yes \

make -j12

Make on the login nodes (as shown above) or on the gpu queue if you loaded cuda; it should work on any machine otherwise.