CADET-Core BLAS linking in conda

Motivation

@j.schmoelder and I were wondering whether CADET-Core installed from conda uses OpenBLAS or Intel MKL as its BLAS library, and what the performance difference might be. So I checked.

Background

CADET-Core compiled for conda links its dependencies dynamically. We can therefore easily switch the BLAS library by installing a different libblas build in our conda environment.

The default on Windows is MKL, the default on WSL Ubuntu is OpenBLAS. You can check which one you have with conda list libblas.
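If you want to do this check from a script, here is a minimal sketch. It assumes that a conda executable is on the PATH, that conda list --json reports a build_string field per package, and that the libblas build string ends in the variant name (as in the build strings listed in the table further down, e.g. 23_linux64_mkl). The function names are made up for this example.

```python
import json
import subprocess


def libblas_variant(conda_list_json):
    """Return the variant suffix of the libblas build string, or None if libblas is absent."""
    for pkg in json.loads(conda_list_json):
        if pkg.get("name") == "libblas":
            # Assumption: build strings look like "23_linux64_mkl", so the
            # text after the last underscore names the BLAS variant.
            return pkg.get("build_string", "").rsplit("_", 1)[-1]
    return None


def installed_libblas_variant():
    """Query the active conda environment (assumes `conda` is callable from this process)."""
    result = subprocess.run(
        ["conda", "list", "libblas", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return libblas_variant(result.stdout)
```

Calling installed_libblas_variant() inside the activated environment should then return something like "mkl" or "openblas", provided the assumptions above hold for your conda setup.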

A different variant can be installed with mamba install libblas=*=*mkl

The spec follows the pattern name=version=build, so for libblas=*=*mkl:

  • The first part, =*, accepts any version of the libblas package.
  • The second part, =*mkl, accepts any build of libblas whose build string ends in mkl (the Intel Math Kernel Library variant).

(thanks ChatGPT)
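To make the wildcard matching concrete, here is a simplified illustration in Python. This is not conda's actual matcher, only a sketch that splits the spec at the = signs and treats * as a glob. The version 3.9.0 in the usage lines is a made-up example value, and split_spec and matches are hypothetical helper names.

```python
from fnmatch import fnmatchcase


def split_spec(spec):
    """Split 'name=version=build' into three parts; missing parts default to '*'."""
    parts = spec.split("=")
    if len(parts) > 3:
        raise ValueError(f"too many '=' in spec: {spec!r}")
    name, version, build = parts + ["*"] * (3 - len(parts))
    return name, version, build


def matches(spec, name, version, build):
    """Check a package against the spec, treating '*' as a glob wildcard."""
    spec_name, spec_version, spec_build = split_spec(spec)
    return (
        spec_name == name
        and fnmatchcase(version, spec_version)
        and fnmatchcase(build, spec_build)
    )


# Any version is accepted, but only builds whose build string ends in "mkl".
print(matches("libblas=*=*mkl", "libblas", "3.9.0", "23_linux64_mkl"))
print(matches("libblas=*=*mkl", "libblas", "3.9.0", "21_linux64_openblas"))
```

The first call matches and the second does not, which is why the install command above swaps the BLAS backend without pinning a specific libblas version.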

Performance

Simulation duration for a 3-component LWE with tight tolerances (abstol = 1e-12, algtol = 1e-12, reltol = 1e-8):

| OS                 | CADET source  | BLAS     | version             | TBB | time [s] |
|--------------------|---------------|----------|---------------------|-----|----------|
| Linux (WSL Ubuntu) | conda         | MKL      | 23_linux64_mkl      | yes | 20.29    |
| Linux (WSL Ubuntu) | conda         | openblas | 21_linux64_openblas | yes | 30.51    |
| Linux (WSL Ubuntu) | conda         | BLIS     | 21_linux64_blis     | yes | 44.52    |
| Windows            | self-compiled | MKL      | oneAPI 2022.1.0     | no  | 21.00    |
| Windows            | conda         | MKL      | 20_win64_mkl        | yes | 22.14    |
| Windows            | conda         | MKL      | 23_win64_mkl        | yes | 22.95    |
| Windows            | conda         | openblas | 23_win64_openblas   | yes | 40.99    |
| Windows            | conda         | BLIS     | 23_win64_blis       | yes | 48.77    |

The bad result of BLIS is surprising, as the BLIS team reports better performance compared to MKL on Zen 1 and Zen 2 architectures. Maybe my Zen 3 CPU works well with MKL again, maybe Intel TBB and BLIS don’t get along, maybe it’s something else. ¯_(ツ)_/¯

Key takeaways

Conda can be used to select the BLAS backend. In this benchmark, MKL was significantly faster than OpenBLAS and BLIS on both Linux and Windows: the other backends took roughly 1.5 to 2.2 times as long.
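For anyone who wants the relative numbers, here is a small sketch that computes the slowdown of each conda backend relative to MKL from the timings in the table above. For Windows I picked the 23_win64_mkl row as baseline, since it has the same build number as the other Windows conda rows; that choice is mine and not part of the measurements.

```python
# Timings in seconds, copied from the conda rows of the table above.
timings = {
    "Linux (WSL Ubuntu)": {"MKL": 20.29, "openblas": 30.51, "BLIS": 44.52},
    # Windows baseline: the 23_win64_mkl row (assumption, see text).
    "Windows": {"MKL": 22.95, "openblas": 40.99, "BLIS": 48.77},
}


def slowdown_vs_mkl(times):
    """Return each backend's runtime divided by the MKL runtime, rounded to 2 decimals."""
    baseline = times["MKL"]
    return {blas: round(t / baseline, 2) for blas, t in times.items()}


for os_name, times in timings.items():
    print(os_name, slowdown_vs_mkl(times))
```

A value of 1.0 means "as fast as MKL"; larger values mean the simulation took that many times longer.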

Does this have an impact on speed of simulations or fitting? Or is it just installation?

Those times, in seconds, are the duration of a single simulation. So this will also have an effect on fitting and optimization runs.