We’ve noticed that the solver.nthreads variable doesn’t seem to change the number of threads actually used in v5.0. We saw this in CPU monitoring while running the same simulations in v4.4 and v5.0.
Is there a change there that we should be aware of?
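For anyone who wants to check this without watching a CPU monitor, here is a minimal sketch that compares the CPU time consumed by a child process with its wall-clock time. A ratio near 1 suggests the run was effectively single-threaded, while a ratio near 4 would suggest four busy cores. The function name `cpu_to_wall_ratio` is made up for illustration, and the command below is only a stand-in: you would replace it with your own simulator call. Note that `os.times()` only reports child CPU times on Unix-like systems (on Windows those fields are zero).

```python
import os
import subprocess
import sys
import time


def cpu_to_wall_ratio(cmd):
    """Run cmd and return (CPU time of the child) / (wall-clock time).

    Hypothetical helper: a value near 1 suggests one busy core,
    a value near N suggests roughly N busy cores.
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    # children_user / children_system accumulate CPU time of waited-for children
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return cpu / wall


# Stand-in command (assumption): replace with your actual simulation call.
stand_in_cmd = [sys.executable, "-c", "sum(range(200000))"]
ratio = cpu_to_wall_ratio(stand_in_cmd)
print(f"CPU/wall ratio: {ratio:.2f}")
```

Short runs are dominated by process start-up, so the ratio is only meaningful for simulations that take at least a few seconds.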
While trying to figure out why our sims were running unusually slowly, we ran some comparisons of a GRM + binding model simulation, varying the CADET version, the binding model, and whether the use_analytic_jacobian variable was set to true. In all cases solver.nthreads = 4, but we did not see 4 cores being used in v5.0, whereas we did in v4.4. Here are the results, mostly just for fun since we know what’s happening now. Each row is based on 5 replicates of the given condition.
| version | binding model | use_analytic_jacobian | Average Time | Minimum Time |
|---|---|---|---|---|
| 4.4 | Langmuir | 0 | 1.838 | 1.828 |
| 4.4 | Langmuir | 1 | 1.341 | 1.328 |
| 5.0 | Langmuir | 0 | 4.736 | 4.721 |
| 5.0 | Langmuir | 1 | 3.720 | 3.719 |

| version | binding model | use_analytic_jacobian | Average Time | Minimum Time |
|---|---|---|---|---|
| 4.4 | colloidal | 0 | 24.561 | 24.504 |
| 4.4 | colloidal | 1 | 16.473 | 16.424 |
| 5.0 | colloidal | 0 | 74.726 | 74.471 |
| 5.0 | colloidal | 1 | 61.017 | 60.891 |

As a note, no comparison can be made between the Langmuir and colloidal isotherms: these are different chromatograms, and the colloidal model included an elution step with [H+] dependence, while the Langmuir model had none.
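To put the numbers above in one place, here is a small sketch that computes how much longer each v5.0 run took than the matching v4.4 run, using the average times we reported. The dictionary layout and variable names are just for illustration. Keep in mind that v4.4 was actually using 4 threads while v5.0 was not, so these ratios mix the version change with the threading change.

```python
# Average times from the comparison above, keyed by
# (binding model, use_analytic_jacobian) -> {version: average time}
average_times = {
    ("Langmuir", 0): {"4.4": 1.838, "5.0": 4.736},
    ("Langmuir", 1): {"4.4": 1.341, "5.0": 3.720},
    ("colloidal", 0): {"4.4": 24.561, "5.0": 74.726},
    ("colloidal", 1): {"4.4": 16.473, "5.0": 61.017},
}

# Ratio > 1 means v5.0 took longer than v4.4 for the same condition
slowdown = {
    condition: times["5.0"] / times["4.4"]
    for condition, times in average_times.items()
}

for (model, analytic_jacobian), ratio in slowdown.items():
    print(f"{model}, use_analytic_jacobian={analytic_jacobian}: "
          f"v5.0 takes {ratio:.2f}x as long as v4.4")
```

In every condition the v5.0 run takes roughly 2.5 to 3.7 times as long, which is in the range one might expect when going from 4 threads to 1.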
Anyway, TLDR it seems like solver.nthreads isn’t working.
We set the default for ENABLE_THREADING to OFF in the newest release to improve the speed of single-threaded applications (we had reports that CADET-Core was slower in single-threaded mode when compiled with multi-threading support) and forgot to document the change. Sorry about that! You can compile CADET yourself with -DENABLE_THREADING=ON to create a version with multithreading, or wait for us to publish an alternative conda-forge release with threading enabled. But that will probably happen after the workshop next week.
Now that the workshop is over, we can focus on other topics again. Do you need a conda-installable release of CADET-Core with multithreading enabled, or are you fine with compiling it yourself?
Multi-threading in CADET-Core is implemented as a compile-time option, i.e. we are essentially talking about two different codes. The multi-threaded code suffers from some overhead introduced by the multithreading infrastructure. We have not yet investigated this overhead further, but we know that it’s a common problem with multithreading libraries.
We plan to provide both versions in the near future.
We did some benchmarks on the performance difference, but I couldn’t find them quickly. Maybe @ronald.jaepel can help me out?
I just re-ran some tests to get an up-to-date estimate of the differences. I’ve found that with multi-threading support compiled in, the simulation time is
~ 9% longer on short simulations (0.129171 s vs 0.142467 s),
~ 3% longer on medium simulations (2.86331 s vs 2.9531 s),
~ 1.5% longer on long simulations (47.3125 s vs 48.0605 s)
The times are best of 10 repetitions, running single core on Ubuntu WSL, compiled for CADET-Core v5.0.1 using default arguments except for -DENABLE_THREADING=ON -DBLA_VENDOR=Intel10_64lp vs -DENABLE_THREADING=OFF -DBLA_VENDOR=Intel10_64lp_seq. Simulation setups were generated with the createLWE function. Short was --col 10 --par 5, medium was --col 100 --par 10, long was --col 1000 --par 10.
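As a rough sketch of how such numbers can be produced and compared, the snippet below shows a best-of-N timing helper and the relative overhead computed from the times quoted above. The helper names `best_of` and `overhead_percent` are made up for illustration and are not part of CADET. The overhead here is taken relative to the build without threading, which gives about 10% for the short case; the ~9% quoted above corresponds to using the threaded time as the denominator instead.

```python
import time


def best_of(run, repetitions=10):
    """Return the shortest wall-clock time of `run()` over several repetitions."""
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


def overhead_percent(baseline, candidate):
    """Extra time of `candidate` relative to `baseline`, in percent."""
    return 100.0 * (candidate - baseline) / baseline


# Times quoted above: (ENABLE_THREADING=OFF, ENABLE_THREADING=ON), in seconds
reported = {
    "short": (0.129171, 0.142467),
    "medium": (2.86331, 2.9531),
    "long": (47.3125, 48.0605),
}

for name, (without_threading, with_threading) in reported.items():
    extra = overhead_percent(without_threading, with_threading)
    print(f"{name}: {extra:.1f}% longer with threading support")
```

Taking the minimum rather than the mean reduces the influence of background load on the machine, which matters most for the sub-second runs.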
Thanks for summarizing this. We also had other cases where the differences were much higher; I remember something like 100% slower for multi-unit-operation systems. But I don’t currently have them at hand, so take this with a grain of salt. Once we have fixed some more pressing issues, we can revisit this.
I don’t see any reason why it wouldn’t be finding cadet.hpp, the path seems to be correct and it seems to exist in the include folder. I am a bit helpless as I have no idea about this kind of thing, so I would appreciate any help. Otherwise, is there a release with multithreading enabled yet?
On the other hand, even if there is, since in my case it’s probably better if I build from source anyway to avoid having to run x86-64 code through Rosetta, I’d like to figure out how to do this.
I get the same error with or without the -DBLA_VENDOR=Accelerate flag.
This is the full output from make:
[ 4%] Built target sundials_nvecserial_static
[ 12%] Built target sundials_sunlinsolspgmr_static
[ 20%] Built target sundials_sunlinsolspfgmr_static
[ 28%] Built target sundials_sunlinsolspbcgs_static
[ 36%] Built target sundials_sunlinsolsptfqmr_static
[ 66%] Built target sundials_idas_static
[ 69%] Built target templateCodeGen
[ 70%] Building CXX object src/cadet-cli/CMakeFiles/cadet-cli.dir/cadet-cli.cpp.o
/Users/angelamoser/Projects/CADET/src/cadet-cli/cadet-cli.cpp:13:10: fatal error: 'cadet/cadet.hpp' file not found
13 | #include "cadet/cadet.hpp"
| ^~~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [src/cadet-cli/CMakeFiles/cadet-cli.dir/cadet-cli.cpp.o] Error 1
make[1]: *** [src/cadet-cli/CMakeFiles/cadet-cli.dir/all] Error 2
make: *** [all] Error 2
Do you also need the output from cmake?
I don’t really need the multithreading for Windows or Linux, because I basically only run fitting on our Windows computer in the office, where running in parallel is ideal. I just write all my scripts and run individual forward simulations on my Mac. Since that’s all I am using it for, it’s not a huge deal; it’s just nicer when it’s faster.