CADET-MATCH script for PE of mobile-phase modulator isotherm

Hi William,
I am familiarizing myself with CADET-Match and working on a CADET-Match script for the mobile-phase modulator isotherm, for salt-based elution in a HIC experiment. I am using the solution of CADET-Tutorial Exercise 08 (CADET-Match Introduction) as a template and editing it to use the mobile-phase modulator instead of Langmuir. I have also edited the utils.py file (used in the tutorial) accordingly.
When I run the following section:

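import json
from CADETMatch.jupyter import Match

# base_dir and match_config are defined earlier in the notebook
match_file = base_dir / 'mobile_phase_modulator.json'
with open(match_file, 'w') as json_file:
    json.dump(match_config.to_dict(), json_file, indent='\t')

# Load the JSON configuration and start the match
match = Match(match_file)
match.start_sim()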

It executes for only 1 min 47.8 s and then the process shuts down. Could you help me identify and fix the error in the script? Many thanks in advance for your help! The run log of the command is copied below:

2021-01-22 16:14:06,868 match.py print_version 113 CADETMatch starting up version: 0.6.17
2021-01-22 16:14:06,907 match.py print_version 138 joblib version: 0.16.0 tested with 0.15.1
2021-01-22 16:14:06,912 match.py print_version 138 addict version: 2.2.1 tested with 2.2.1
2021-01-22 16:14:06,918 match.py print_version 138 corner version: 0.0.0 tested with 2.0.1
2021-01-22 16:14:06,924 match.py print_version 138 emcee version: 0.0.0 tested with 3.0.2
2021-01-22 16:14:06,930 match.py print_version 138 SALib version: 0.0+nnone tested with 0.0+nnone
2021-01-22 16:14:06,935 match.py print_version 138 deap version: 1.3.1 tested with 1.3.1
2021-01-22 16:14:06,943 match.py print_version 138 psutil version: 5.7.0 tested with 5.7.0
2021-01-22 16:14:06,948 match.py print_version 138 numpy version: 1.18.5 tested with 1.18.5
2021-01-22 16:14:06,953 match.py print_version 138 openpyxl version: 3.0.4 tested with 3.0.3
2021-01-22 16:14:06,960 match.py print_version 138 scipy version: 1.5.0 tested with 1.5.0
2021-01-22 16:14:06,966 match.py print_version 138 matplotlib version: 3.2.2 tested with 3.2.1
2021-01-22 16:14:06,971 match.py print_version 138 pandas version: 1.0.5 tested with 1.0.5
2021-01-22 16:14:06,977 match.py print_version 138 h5py version: 2.10.0 tested with 2.10.0
2021-01-22 16:14:06,983 match.py print_version 138 cadet version: 0.6 tested with 0.6
2021-01-22 16:14:06,989 match.py print_version 138 seaborn version: 0.10.1 tested with 0.10.1
2021-01-22 16:14:07,015 match.py print_version 138 scikit-learn version: 0.23.1 tested with 0.23.1
2021-01-22 16:14:07,021 match.py print_version 138 jstyleson version: 0.0.2 tested with 0.0.2
2021-01-22 16:14:08,146 auto_keq.py getHeaders 162 parameter ['/input/model/unit_001/adsorption/MPM_KA', '/input/model/unit_001/adsorption/MPM_KD'] log ka log keq
2021-01-22 16:14:08,146 auto.py getHeaders 159 parameter /input/model/unit_001/adsorption/MPM_QMAX linear
2021-01-22 16:14:08,146 auto.py getHeaders 159 parameter /input/model/unit_001/adsorption/MPM_QMAX linear
2021-01-22 16:14:08,759 util.py setupSimulation 1166 mobile_phase_modulator_reference.h5 abstol=1e-06  reltol=1e-06
2021-01-22 16:14:11,283 match.py setupTemplates 253 simulation took 2.5222630500793457
2021-01-22 16:14:13,590 match.py setupTemplates 289 simulation final took 2.2599635124206543
2021-01-22 16:14:16,266 gradFD.py create_template 38 grad simulation took 2.56714129447937
2021-01-22 16:14:16,267 gradFD.py create_template 40 grad C:/Users/kusum.solanki/CADET-Tutorial/08_CADET-Match_Introduction/results_exercise/misc/template_main_grad.h5 abstol=1e-06  reltol=1e-06
2021-01-22 16:15:46,732 loggerwriter.py write 10 Traceback (most recent call last):
2021-01-22 16:15:46,732 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 496, in _list_to_arrays
2021-01-22 16:15:46,735 loggerwriter.py write 10     
2021-01-22 16:15:46,735 loggerwriter.py write 10 result = _convert_object_array(
2021-01-22 16:15:46,736 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 580, in _convert_object_array
2021-01-22 16:15:46,737 loggerwriter.py write 10     
2021-01-22 16:15:46,737 loggerwriter.py write 10 raise AssertionError(
2021-01-22 16:15:46,738 loggerwriter.py write 10 AssertionError
2021-01-22 16:15:46,738 loggerwriter.py write 10 : 
2021-01-22 16:15:46,738 loggerwriter.py write 10 14 columns passed, passed data had 13 columns
2021-01-22 16:15:46,739 loggerwriter.py write 10 
The above exception was the direct cause of the following exception:
2021-01-22 16:15:46,739 loggerwriter.py write 10 Traceback (most recent call last):
2021-01-22 16:15:46,739 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\match.py", line 322, in <module>
2021-01-22 16:15:46,741 loggerwriter.py write 10     
2021-01-22 16:15:46,742 loggerwriter.py write 10 main(map_function=map_function)
2021-01-22 16:15:46,742 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\match.py", line 33, in main
2021-01-22 16:15:46,743 loggerwriter.py write 10     
2021-01-22 16:15:46,743 loggerwriter.py write 10 hof = evo.run(cache)
2021-01-22 16:15:46,744 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\evo.py", line 123, in run
2021-01-22 16:15:46,745 loggerwriter.py write 10     
2021-01-22 16:15:46,745 loggerwriter.py write 10 return cache.search[searchMethod].run(cache, tools, creator)
2021-01-22 16:15:46,745 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\search\nsga3.py", line 36, in run
2021-01-22 16:15:46,747 loggerwriter.py write 10     
2021-01-22 16:15:46,747 loggerwriter.py write 10 return checkpoint_algorithms.eaMuPlusLambda(
2021-01-22 16:15:46,747 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\checkpoint_algorithms.py", line 124, in eaMuPlusLambda
2021-01-22 16:15:46,749 loggerwriter.py write 10     
2021-01-22 16:15:46,749 loggerwriter.py write 10 stalled, stallWarn, progressWarn = util.eval_population(
2021-01-22 16:15:46,749 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\util.py", line 926, in eval_population
2021-01-22 16:15:46,751 loggerwriter.py write 10     
2021-01-22 16:15:46,752 loggerwriter.py write 10 return eval_population_base(
2021-01-22 16:15:46,752 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\util.py", line 942, in eval_population_base
2021-01-22 16:15:46,753 loggerwriter.py write 10     
2021-01-22 16:15:46,754 loggerwriter.py write 10 return process_population(
2021-01-22 16:15:46,754 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\util.py", line 916, in process_population
2021-01-22 16:15:46,755 loggerwriter.py write 10     
2021-01-22 16:15:46,755 loggerwriter.py write 10 writeMetaFront(cache, meta_hof, path_meta_csv)
2021-01-22 16:15:46,755 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\util.py", line 968, in writeMetaFront
2021-01-22 16:15:46,757 loggerwriter.py write 10     
2021-01-22 16:15:46,757 loggerwriter.py write 10 new_data = pandas.DataFrame(new_data, columns=cache.headers)
2021-01-22 16:15:46,757 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\pandas\core\frame.py", line 474, in __init__
2021-01-22 16:15:46,759 loggerwriter.py write 10     
2021-01-22 16:15:46,760 loggerwriter.py write 10 arrays, columns = to_arrays(data, columns, dtype=dtype)
2021-01-22 16:15:46,760 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 461, in to_arrays
2021-01-22 16:15:46,761 loggerwriter.py write 10     
2021-01-22 16:15:46,762 loggerwriter.py write 10 return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
2021-01-22 16:15:46,762 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\pandas\core\internals\construction.py", line 500, in _list_to_arrays
2021-01-22 16:15:46,763 loggerwriter.py write 10     
2021-01-22 16:15:46,763 loggerwriter.py write 10 raise ValueError(e) from e
2021-01-22 16:15:46,763 loggerwriter.py write 10 ValueError
2021-01-22 16:15:46,764 loggerwriter.py write 10 : 
2021-01-22 16:15:46,764 loggerwriter.py write 10 14 columns passed, passed data had 13 columns
2021-01-22 16:15:46,765 util.py info 54 process shutting down

mobile_phase_modulator.json (1.1 KB) utils.ipynb (18.3 KB) 02_CADET-Match_exercise_solution_Mobile_phase_modulator.ipynb (2.8 MB)

The most likely reason for this is that a match was started and then the parameters or experiments were changed. The system tries to resume from the last run, but if the settings have changed this is not possible.

Try deleting the results directory and rerunning the match or change the results directory name to a new directory.
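
A minimal sketch of the second option, assuming the results directory lives under the base directory as in the tutorial. The helper name `fresh_results_dir` is hypothetical and not part of CADETMatch; it only picks a directory name that does not exist yet, so the match cannot resume from stale results.

```python
from pathlib import Path


def fresh_results_dir(base_dir, name="results"):
    """Return a directory name under base_dir that does not exist yet.

    Hypothetical helper (not a CADETMatch function): it appends _1, _2, ...
    to the requested name until no directory of that name is found.
    """
    base_dir = Path(base_dir)
    candidate = name
    counter = 1
    while (base_dir / candidate).exists():
        candidate = f"{name}_{counter}"
        counter += 1
    return candidate


# Assumed usage in the notebook, before writing the JSON file:
# match_config.resultsDir = fresh_results_dir(base_dir, "results_exercise")
```

The old directory is left untouched, so earlier results can still be inspected.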

If you add the CSV file to match against, I can try and test it if this does not work.

Hi William,
Thank you for the quick response! I deleted the old results directory and also changed the results directory name. This time it ran for 11 min 14 s, but after that the process shut down again, this time with a different error.

02-09 14:55:27,834 match.py print_version 113 CADETMatch starting up version: 0.6.17
2021-02-09 14:55:27,868 match.py print_version 138 joblib version: 0.16.0 tested with 0.15.1
2021-02-09 14:55:27,872 match.py print_version 138 addict version: 2.2.1 tested with 2.2.1
2021-02-09 14:55:27,877 match.py print_version 138 corner version: 0.0.0 tested with 2.0.1
2021-02-09 14:55:27,882 match.py print_version 138 emcee version: 0.0.0 tested with 3.0.2
2021-02-09 14:55:27,887 match.py print_version 138 SALib version: 0.0+nnone tested with 0.0+nnone
2021-02-09 14:55:27,892 match.py print_version 138 deap version: 1.3.1 tested with 1.3.1
2021-02-09 14:55:27,898 match.py print_version 138 psutil version: 5.7.0 tested with 5.7.0
2021-02-09 14:55:27,903 match.py print_version 138 numpy version: 1.18.5 tested with 1.18.5
2021-02-09 14:55:27,908 match.py print_version 138 openpyxl version: 3.0.4 tested with 3.0.3
2021-02-09 14:55:27,913 match.py print_version 138 scipy version: 1.5.0 tested with 1.5.0
2021-02-09 14:55:27,919 match.py print_version 138 matplotlib version: 3.2.2 tested with 3.2.1
2021-02-09 14:55:27,924 match.py print_version 138 pandas version: 1.0.5 tested with 1.0.5
2021-02-09 14:55:27,929 match.py print_version 138 h5py version: 2.10.0 tested with 2.10.0
2021-02-09 14:55:27,933 match.py print_version 138 cadet version: 0.6 tested with 0.6
2021-02-09 14:55:27,938 match.py print_version 138 seaborn version: 0.10.1 tested with 0.10.1
2021-02-09 14:55:27,958 match.py print_version 138 scikit-learn version: 0.23.1 tested with 0.23.1
2021-02-09 14:55:27,962 match.py print_version 138 jstyleson version: 0.0.2 tested with 0.0.2
2021-02-09 14:55:28,887 auto_keq.py getHeaders 162 parameter ['/input/model/unit_001/adsorption/MPM_KA', '/input/model/unit_001/adsorption/MPM_KD'] log ka log keq
2021-02-09 14:55:28,887 auto.py getHeaders 159 parameter /input/model/unit_001/adsorption/MPM_QMAX linear
2021-02-09 14:55:28,888 auto.py getHeaders 159 parameter /input/model/unit_001/adsorption/MPM_QMAX linear
2021-02-09 14:55:29,224 smoothing.py load_data 201 smoothing_factor main_Pulse  2.456e-04  critical frequency 9.170e-02  critical frequency der 9.170e-02 knots 93
2021-02-09 14:55:29,510 util.py setupSimulation 1166 mobile_phase_modulator_reference.h5 abstol=1e-06  reltol=1e-06
2021-02-09 14:55:31,764 match.py setupTemplates 253 simulation took 2.2531657218933105
2021-02-09 14:55:33,998 match.py setupTemplates 289 simulation final took 2.179171323776245
2021-02-09 14:55:34,041 nsga3.py generate_reference_points 127 Reference points chosen P = [7, 4, 3, 2]  with shape (995, 6)
2021-02-09 14:55:36,219 gradFD.py create_template 38 grad simulation took 2.1412689685821533
2021-02-09 14:55:36,219 gradFD.py create_template 40 grad C:/Users/kusum.solanki/CADET-Tutorial/08_CADET-Match_Introduction/results_exercise_test/misc/template_main_grad.h5 abstol=1e-06  reltol=1e-06
2021-02-09 15:00:37,571 util.py process_population 846 Generation -1 approximately 45.7 % complete with 137/300 done
2021-02-09 15:05:38,759 util.py process_population 846 Generation -1 approximately 92.7 % complete with 278/300 done
2021-02-09 15:06:22,578 loggerwriter.py write 10 Traceback (most recent call last):
2021-02-09 15:06:22,578 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\match.py", line 322, in <module>
2021-02-09 15:06:22,583 loggerwriter.py write 10     
2021-02-09 15:06:22,583 loggerwriter.py write 10 main(map_function=map_function)
2021-02-09 15:06:22,583 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\match.py", line 33, in main
2021-02-09 15:06:22,584 loggerwriter.py write 10     
2021-02-09 15:06:22,584 loggerwriter.py write 10 hof = evo.run(cache)
2021-02-09 15:06:22,585 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\evo.py", line 123, in run
2021-02-09 15:06:22,587 loggerwriter.py write 10     
2021-02-09 15:06:22,587 loggerwriter.py write 10 return cache.search[searchMethod].run(cache, tools, creator)
2021-02-09 15:06:22,587 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\search\nsga3.py", line 36, in run
2021-02-09 15:06:22,590 loggerwriter.py write 10     
2021-02-09 15:06:22,590 loggerwriter.py write 10 return checkpoint_algorithms.eaMuPlusLambda(
2021-02-09 15:06:22,590 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\checkpoint_algorithms.py", line 89, in eaMuPlusLambda
2021-02-09 15:06:22,592 loggerwriter.py write 10     
2021-02-09 15:06:22,592 loggerwriter.py write 10 stalled, stallWarn, progressWarn = util.eval_population(
2021-02-09 15:06:22,592 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\util.py", line 926, in eval_population
2021-02-09 15:06:22,594 loggerwriter.py write 10     
2021-02-09 15:06:22,594 loggerwriter.py write 10 return eval_population_base(
2021-02-09 15:06:22,594 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\util.py", line 942, in eval_population_base
2021-02-09 15:06:22,596 loggerwriter.py write 10     
2021-02-09 15:06:22,596 loggerwriter.py write 10 return process_population(
2021-02-09 15:06:22,596 loggerwriter.py write 10   File "C:\Users\kusum.solanki\Anaconda3\lib\site-packages\CADETMatch\util.py", line 890, in process_population
2021-02-09 15:06:22,598 loggerwriter.py write 10     
2021-02-09 15:06:22,598 loggerwriter.py write 10 new_best_min = max([i.fitness.values[2] for i in meta_hof.items])
2021-02-09 15:06:22,599 loggerwriter.py write 10 ValueError
2021-02-09 15:06:22,599 loggerwriter.py write 10 : 
2021-02-09 15:06:22,599 loggerwriter.py write 10 max() arg is an empty sequence
2021-02-09 15:06:22,614 util.py info 54 process shutting down

Can you include the CSV file so I can run this also? The most likely explanation for this is that in the entire search space none of the simulations came even close to matching the data and it could not find a useful place to start the next generation.
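
To illustrate the failure mode in plain Python (this is a sketch, not CADETMatch's actual code, and `best_score` and `threshold` are hypothetical names): if every candidate is filtered out, the list passed to `max()` is empty and Python raises exactly the `ValueError` shown in the log.

```python
def best_score(scores, threshold):
    """Return the best score above threshold, or None if nothing qualifies."""
    kept = [s for s in scores if s > threshold]
    # max(kept) without a default raises
    # "ValueError: max() arg is an empty sequence" when kept is empty.
    return max(kept, default=None)


# No score passes the threshold, so the guarded call returns None
# instead of raising.
print(best_score([0.1, 0.2], threshold=0.5))
```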

HIC MechModel Run 4.csv (95.9 KB)

Hi William,

I have uploaded the CSV file to which I am trying to fit the MPM model, and I am also uploading the edited PE script. Please let me know if you need any other information. Thanks again for your help!
02_CADET-Match_exercise_solution.ipynb (1.4 MB)

Sorry, this is taking a while.

I tried to get the simulation to run and I get failures related to the number of components. When I have made adjustments for that there are no parameters CADETMatch can change that make any difference. The core problem seems to be that there are two components needed (the mediator and your molecule of interest) and the fitting is only done on the mediator binding and not the ones for the molecule.
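
For reference, my reading of the CADET documentation is that the mobile-phase modulator Langmuir model has the form below. Treat it as an assumption to check against the documentation for your CADET version. Here component 0 is the modulator (what I call the mediator above), which does not bind itself, and components i ≥ 1 are the binding species, so only their parameters can change the chromatogram.

```latex
\frac{\mathrm{d} q_i}{\mathrm{d} t}
  = k_{a,i}\, e^{\gamma_i c_{p,0}}\, c_{p,i}\, q_{\max,i}
    \left( 1 - \sum_{j=1}^{N_{\text{comp}}-1} \frac{q_j}{q_{\max,j}} \right)
  - k_{d,i}\, c_{p,0}^{\beta_i}\, q_i,
  \qquad i = 1, \dots, N_{\text{comp}} - 1
```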

I have created a basic Jupyter notebook for making a working MPM simulation. The only thing you should need to change to get it to run is the cadet_path = line in the first cell, which must point to where your CADET is installed.

After that, the second cell has the complete simulation setup in it, and the settings you most likely need to change are at the top of the cell. Probably the most important settings to check are the concentrations of the mediator and the molecule of interest. The particle porosity is probably wrong as well: a porosity of 1e-5 effectively means the molecule can’t enter the pores, so there will be no binding. I took the values from the notebooks previously attached.

The third cell will run the simulation and display any errors that occur, and the fourth cell will plot the mediator and the target molecule.

Once the model is working, I think the matching part will be quite easy.

MPM.ipynb (73.5 KB)

Hi William,

Thank you for your help with the simulation files! We edited the notebook and updated the particle porosity values, and the simulation now runs for a minute or so. Our PE script for MPM now runs for about an hour, but the process shuts down after generation 0. We are estimating parameters using data from three experiments. I am attaching the three simulated h5 files, the PE script, and the three CSV files. Could you please have a quick look at these files and let us know what edits we should make for the PE script to converge? We have also tried different parameter values, and in each case it proceeds only up to generation 0 and then the process shuts down. The RMSE reached in each case was around 0.6. Is there any criterion or threshold for the RMSE in order for the script to converge to meaningful values?

CADETMatch-IsothermChar2.ipynb (47.2 KB) HIC MechModel Run 4.csv (95.9 KB) IsothermChar55Load30salt.csv (103.0 KB) IsothermChar55Load50salt.csv (112.6 KB) MPM_55L_20C.h5 (680.4 KB) MPM_55L_30C.h5 (648.2 KB) MPM_55L_50C.h5 (706.0 KB)

Thank you again for helping us with the scripts!
Kusum

I have gotten the code to run and I can see what the problem is, but I don’t know enough about MPM to figure out how to fix it. The reason CADETMatch fails is that no matter which parameter it changes, there is no change in the output, so it can’t select the best members to work from. There are newer versions of CADET-Match available on PyPI that don’t crash on this, but that does not change much.

One small thing that is probably an error: you probably need to change the parameters from component 0 to component 1, since that should be your protein. Even when I change this, I don’t see any change in the output.
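
A minimal sketch of that change, assuming the parameter entries are plain dictionaries with `transform`, `component`, `bound` and `location` keys as in the tutorial configuration. The bounds are left out, and the helper name `retarget_component` is hypothetical; the locations are the ones shown in the log above.

```python
import copy

# Parameter entries modelled on the log output (bounds omitted on purpose).
parameters = [
    {
        "transform": "auto_keq",
        "component": 0,
        "bound": 0,
        "location": [
            "/input/model/unit_001/adsorption/MPM_KA",
            "/input/model/unit_001/adsorption/MPM_KD",
        ],
    },
    {
        "transform": "auto",
        "component": 0,
        "bound": 0,
        "location": "/input/model/unit_001/adsorption/MPM_QMAX",
    },
]


def retarget_component(params, component):
    """Return a copy of the parameter entries pointing at another component."""
    updated = copy.deepcopy(params)
    for entry in updated:
        entry["component"] = component
    return updated


# Component 1 is assumed to be the protein; component 0 is the salt.
protein_parameters = retarget_component(parameters, component=1)
```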

The part I don’t know how to fix is that no matter what the values of those parameters are, they don’t seem to do anything to the output. I don’t know enough about MPM to know what a reasonable range is, or whether the model is even reasonable for what is going on.
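
One way to check this outside of CADETMatch is a one-at-a-time sensitivity test: scale a single parameter and compare the output with the baseline. The sketch below is only an illustration. `simulate` stands for any function that maps a parameter dictionary to an output curve, and `toy_simulate` is a stand-in, not the CADET model.

```python
def max_output_change(simulate, params, name, factor=10.0):
    """Largest absolute output change when params[name] is scaled by factor."""
    baseline = simulate(params)
    perturbed = dict(params)
    perturbed[name] = params[name] * factor
    changed = simulate(perturbed)
    return max(abs(a - b) for a, b in zip(baseline, changed))


def toy_simulate(params):
    # Stand-in model: the output depends on "ka" but ignores "unused".
    return [params["ka"] * t for t in (0.0, 1.0, 2.0)]


params = {"ka": 1.0, "unused": 5.0}
for name in params:
    # A change of exactly zero means the fit cannot learn anything
    # about that parameter from this output.
    print(name, max_output_change(toy_simulate, params, name))
```

If every fitted parameter gives a change of zero, the problem is in the model setup rather than in the search settings.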

@lieres @j.schmoelder Do either of you have ideas for this?

Thank you very much Bill for this detailed analysis. I think we have reached the boundary between CADET support and a consulting case here. Kusum, please send me a private message if you want to discuss the further procedure.