Comparison and Optimisation of multiple experiments with multiple simulations - CADET-Process

Hello,

I am trying to optimise the film diffusion for three experiments. Each experiment was previously simulated, and the results were saved in a simulation_results list.
Now I want to compare experiment 1 (reference1) with simulation 1 (simulation_results[0]), experiment 2 with simulation 2, and experiment 3 with simulation 3, and optimise the film diffusion across all three experiments.

However, this does not work: each simulation is compared against all previously added references, and I have no idea how else to implement this.

My current code for the comparison and optimisation is as follows:

from CADETProcess.comparison import Comparator
from CADETProcess.reference import ReferenceIO

comparator = Comparator()

comparator.add_reference(reference1)  # Experiment 1
comparator.add_difference_metric('Shape', reference1, 'outlet.outlet', components="Protein")
comparator.plot_comparison(simulation_results[0], plot_individual=True)

comparator.add_reference(reference2)  # Experiment 2
comparator.add_difference_metric('Shape', reference2, 'outlet.outlet', components="Protein")
comparator.plot_comparison(simulation_results[1], plot_individual=True)

comparator.add_reference(reference3)  # Experiment 3
comparator.add_difference_metric('Shape', reference3, 'outlet.outlet', components="Protein")
comparator.plot_comparison(simulation_results[2], plot_individual=True)

from CADETProcess.optimization import OptimizationProblem
optimization_problem = OptimizationProblem('Film_diffusion')

optimization_problem.add_evaluation_object(process1) #Process of Simulation 1 
optimization_problem.add_evaluation_object(process2) #Process of Simulation 2
optimization_problem.add_evaluation_object(process3) #Process of Simulation 3

optimization_problem.add_variable(
    name='film_diffusion', parameter_path='flow_sheet.column.film_diffusion',
    lb=1e-10, ub=0.1,
    transform='auto'
)

optimization_problem.add_evaluator(simulator)

optimization_problem.add_objective(
    comparator,
    n_objectives=comparator.n_metrics,
    requires=[simulator]
)

def callback(simulation_results, individual, evaluation_object, callbacks_dir='./'):
    comparator.plot_comparison(
        simulation_results,
        file_name=f'{callbacks_dir}/{individual.id}_{evaluation_object}_comparison.png',
        show=False
    )

optimization_problem.add_callback(callback, requires=[simulator])

from CADETProcess.optimization import U_NSGA3
optimizer = U_NSGA3()

if __name__ == '__main__':
    optimization_results = optimizer.optimize(
        optimization_problem,
        use_checkpoint=True
    )

I would be very grateful for any help!
Stefanos

Hey Stefanos,

you’ll need multiple comparators: one per experiment, with each objective restricted to its matching process.

I’ve recently created an example to illustrate using multiple experiments in one optimization problem. As that hasn’t made it into the master branch yet, you can’t find it on the regular readthedocs website; you’ll have to use the link above.
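In short, the pattern from that example looks roughly like this (a minimal sketch using the names from your snippet; check the example for the exact arguments):

# One comparator per experiment; each objective is restricted to the
# matching process via `evaluation_objects`.
for process, reference in zip(
    [process1, process2, process3],
    [reference1, reference2, reference3]
):
    comparator = Comparator(name=f'Comp {reference.name}')
    comparator.add_reference(reference)
    comparator.add_difference_metric(
        'Shape', reference, 'outlet.outlet', components="Protein"
    )

    optimization_problem.add_evaluation_object(process)
    optimization_problem.add_objective(
        comparator,
        name=f'Objective {comparator.name}',
        evaluation_objects=[process],  # compare only against this process
        n_objectives=comparator.n_metrics,
        requires=[simulator]
    )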

Let me know if you need further input.

Hello Ronald,

Thanks for the quick reply! I was able to use your example for my own code.

However, I get the following error:

ValueError                                Traceback (most recent call last)
File ~\anaconda3\envs\Test\lib\site-packages\pymoo\core\problem.py:355, in Problem._format_dict(self, out, N, return_values_of)
    354 try:
--> 355     v = v.reshape(shape[name])
    356 except Exception as e:

ValueError: cannot reshape array of size 612 into shape (34,6)

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
Cell In[13], line 7
      5 optimizer.pop_size = 50
      6 optimizer.n_max_gen = 5
----> 7 optimization_results = optimizer.optimize(
      8     optimization_problem,
      9     use_checkpoint=False
     10      )

File ~\anaconda3\envs\Test\lib\site-packages\CADETProcess\optimization\optimizer.py:260, in OptimizerBase.optimize(self, optimization_problem, x0, save_results, results_directory, use_checkpoint, overwrite_results_directory, exist_ok, log_level, reinit_cache, delete_cache, *args, **kwargs)
    256 plt.switch_backend('agg')
    258 start = time.time()
--> 260 self.run(self.optimization_problem, x0, *args, **kwargs)
    262 self.results.time_elapsed = time.time() - start
    264 if delete_cache:

File ~\anaconda3\envs\Test\lib\site-packages\CADETProcess\optimization\pymooAdapter.py:155, in PymooInterface.run(self, optimization_problem, x0)
    152 X = pop.get("X").tolist()
    154 # Evaluate objectives and report results
--> 155 algorithm.evaluator.eval(problem, pop)
    157 F = pop.get("F").tolist()
    158 if optimization_problem.n_nonlinear_constraints > 0:

File ~\anaconda3\envs\Test\lib\site-packages\pymoo\core\evaluator.py:69, in Evaluator.eval(self, problem, pop, skip_already_evaluated, evaluate_values_of, count_evals, **kwargs)
     65 # evaluate the solutions (if there are any)
     66 if len(I) > 0:
     67 
     68     # do the actual evaluation - call the sub-function to set the corresponding values to the population
---> 69     self._eval(problem, pop[I], evaluate_values_of, **kwargs)
     71 # update the function evaluation counter
     72 if count_evals:

File ~\anaconda3\envs\Test\lib\site-packages\pymoo\core\evaluator.py:90, in Evaluator._eval(self, problem, pop, evaluate_values_of, **kwargs)
     87 X = pop.get("X")
     89 # call the problem to evaluate the solutions
---> 90 out = problem.evaluate(X, return_values_of=evaluate_values_of, return_as_dictionary=True, **kwargs)
     92 # for each of the attributes set it to the problem
     93 for key, val in out.items():

File ~\anaconda3\envs\Test\lib\site-packages\pymoo\core\problem.py:257, in Problem.evaluate(self, X, return_values_of, return_as_dictionary, *args, **kwargs)
    254     only_single_value = not (isinstance(X, list) or isinstance(X, np.ndarray))
    256 # this is where the actual evaluation takes place
--> 257 _out = self.do(X, return_values_of, *args, **kwargs)
    259 out = {}
    260 for k, v in _out.items():
    261 
    262     # copy it to a numpy array (it might be one of jax at this point)

File ~\anaconda3\envs\Test\lib\site-packages\pymoo\core\problem.py:302, in Problem.do(self, X, return_values_of, *args, **kwargs)
    299     self._evaluate_vectorized(X, out, *args, **kwargs)
    301 # finally format the output dictionary
--> 302 out = self._format_dict(out, len(X), return_values_of)
    304 return out

File ~\anaconda3\envs\Test\lib\site-packages\pymoo\core\problem.py:357, in Problem._format_dict(self, out, N, return_values_of)
    355                 v = v.reshape(shape[name])
    356             except Exception as e:
--> 357                 raise Exception(
    358                     f"Problem Error: {name} can not be set, expected shape {shape[name]} but provided {v.shape}",
    359                     e)
    361         ret[name] = v
    363 # if some values that are necessary have not been set

Exception: ('Problem Error: F can not be set, expected shape (34, 6) but provided (34, 18)', ValueError('cannot reshape array of size 612 into shape (34,6)'))

I understand that an array somewhere has the wrong size, but I don’t know where exactly the problem is.

Here is my experimental data:
3e-05.csv (291.2 KB)
6.11e-05.csv (531.6 KB)
9e-05.csv (651.4 KB)

And here is the code I am using:

from CADETProcess.processModel import ComponentSystem
from CADETProcess.processModel import StericMassAction
from CADETProcess.processModel import Inlet, Outlet, TubularReactor, Cstr, LumpedRateModelWithPores
from CADETProcess.processModel import FlowSheet
from CADETProcess.processModel import Process
from CADETProcess.simulator import Cadet
import numpy as np

Q_ml_min = 0.5  # ml/min
Q_m3_s = Q_ml_min / (60 * 1e6)  # m3/s

V_tracer = 5e-8  # m3

starting_salt_conc = 1

component_system = ComponentSystem(['Salt', "Protein"])


binding_model = StericMassAction(component_system, name='SMA')
binding_model.is_kinetic = False
binding_model.adsorption_rate = [0, 1]
binding_model.desorption_rate = [0, (1/3.07545467548958)]
binding_model.steric_factor = [0, 1]
binding_model.capacity = 1185.185
binding_model.characteristic_charge = [0, 1.986895492003574]

# Unit Operations
inlet = Inlet(component_system, name='inlet')
inlet.flow_rate = Q_m3_s


column = LumpedRateModelWithPores(component_system, 'column')
column.binding_model = binding_model
column.length = 0.1
column.diameter = 0.0077
column.bed_porosity = 0.19  
column.particle_radius = 3.4e-5 
column.particle_porosity = 0.8333
column.axial_dispersion = 5.42572585e-08
column.film_diffusion = [5e-4, 6e-6]

column.c = [starting_salt_conc, 0]
column.cp = [starting_salt_conc, 0]
column.q = [starting_salt_conc, 0]


# Tubing1
tubing1 = TubularReactor(component_system, 'tubing1')
tubing1.length = 0.444 
tubing1.cross_section_area = 4.418e-7 
tubing1.axial_dispersion =  3.07515349e-05 


tubing2 = TubularReactor(component_system, 'tubing2')
tubing2.length = 0.444
tubing2.cross_section_area = 4.418e-7 
tubing2.axial_dispersion =  tubing1.axial_dispersion 


uv_detector = Cstr(component_system, name='uv_detector')
uv_detector.V = 2e-9
uv_detector.c = [0,0]


outlet = Outlet(component_system, name='outlet')

# Flow Sheet
flow_sheet = FlowSheet(component_system)

flow_sheet.add_unit(inlet)
flow_sheet.add_unit(column)
flow_sheet.add_unit(outlet)
flow_sheet.add_unit(tubing1)
flow_sheet.add_unit(tubing2)
flow_sheet.add_unit(uv_detector)

flow_sheet.add_connection(inlet, tubing1)
flow_sheet.add_connection(tubing1, column)
flow_sheet.add_connection(column, tubing2)
flow_sheet.add_connection(tubing2, uv_detector)
flow_sheet.add_connection(uv_detector, outlet)


def create_process(gradient_volume_length=30):
    # Process
    process = Process(flow_sheet, f'{gradient_volume_length}')

    wash_start = V_tracer / Q_m3_s
    gradient_start = 1200
    concentration_difference = np.array([1000, 0]) - np.array([0, 0])

    process.cycle_time = gradient_start + (gradient_volume_length / Q_m3_s)
    gradient_duration = process.cycle_time - gradient_start
    gradient_slope = concentration_difference / gradient_duration


    _ = process.add_event('Load', 'flow_sheet.inlet.c', [500, 1], 0)
    _ = process.add_event('Wash', 'flow_sheet.inlet.c', [starting_salt_conc, 0], wash_start)
    _ = process.add_event(
        'grad_start',
        'flow_sheet.inlet.c',
        [[starting_salt_conc, gradient_slope[0]], [0, gradient_slope[1]]],
        gradient_start
    )
    return process



from CADETProcess.plotting import SecondaryAxis

sec = SecondaryAxis()
sec.components = ['Salt']
sec.y_label = '$c_{salt}$'

simulator = Cadet()

for gradient_volume in [3e-5, 6.11e-5, 9e-5]:
    process = create_process(gradient_volume_length=gradient_volume)
    simulation_results = simulator.simulate(process)
    simulation_results.solution.outlet.outlet.plot(secondary_axis=sec)


from CADETProcess.reference import ReferenceIO
def load_reference(file_name, component_index=1):
    data = np.loadtxt(file_name, delimiter=',')

    time_experiment = data[:, 0]*60
    c_experiment = data[:, component_index]

    reference = ReferenceIO(
        f'Peak {file_name}',
        time_experiment, c_experiment,
        component_system=ComponentSystem(["Protein"])
    )
    return reference

from CADETProcess.comparison import Comparator
def create_comparator(reference):
    comparator = Comparator(name=f"Comp {reference.name}")
    comparator.add_reference(reference)
    #comparator.add_difference_metric('Shape', reference,'outlet.outlet', components="Protein")
    comparator.add_difference_metric(
        'PeakPosition', reference,
        'outlet.outlet', components=["Protein"]
    )
    comparator.add_difference_metric(
        'PeakHeight', reference,
        'outlet.outlet', components=["Protein"]
    )
    return comparator

from CADETProcess.optimization import OptimizationProblem
optimization_problem = OptimizationProblem('Film_diffusion')
optimization_problem.add_evaluator(simulator)
comparators = dict()

for gradient_volume in [3e-5, 6.11e-5, 9e-5]:
    process = create_process(gradient_volume)
    reference = load_reference(f"{gradient_volume}.csv")

    optimization_problem.add_evaluation_object(process)
    comparator = create_comparator(reference)
    comparators[str(gradient_volume)] = comparator

    optimization_problem.add_objective(
        comparator,
        name=f"Objective {comparator.name}",
        evaluation_objects=[process],  # limit this comparator to only this one process
        n_objectives=comparator.n_metrics,
        requires=[simulator]
    )

optimization_problem.add_variable(
    name='film_diffusion', parameter_path='flow_sheet.column.film_diffusion',
    lb=1e-10, ub=0.1,
    transform='auto',
    indices=[1]  # only optimize the protein entry of film_diffusion
)



def callback(simulation_results, individual, evaluation_object, callbacks_dir='./'):
    comparator = comparators[evaluation_object.name]
    comparator.plot_comparison(
        simulation_results,
        file_name=f'{callbacks_dir}/{individual.id}_{evaluation_object}_comparison.png',
        show=False
    )

optimization_problem.add_callback(callback, requires=[simulator])

from CADETProcess.optimization import U_NSGA3

if __name__ == '__main__':
    optimizer = U_NSGA3()
    optimizer.n_cores = 4
    optimizer.pop_size = 50
    optimizer.n_max_gen = 5
    optimization_results = optimizer.optimize(
        optimization_problem,
        use_checkpoint=False
    )

I forgot to mention: your example doesn’t work for me either. I tried it both in the environment installed with the command you sent me here and with the release version.

It’s mysterious. I can reproduce the error on Linux with CADET-Process v0.8, but it works on Windows and Linux with the current master branch. ¯\_(ツ)_/¯ I’m out of ideas what could be causing it on your end.

Can you join us at the next CADET Office Hours to discuss this further?

Hello Ronald,

thanks for the reply! 🙂
Unfortunately I get the same error with the master branch.
It’s the following git command, isn’t it?

pip install git+https://github.com/fau-advanced-separations/CADET-Process.git@master

I create an environment in Anaconda Navigator (Python 3.10.13). Then I activate it, install CADET with the standard command, and then install the master branch of CADET-Process with the command above. Then I open the file in this environment.
At least that worked for the film diffusion optimisation problem.

Yes, I could participate in the next office hours 🙂 But as the project I’m working on ends next week, my model should ideally be ready by the middle of next week 😅

Hey Stefanos,

I can reproduce the error on master and it works on dev.

After isolating the issue a bit more, I noticed that pip doesn’t reinstall the package if you tell it to install from a different git branch. That might be why the behavior sometimes didn’t change even though different branches were supposedly installed. So you can try running the following in your current environment, whichever that may be:

pip uninstall cadet-process
pip install git+https://github.com/fau-advanced-separations/CADET-Process.git@dev

and that should fix the optimization bug.
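To double-check which version actually ended up in the environment (besides pip show cadet-process), you can print the version string; this assumes the package exposes __version__, which CADET-Process does:

import CADETProcess
print(CADETProcess.__version__)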


Hey Ronald,

now it works perfectly! Many thanks! 🙂


Hi there, I had a question along similar lines. If I have 4 sets of experimental data for a single simulation (in this case, 4 repeats of a dextran pulse), does the code below make sense, where I add a reference to my comparator for each tracer experiment (tracer_peak_1 to 4)?

Also, if a situation arises where there are potentially 3 sets of optimal solutions, because some runs are minimised by one set of parameters and others by slightly different parameters, how does the output decide on the ‘optimal’ solution?

Thanks


from CADETProcess.comparison import Comparator

comparator = Comparator()
comparator.add_reference(tracer_peak_1)
comparator.add_reference(tracer_peak_2)
comparator.add_reference(tracer_peak_3)
comparator.add_reference(tracer_peak_4)

comparator.add_difference_metric(
    'Shape', tracer_peak_1, 'outlet.outlet'
)
comparator.add_difference_metric(
    'Shape', tracer_peak_2, 'outlet.outlet'
)
comparator.add_difference_metric(
    'Shape', tracer_peak_3, 'outlet.outlet'
)
comparator.add_difference_metric(
    'Shape', tracer_peak_4, 'outlet.outlet'
)



if __name__ == '__main__':
    comparator.plot_comparison(simulation_results)

from CADETProcess.optimization import OptimizationProblem
optimization_problem = OptimizationProblem('porosity_Dax')

optimization_problem.add_evaluation_object(process)


optimization_problem.add_variable(
    name='column_axial_dispersion', parameter_path='flow_sheet.column.axial_dispersion',
    lb=0, ub=1e-3,
    transform='auto'
)

optimization_problem.add_variable(
    name='column_total_porosity', parameter_path='flow_sheet.column.total_porosity',
    lb=0, ub=1,
    transform='auto'
)

optimization_problem.add_evaluator(simulator)

optimization_problem.add_objective(
    comparator,
    n_objectives=comparator.n_metrics,
    requires=[simulator]
)

def callback(simulation_results, individual, evaluation_object, callbacks_dir='./'):
    comparator.plot_comparison(
        simulation_results,
        file_name=f'{callbacks_dir}/{individual.id}_{evaluation_object}_comparison.png',
        show=False
    )
optimization_problem.add_callback(callback, requires=[simulator])

from CADETProcess.optimization import U_NSGA3
optimizer = U_NSGA3()
optimizer.n_max_gen = 50

if __name__ == '__main__':
    optimization_results = optimizer.optimize(optimization_problem, use_checkpoint=False)
    optimization_results.plot_objectives()
    optimization_results.plot_figures()

Hi Will,

that works. Personally, I’d use a loop to add the references and metrics, to keep the code DRY.

comparator = Comparator()

for tracer_peak in tracers:  # tracers = [tracer_peak_1, ..., tracer_peak_4]
    comparator.add_reference(tracer_peak)
    comparator.add_difference_metric(
        'Shape', tracer_peak, 'outlet.outlet',
    )

optimization_problem.add_objective(
    comparator,
    n_objectives=comparator.n_metrics,
    requires=[simulator]
)

Also note that the references need unique names. But yes, that works.
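One way to guarantee unique names, assuming each repeat is loaded from its own CSV (the file names here are made up for illustration):

import numpy as np
from CADETProcess.reference import ReferenceIO

tracers = []
for i, file_name in enumerate(
    ['repeat_1.csv', 'repeat_2.csv', 'repeat_3.csv', 'repeat_4.csv'], start=1
):
    data = np.loadtxt(file_name, delimiter=',')
    # The first argument is the reference name; make it unique per repeat.
    tracers.append(ReferenceIO(f'tracer_peak_{i}', data[:, 0], data[:, 1]))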

Multi-objectives

Regarding:

Also if a situation arises where there are potentially 3 sets of optimal solution as some runs are minimised with 1 set of parameters and others with slightly different parameters, how does the output decide on the ‘optimal’ solution ?

In multi-objective optimization tasks, the optimization algorithm returns all individuals on the Pareto front. Which of those you choose as your optimum is a complicated issue that depends on your specific problem. You could use the average over all objectives, the geometric mean, or a desirability function. For your case, where multiple technical replicates are fitted together, I think one can argue for the average, but that’s just my opinion.
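As a minimal sketch of that post-processing step (plain NumPy, not a specific CADET-Process API; F is assumed to hold the Pareto front’s objective values, one row per individual, smaller is better):

import numpy as np

def pick_individual(F):
    """Pick one individual from a Pareto front F of shape (n_individuals, n_objectives)."""
    F = np.asarray(F)
    by_mean = np.argmin(F.mean(axis=1))                   # average over all objectives
    by_gmean = np.argmin(np.exp(np.log(F).mean(axis=1)))  # geometric mean; requires F > 0
    return by_mean, by_gmean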


Note that CADET-Process also supports meta scores for multi-criteria decision-making.
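A rough sketch of that idea; add_meta_score is the hook described in the docs, but check the documentation of your version for the exact signature:

import numpy as np

def mean_objectives(f):
    # Collapse all objective values of one individual into a single score.
    return np.mean(f)

# Hypothetical usage on the optimization problem from above.
optimization_problem.add_meta_score(mean_objectives)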