Callbacks while having dependent variables

ortler · January 6, 2024, 4:22pm

Hi CADET-Devs,

I set up a multi-process, multi-species model optimization with some dependent variables. Unfortunately, I’m having an issue with the callbacks when I introduce the dependencies. With just independent variables it runs fine. Also, I used some callbacks test functions which runs fine when I play around with callbacks_dir. Could you please have a look at my MinExample based on the example process. Thank you.

I’m running on fix/evaluation_objects_in_objectives/ and added the code change of /fix/references_with_multiple_components manually

# %%
import copy
import numpy as np
import CADETProcess
CADETProcess.settings.working_directory = 'C:/data/CADET_working_directory'


# %%
# Example import
# import example process with 2 components, A and B, create 2 individual processes from it
from examples.batch_elution.process import process as process1
components = ['A', 'B']
process1.flow_sheet.column.binding_model.adsorption_rate=[1e-2, 2e-2]
process1.flow_sheet.column.binding_model.desorption_rate=[1, 1]
process1.flow_sheet.column.binding_model.is_kinetic = True
process2 = copy.deepcopy(process1)
process1.name = 'process1'
process2.name = 'process2'
processes= [process1, process2]

# simulate processes
from CADETProcess.simulator import Cadet
simulator = Cadet(install_path = "C:\\ProgramData\\Anaconda3\\envs\\cadet-process_0.8_ObjFix\\")

simulation_results = [[] for _ in range(len(processes))]
for ip, proc in enumerate(processes):
    simulation_results[ip] = simulator.simulate(processes[ip])
    # _ = simulation_results[ip].solution.column.outlet.plot()

# %%
# Reference data
# create and store reference data for both processes, two peaks each

def gaussian(x, mu, sigma, amplitude):
    return amplitude * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

timepoints = np.arange(0, 601)
y_values_1A = gaussian(timepoints, 5 * 60, 60, 12.5)
y_values_1B = gaussian(timepoints, 7 * 60, 60, 5)
y_values_2A = gaussian(timepoints, 6 * 60, 60, 14.5)
y_values_2B = gaussian(timepoints, 7.5 * 60, 60, 4)

ref_data1 =np.column_stack((y_values_1A, y_values_1B)) # ref data for process 1
ref_data2 =np.column_stack((y_values_2A, y_values_2B)) # ref data for process 2
ref_data = [ref_data1, ref_data2]

# %%
# Comparator

from CADETProcess.comparison import Comparator
from CADETProcess.reference import ReferenceIO

# create and store ReferenceIO instances, per process
references = [[] for _ in range(len(processes))]
for ir, proc in enumerate(processes):
    references[ir] = ReferenceIO('ref_' + str(proc), timepoints, ref_data[ir], component_system= proc.component_system)

# comparator per processes
comparators = [[] for _ in range(len(processes))]
comp_type =  'SSE'

# add references to comparator and define difference_metric
for ic, proc in enumerate(processes):
    comp_temp = Comparator()
    comp_temp.name =  'comparator_' + str(proc.name)
    comp_temp.add_reference(references[ic])
    comp_temp.add_difference_metric(
                                    comp_type, references[ic], 'column.outlet',
                                    components = components
                                    )
    comparators[ic] = copy.deepcopy(comp_temp)

# %%
# Setup optimization problem

from CADETProcess.optimization import OptimizationProblem
optimization_problem = OptimizationProblem('Min_Example')

for ip, proc in enumerate(processes):
    optimization_problem.add_evaluation_object(proc, name = str(proc.name))
optimization_problem.add_evaluator(simulator)
    
for i, proc in enumerate(processes):
    optimization_problem.add_objective(
        comparators[i],
        evaluation_objects = processes[i],
        name = 'obj_' + proc.name,
        n_objectives = comparators[i].n_metrics,
        requires = [simulator]
    )

# %%
# Adding variables

components_to_be_evaluated = ['A', 'B']
k_eq = [0.02, 0.03] # k_eq for all species

for i, comp in enumerate(components_to_be_evaluated):
    k_des_lb = 0.1
    k_des_ub = 100
    
    #adsorption rate
    optimization_problem.add_variable( 
        parameter_path = 'flow_sheet.column.binding_model.adsorption_rate',
        name = ('adsorption_rate_' + comp),
        lb = k_des_lb*k_eq[i], ub=k_des_ub*k_eq[i],
        transform = 'log',
        indices = i,
    )
    # desorption rate
    optimization_problem.add_variable(
        parameter_path = 'flow_sheet.column.binding_model.desorption_rate',
        name = ('desorption_rate_' + comp),
        lb = k_des_lb, ub=k_des_ub,
        transform = 'log',
        indices = i,
    )

# %%
# Dependencies
make_dependencies =  1 # callbacks while optimization works only if this is 0

if make_dependencies == True:
    
    def k_ads_A(k_des):
        return k_des*k_eq[0]
    def k_ads_B(k_des):
        return k_des*k_eq[1]    
    
    optimization_problem.add_variable_dependency('adsorption_rate_A', 'desorption_rate_A', 
                                                    transform = k_ads_A)
    optimization_problem.add_variable_dependency('adsorption_rate_B', 'desorption_rate_B', 
                                                transform = k_ads_B)

# %%
# Callbacks
# doesn't work for dependent variables, unclear why

def callback(results, individual, evaluation_object, callbacks_dir = './'):
    ls_proc = []
    for i, proc in enumerate(processes):
        ls_proc.append(proc.name)
    proc_no = ls_proc.index(str(evaluation_object))

    comp = comparators[proc_no]
    comp.plot_comparison(
                        results,
                        show=0, 
                        file_name=f'{callbacks_dir}/{comp.name}_{individual.x}_{individual.id}.png',
                        )

optimization_problem.add_callback(
                            callback,
                            requires=[simulator], 
                            keep_progress = True
                            )

# %%
# Test callback function

should_test_callback = 0

if should_test_callback == True:
    from CADETProcess.optimization import Individual
    if optimization_problem.n_dependent_variables != 0:
        ind = Individual([1,1])
    else:
        ind = Individual([1,1,1,1])

    if optimization_problem.callbacks[0].callbacks_dir == None:
        optimization_problem.callbacks[0].callbacks_dir = 'C:\data\CADET_working_directory\callbacks' # must be set for testing callback, otherwise error appears
    
    optimization_problem.delete_cache() # requiered for re-evaluation of same instance of Individual
    optimization_problem.setup_cache()
    optimization_problem.evaluate_callbacks(ind)

# %%
# Optimizer
should_optimize = 1

from CADETProcess.optimization import U_NSGA3
optimizer = U_NSGA3()
optimizer.n_cores = 8
optimizer.n_max_gen = 4
optimizer.pop_size = 16
optimizer.progress_frequency = 1

if should_optimize == True:
    
    if optimization_problem.callbacks[0].callbacks_dir != None:
        optimization_problem.callbacks[0].callbacks_dir = None # must be None for callback in optimization, otherwise error appears
    
    optimization_problem.delete_cache()
    optimization_problem.setup_cache()
    
    optimization_results = optimizer.optimize(
        optimization_problem,
        save_results = True,
        )

ortler · January 7, 2024, 11:17am

It came to my mind that the multi-process setup is not required to reprodue the issue. So I stripped the MinExample to just one process. I left the code in the first post as an example for a mult-process setup as I was missing this in the Cadet-Process documentary while coding.

# %%
import numpy as np
import CADETProcess
CADETProcess.settings.working_directory = 'C:/data/CADET_working_directory'

# %%
# Example import
from examples.batch_elution.process import process as process1
components = ['A', 'B']
process1.flow_sheet.column.binding_model.adsorption_rate=[1e-2, 2e-2]
process1.flow_sheet.column.binding_model.desorption_rate=[1, 1]
process1.flow_sheet.column.binding_model.is_kinetic = True

# simulate processes
from CADETProcess.simulator import Cadet
simulator = Cadet(install_path = "C:\\ProgramData\\Anaconda3\\envs\\cadet-process_0.8_ObjFix\\")
simulation_result = simulator.simulate(process1)

# %%
# Reference data
def gaussian(x, mu, sigma, amplitude):
    return amplitude * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

timepoints = np.arange(0, 601)
y_values_1A = gaussian(timepoints, 5 * 60, 60, 12.5)
y_values_1B = gaussian(timepoints, 7 * 60, 60, 5)
ref_data1 =np.column_stack((y_values_1A, y_values_1B))

# %%
# Comparator
from CADETProcess.comparison import Comparator
from CADETProcess.reference import ReferenceIO
reference = ReferenceIO('ref_' + process1.name, timepoints, ref_data1, component_system= process1.component_system)

# add references to comparator and define difference_metric
comparator = Comparator()
comparator.name =  'comparator'
comparator.add_reference(reference)
comparator.add_difference_metric(
                                'SSE', reference, 'column.outlet',
                                components = components
                                )

# %%
# Setup optimization problem
from CADETProcess.optimization import OptimizationProblem
optimization_problem = OptimizationProblem('Min_Example')
optimization_problem.add_evaluation_object(process1, name = process1.name)
optimization_problem.add_evaluator(simulator)
    
optimization_problem.add_objective(
    comparator,
    evaluation_objects = process1,
    name = 'obj_' + process1.name,
    n_objectives = comparator.n_metrics,
    requires = [simulator]
    )

# %%
# Adding variables
components_to_be_evaluated = ['A', 'B']
k_eq = [0.02, 0.03] # k_eq for all species

for i, comp in enumerate(components_to_be_evaluated):
    k_des_lb = 0.1
    k_des_ub = 100
    
    #adsorption rate
    optimization_problem.add_variable( 
        parameter_path = 'flow_sheet.column.binding_model.adsorption_rate',
        name = ('adsorption_rate_' + comp),
        lb = k_des_lb*k_eq[i], ub=k_des_ub*k_eq[i],
        transform = 'log',
        indices = i,
    )
    # desorption rate
    optimization_problem.add_variable(
        parameter_path = 'flow_sheet.column.binding_model.desorption_rate',
        name = ('desorption_rate_' + comp),
        lb = k_des_lb, ub=k_des_ub,
        transform = 'log',
        indices = i,
    )

# %%
# Dependencies
make_dependencies = 0

if make_dependencies == True:
    
    def k_ads_A(k_des):
        return k_des*k_eq[0]
    def k_ads_B(k_des):
        return k_des*k_eq[1]    
    
    optimization_problem.add_variable_dependency('adsorption_rate_A', 'desorption_rate_A', 
                                                    transform = k_ads_A)
    optimization_problem.add_variable_dependency('adsorption_rate_B', 'desorption_rate_B', 
                                                transform = k_ads_B)

# %%
# Callbacks
# doesn't work for dependent variables, unclear why
def callback(results, individual, evaluation_object, callbacks_dir = './'):
    comparator.plot_comparison(
                        results,
                        show=0, 
                        file_name=f'''{callbacks_dir}/{comparator.name}_x=[{', '.join(f"{num:.3g}" for num in individual.x)}]_{individual.id}.png''',
                        )

optimization_problem.add_callback(
                            callback,
                            requires=[simulator], 
                            keep_progress = True
                            )

# %%
# Test callback function
should_test_callback = 1

if should_test_callback == True:
    from CADETProcess.optimization import Individual
    if optimization_problem.n_dependent_variables != 0:
        ind = Individual([1,1])
    else:
        ind = Individual([1,1,1,1])

    if optimization_problem.callbacks[0].callbacks_dir == None:
        optimization_problem.callbacks[0].callbacks_dir = 'C:\data\CADET_working_directory\callbacks' # must be set for testing callback, otherwise error appears
    
    optimization_problem.delete_cache() # requiered for re-evaluation of same instance of Individual
    optimization_problem.setup_cache()
    optimization_problem.evaluate_callbacks(ind)

# %%
# Optimizer
should_optimize = 1

from CADETProcess.optimization import U_NSGA3
optimizer = U_NSGA3()
optimizer.n_cores = 8
optimizer.n_max_gen = 4
optimizer.pop_size = 2
optimizer.progress_frequency = 1

if should_optimize == True:
    
    if optimization_problem.callbacks[0].callbacks_dir != None:
        optimization_problem.callbacks[0].callbacks_dir = None # must be None for callback in optimization, otherwise error appears
    
    optimization_problem.delete_cache()
    optimization_problem.setup_cache()
    
    optimization_results = optimizer.optimize(
        optimization_problem,
        save_results = True,
        )

ronald.jaepel · January 8, 2024, 3:34pm

Mini-update:
I can reproduce the error, but for some reason my debugger dies without trowing an error during the optimization run before getting to the callbacks, so I can’t find the problem with the callbacks. I’ll report back when I got more.

ronald.jaepel · January 8, 2024, 3:55pm

Reinstalling the python env fixed the debugger -.-
Anyways, found the issue. Will discuss fixes with Jo tomorrow.

ronald.jaepel · January 9, 2024, 12:14pm

We have a fix here. You can install this branch with

pip install git+https://github.com/fau-advanced-separations/CADET-Process.git@fix/callbacks_with_dependencies

This branch also includes the previous fix for comparators, so you don’t have to maintain two different installations.

Thanks for your input. Hope this solves the problem.

ortler · January 9, 2024, 1:00pm

Hey Ronald and Johannes,

thank you for the fix, it tremendously helps to see what is happening during the optimization!

I just noticed another issue related to this. When I set up dependencies and set initial values for the optimization (x0=[...]) I get an error. When I set all values, regardless if dependent or independent variables, this doesn’t work and I see the point in that. But also when I only set the independent variables, the boundary check fails because it expects all variables being set.

For the example above:

    optimization_results = optimizer.optimize(
        optimization_problem,
        x0= [1,1],
        save_results = True,
        )

ronald.jaepel · January 10, 2024, 10:30am

We have a fix here. You can install this branch with

pip install git+https://github.com/fau-advanced-separations/CADET-Process.git@fix/initial_values_with_dependencies

This branch also includes the previous fixes.

Thanks for your input. Hope this solves the problem.