Callbacks while having dependent variables

Hi CADET-Devs,

I set up a multi-process, multi-species model optimization with some dependent variables. Unfortunately, I’m having an issue with the callbacks when I introduce the dependencies. With just independent variables it runs fine. Also, I used some callbacks test functions which runs fine when I play around with callbacks_dir. Could you please have a look at my MinExample based on the example process. Thank you.

I’m running on fix/evaluation_objects_in_objectives/ and added the code change of /fix/references_with_multiple_components manually

# %%
import copy
import numpy as np
import CADETProcess
CADETProcess.settings.working_directory = 'C:/data/CADET_working_directory'


# %%
# Example import
# import example process with 2 components, A and B, create 2 individual processes from it
from examples.batch_elution.process import process as process1
components = ['A', 'B']
process1.flow_sheet.column.binding_model.adsorption_rate=[1e-2, 2e-2]
process1.flow_sheet.column.binding_model.desorption_rate=[1, 1]
process1.flow_sheet.column.binding_model.is_kinetic = True
process2 = copy.deepcopy(process1)
process1.name = 'process1'
process2.name = 'process2'
processes= [process1, process2]

# simulate processes
from CADETProcess.simulator import Cadet
simulator = Cadet(install_path = "C:\\ProgramData\\Anaconda3\\envs\\cadet-process_0.8_ObjFix\\")

simulation_results = [[] for _ in range(len(processes))]
for ip, proc in enumerate(processes):
    simulation_results[ip] = simulator.simulate(processes[ip])
    # _ = simulation_results[ip].solution.column.outlet.plot()

# %%
# Reference data
# create and store reference data for both processes, two peaks each

def gaussian(x, mu, sigma, amplitude):
    return amplitude * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

timepoints = np.arange(0, 601)
y_values_1A = gaussian(timepoints, 5 * 60, 60, 12.5)
y_values_1B = gaussian(timepoints, 7 * 60, 60, 5)
y_values_2A = gaussian(timepoints, 6 * 60, 60, 14.5)
y_values_2B = gaussian(timepoints, 7.5 * 60, 60, 4)

ref_data1 =np.column_stack((y_values_1A, y_values_1B)) # ref data for process 1
ref_data2 =np.column_stack((y_values_2A, y_values_2B)) # ref data for process 2
ref_data = [ref_data1, ref_data2]

# %%
# Comparator

from CADETProcess.comparison import Comparator
from CADETProcess.reference import ReferenceIO

# create and store ReferenceIO instances, per process
references = [[] for _ in range(len(processes))]
for ir, proc in enumerate(processes):
    references[ir] = ReferenceIO('ref_' + str(proc), timepoints, ref_data[ir], component_system= proc.component_system)

# comparator per processes
comparators = [[] for _ in range(len(processes))]
comp_type =  'SSE'

# add references to comparator and define difference_metric
for ic, proc in enumerate(processes):
    comp_temp = Comparator()
    comp_temp.name =  'comparator_' + str(proc.name)
    comp_temp.add_reference(references[ic])
    comp_temp.add_difference_metric(
                                    comp_type, references[ic], 'column.outlet',
                                    components = components
                                    )
    comparators[ic] = copy.deepcopy(comp_temp)

# %%
# Setup optimization problem

from CADETProcess.optimization import OptimizationProblem
optimization_problem = OptimizationProblem('Min_Example')

for ip, proc in enumerate(processes):
    optimization_problem.add_evaluation_object(proc, name = str(proc.name))
optimization_problem.add_evaluator(simulator)
    
for i, proc in enumerate(processes):
    optimization_problem.add_objective(
        comparators[i],
        evaluation_objects = processes[i],
        name = 'obj_' + proc.name,
        n_objectives = comparators[i].n_metrics,
        requires = [simulator]
    )

# %%
# Adding variables

components_to_be_evaluated = ['A', 'B']
k_eq = [0.02, 0.03] # k_eq for all species

for i, comp in enumerate(components_to_be_evaluated):
    k_des_lb = 0.1
    k_des_ub = 100
    
    #adsorption rate
    optimization_problem.add_variable( 
        parameter_path = 'flow_sheet.column.binding_model.adsorption_rate',
        name = ('adsorption_rate_' + comp),
        lb = k_des_lb*k_eq[i], ub=k_des_ub*k_eq[i],
        transform = 'log',
        indices = i,
    )
    # desorption rate
    optimization_problem.add_variable(
        parameter_path = 'flow_sheet.column.binding_model.desorption_rate',
        name = ('desorption_rate_' + comp),
        lb = k_des_lb, ub=k_des_ub,
        transform = 'log',
        indices = i,
    )

# %%
# Dependencies
make_dependencies =  1 # callbacks while optimization works only if this is 0

if make_dependencies == True:
    
    def k_ads_A(k_des):
        return k_des*k_eq[0]
    def k_ads_B(k_des):
        return k_des*k_eq[1]    
    
    optimization_problem.add_variable_dependency('adsorption_rate_A', 'desorption_rate_A', 
                                                    transform = k_ads_A)
    optimization_problem.add_variable_dependency('adsorption_rate_B', 'desorption_rate_B', 
                                                transform = k_ads_B)

# %%
# Callbacks
# doesn't work for dependent variables, unclear why

def callback(results, individual, evaluation_object, callbacks_dir = './'):
    ls_proc = []
    for i, proc in enumerate(processes):
        ls_proc.append(proc.name)
    proc_no = ls_proc.index(str(evaluation_object))

    comp = comparators[proc_no]
    comp.plot_comparison(
                        results,
                        show=0, 
                        file_name=f'{callbacks_dir}/{comp.name}_{individual.x}_{individual.id}.png',
                        )

optimization_problem.add_callback(
                            callback,
                            requires=[simulator], 
                            keep_progress = True
                            )

# %%
# Test callback function

should_test_callback = 0

if should_test_callback == True:
    from CADETProcess.optimization import Individual
    if optimization_problem.n_dependent_variables != 0:
        ind = Individual([1,1])
    else:
        ind = Individual([1,1,1,1])

    if optimization_problem.callbacks[0].callbacks_dir == None:
        optimization_problem.callbacks[0].callbacks_dir = 'C:\data\CADET_working_directory\callbacks' # must be set for testing callback, otherwise error appears
    
    optimization_problem.delete_cache() # requiered for re-evaluation of same instance of Individual
    optimization_problem.setup_cache()
    optimization_problem.evaluate_callbacks(ind)

# %%
# Optimizer
should_optimize = 1

from CADETProcess.optimization import U_NSGA3
optimizer = U_NSGA3()
optimizer.n_cores = 8
optimizer.n_max_gen = 4
optimizer.pop_size = 16
optimizer.progress_frequency = 1

if should_optimize == True:
    
    if optimization_problem.callbacks[0].callbacks_dir != None:
        optimization_problem.callbacks[0].callbacks_dir = None # must be None for callback in optimization, otherwise error appears
    
    optimization_problem.delete_cache()
    optimization_problem.setup_cache()
    
    optimization_results = optimizer.optimize(
        optimization_problem,
        save_results = True,
        )



It came to my mind that the multi-process setup is not required to reprodue the issue. So I stripped the MinExample to just one process. I left the code in the first post as an example for a mult-process setup as I was missing this in the Cadet-Process documentary while coding.

# %%
import numpy as np
import CADETProcess
CADETProcess.settings.working_directory = 'C:/data/CADET_working_directory'

# %%
# Example import
from examples.batch_elution.process import process as process1
components = ['A', 'B']
process1.flow_sheet.column.binding_model.adsorption_rate=[1e-2, 2e-2]
process1.flow_sheet.column.binding_model.desorption_rate=[1, 1]
process1.flow_sheet.column.binding_model.is_kinetic = True

# simulate processes
from CADETProcess.simulator import Cadet
simulator = Cadet(install_path = "C:\\ProgramData\\Anaconda3\\envs\\cadet-process_0.8_ObjFix\\")
simulation_result = simulator.simulate(process1)

# %%
# Reference data
def gaussian(x, mu, sigma, amplitude):
    return amplitude * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

timepoints = np.arange(0, 601)
y_values_1A = gaussian(timepoints, 5 * 60, 60, 12.5)
y_values_1B = gaussian(timepoints, 7 * 60, 60, 5)
ref_data1 =np.column_stack((y_values_1A, y_values_1B))

# %%
# Comparator
from CADETProcess.comparison import Comparator
from CADETProcess.reference import ReferenceIO
reference = ReferenceIO('ref_' + process1.name, timepoints, ref_data1, component_system= process1.component_system)

# add references to comparator and define difference_metric
comparator = Comparator()
comparator.name =  'comparator'
comparator.add_reference(reference)
comparator.add_difference_metric(
                                'SSE', reference, 'column.outlet',
                                components = components
                                )

# %%
# Setup optimization problem
from CADETProcess.optimization import OptimizationProblem
optimization_problem = OptimizationProblem('Min_Example')
optimization_problem.add_evaluation_object(process1, name = process1.name)
optimization_problem.add_evaluator(simulator)
    
optimization_problem.add_objective(
    comparator,
    evaluation_objects = process1,
    name = 'obj_' + process1.name,
    n_objectives = comparator.n_metrics,
    requires = [simulator]
    )

# %%
# Adding variables
components_to_be_evaluated = ['A', 'B']
k_eq = [0.02, 0.03] # k_eq for all species

for i, comp in enumerate(components_to_be_evaluated):
    k_des_lb = 0.1
    k_des_ub = 100
    
    #adsorption rate
    optimization_problem.add_variable( 
        parameter_path = 'flow_sheet.column.binding_model.adsorption_rate',
        name = ('adsorption_rate_' + comp),
        lb = k_des_lb*k_eq[i], ub=k_des_ub*k_eq[i],
        transform = 'log',
        indices = i,
    )
    # desorption rate
    optimization_problem.add_variable(
        parameter_path = 'flow_sheet.column.binding_model.desorption_rate',
        name = ('desorption_rate_' + comp),
        lb = k_des_lb, ub=k_des_ub,
        transform = 'log',
        indices = i,
    )

# %%
# Dependencies
make_dependencies = 0

if make_dependencies == True:
    
    def k_ads_A(k_des):
        return k_des*k_eq[0]
    def k_ads_B(k_des):
        return k_des*k_eq[1]    
    
    optimization_problem.add_variable_dependency('adsorption_rate_A', 'desorption_rate_A', 
                                                    transform = k_ads_A)
    optimization_problem.add_variable_dependency('adsorption_rate_B', 'desorption_rate_B', 
                                                transform = k_ads_B)

# %%
# Callbacks
# doesn't work for dependent variables, unclear why
def callback(results, individual, evaluation_object, callbacks_dir = './'):
    comparator.plot_comparison(
                        results,
                        show=0, 
                        file_name=f'''{callbacks_dir}/{comparator.name}_x=[{', '.join(f"{num:.3g}" for num in individual.x)}]_{individual.id}.png''',
                        )

optimization_problem.add_callback(
                            callback,
                            requires=[simulator], 
                            keep_progress = True
                            )

# %%
# Test callback function
should_test_callback = 1

if should_test_callback == True:
    from CADETProcess.optimization import Individual
    if optimization_problem.n_dependent_variables != 0:
        ind = Individual([1,1])
    else:
        ind = Individual([1,1,1,1])

    if optimization_problem.callbacks[0].callbacks_dir == None:
        optimization_problem.callbacks[0].callbacks_dir = 'C:\data\CADET_working_directory\callbacks' # must be set for testing callback, otherwise error appears
    
    optimization_problem.delete_cache() # requiered for re-evaluation of same instance of Individual
    optimization_problem.setup_cache()
    optimization_problem.evaluate_callbacks(ind)

# %%
# Optimizer
should_optimize = 1

from CADETProcess.optimization import U_NSGA3
optimizer = U_NSGA3()
optimizer.n_cores = 8
optimizer.n_max_gen = 4
optimizer.pop_size = 2
optimizer.progress_frequency = 1

if should_optimize == True:
    
    if optimization_problem.callbacks[0].callbacks_dir != None:
        optimization_problem.callbacks[0].callbacks_dir = None # must be None for callback in optimization, otherwise error appears
    
    optimization_problem.delete_cache()
    optimization_problem.setup_cache()
    
    optimization_results = optimizer.optimize(
        optimization_problem,
        save_results = True,
        )



Mini-update:
I can reproduce the error, but for some reason my debugger dies without trowing an error during the optimization run before getting to the callbacks, so I can’t find the problem with the callbacks. I’ll report back when I got more.

Reinstalling the python env fixed the debugger -.-
Anyways, found the issue. Will discuss fixes with Jo tomorrow.

We have a fix here. You can install this branch with

pip install git+https://github.com/fau-advanced-separations/CADET-Process.git@fix/callbacks_with_dependencies

This branch also includes the previous fix for comparators, so you don’t have to maintain two different installations.

Thanks for your input. Hope this solves the problem.

Hey Ronald and Johannes,

thank you for the fix, it tremendously helps to see what is happening during the optimization! :slight_smile:

I just noticed another issue related to this. When I set up dependencies and set initial values for the optimization (x0=[...]) I get an error. When I set all values, regardless if dependent or independent variables, this doesn’t work and I see the point in that. But also when I only set the independent variables, the boundary check fails because it expects all variables being set.

For the example above:

    optimization_results = optimizer.optimize(
        optimization_problem,
        x0= [1,1],
        save_results = True,
        )

We have a fix here. You can install this branch with

pip install git+https://github.com/fau-advanced-separations/CADET-Process.git@fix/initial_values_with_dependencies

This branch also includes the previous fixes.

Thanks for your input. Hope this solves the problem.