Simultaneous Optimization for multiple components for protein affinity chromatography

Hey there,

I am currently trying to simulate a protein CEX chromatography elution curve, and I have reviewed the case studies provided in the documentation. I have successfully simulated my desired curve and optimised the parameters for each of my protein load components individually. However, my load contains three variants, i.e. three components in addition to salt, and all of the examples deal with single-component optimisation. How should I approach optimising the parameters for multiple components simultaneously? One option I considered was adding each component as a separate reference and then creating a separate optimisation problem for each of them, but that seemed inefficient. Is there a more concise way to create a single, combined optimisation problem for all of them?

Here is what my current optimization problem looks like for a single component:

from CADETProcess.comparison import Comparator
comparator = Comparator()

from CADETProcess.processModel import ComponentSystem
from CADETProcess.reference import ReferenceIO
reference = ReferenceIO("Acidic", new_time, data["Cacidic_gauss"], component_system=ComponentSystem(["Acidic"]))
# reference1 = ReferenceIO("Basic", new_time, data["Cbasic_gauss"], component_system=ComponentSystem(["Basic"]))
# reference2 = ReferenceIO("Main", new_time, data["Cmain_gauss"], component_system=ComponentSystem(["Main"]))

# Plot the reference to check how the experimental data was added
# _ = reference.plot()

comparator.add_reference(reference)
# comparator.add_reference(reference1)
# comparator.add_reference(reference2)

comparator.add_difference_metric('PeakHeight', reference, "column.outlet", components=["Acidic"], start=2800)
comparator.add_difference_metric('PeakPosition', reference, "column.outlet", components=["Acidic"], start=2800)
comparator.add_difference_metric('Shape', reference, "column.outlet", components=["Acidic"], start=2800)

# comparator.add_difference_metric('SSE', reference1, "column.outlet", components=["Basic"], start=2620)
# comparator.add_difference_metric('SSE', reference2, "column.outlet", components=["Main"], start=2620)

_ = comparator.plot_comparison(simulation_results)

from CADETProcess.optimization import OptimizationProblem

optimization_problem = OptimizationProblem("4_params")
optimization_problem.add_evaluation_object(process)
optimization_problem.add_evaluator(process_simulator)

optimization_problem.add_objective(
    comparator,
    n_objectives=comparator.n_metrics,
    requires=[process_simulator]
)

optimization_problem.add_variable(
    name="adsorption_rate",
    parameter_path="flow_sheet.column.binding_model.adsorption_rate",
    lb=1e7, ub=1e11,
    transform="auto",
    indices=[1])  # index 1: first protein component (index 0 is salt)

optimization_problem.add_variable(
    name="ion_exchange_characteristic",
    parameter_path="flow_sheet.column.binding_model.ion_exchange_characteristic",
    lb=1e0, ub=1.2e1,
    transform="auto",
    indices=[1])

optimization_problem.add_variable(
    name="capacity",
    parameter_path="flow_sheet.column.binding_model.capacity",
    lb=1e-01, ub=1.2e2,
    transform="auto",
    indices=[1])

optimization_problem.add_variable(
    name="pore_diffusion",
    parameter_path="flow_sheet.column.pore_diffusion",
    lb=1e-13, ub=1e-9,
    transform="auto",
    indices=[1])

def callback(simulation_results, individual, evaluation_object, callbacks_dir="./"):  # not sure what callbacks_dir is
    comparator.plot_comparison(
        simulation_results,
        file_name=f"{callbacks_dir}/{individual.id}_{evaluation_object}_comparison.png",
        show=True
    )
    print(simulation_results)
    print(individual.x, individual.f)
    print(evaluation_object)
    print(callbacks_dir)

optimization_problem.add_callback(callback, requires=[process_simulator])

from CADETProcess.optimization import U_NSGA3
optimizer = U_NSGA3()
optimizer.n_cores = 6
optimizer.progress_frequency = 2
optimizer.pop_size = 5
optimizer.n_max_gen = 50

optimization_results = optimizer.optimize(
    optimization_problem,
    use_checkpoint=False
)

Any guidance would be appreciated.

Thanks.


A good strategy I use for this is to first fit the components sequentially (i.e., fit A, then B, then C). The parameters from fitting A are used in the problem for B, and then both sets are used for C. Once this is done, you can run another fit for A, B, and C all at the same time, but with very narrow parameter bounds based on the final results of the sequential fits. You can code this as a worklist so that the whole process is automated. If you happen to have very good starting ranges, you could skip straight to the final step (all components). I highly recommend using parallel processing, and a good CPU workstation if one is available.
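The sequential-then-simultaneous worklist could be automated roughly as follows. This is a minimal sketch in plain Python, not CADET-Process API: `fit_stage` is a hypothetical callable you would implement yourself (for example, by building an `OptimizationProblem` for the given components, fixing the already-fitted parameters on the process, running the optimizer, and returning the fitted values). The multiplicative window in `narrow_bounds` is an assumption that suits strictly positive parameters spanning several decades, such as adsorption rates.

```python
from typing import Callable, Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]  # parameter name -> (lb, ub)
Params = Dict[str, float]                # parameter name -> fitted value

# Hypothetical interface: fit_stage(components, bounds, fixed) -> {component: Params}
FitStage = Callable[[List[str], Dict[str, Bounds], Dict[str, Params]], Dict[str, Params]]


def narrow_bounds(value: float, lb: float, ub: float, factor: float = 2.0) -> Tuple[float, float]:
    """Window [value / factor, value * factor], clipped to the original [lb, ub]."""
    if value <= 0 or factor <= 1:
        raise ValueError("value must be positive and factor must be > 1")
    return max(lb, value / factor), min(ub, value * factor)


def run_worklist(components: List[str], wide_bounds: Dict[str, Bounds],
                 fit_stage: FitStage, factor: float = 2.0) -> Dict[str, Params]:
    fitted: Dict[str, Params] = {}

    # Sequential stages: fit one component at a time with wide bounds,
    # passing the parameters already fitted for earlier components as fixed values.
    for comp in components:
        result = fit_stage([comp], {comp: wide_bounds[comp]}, dict(fitted))
        fitted[comp] = result[comp]

    # Final stage: fit all components together with narrow bounds
    # centred (in log space) on the sequential results.
    narrow = {
        comp: {
            name: narrow_bounds(fitted[comp][name], lb, ub, factor)
            for name, (lb, ub) in wide_bounds[comp].items()
        }
        for comp in components
    }
    return fit_stage(list(components), narrow, {})
```

Keeping the CADET-Process calls inside `fit_stage` means the worklist logic can be checked with a dummy fit function before spending CPU time on real optimizations.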


If you're asking more specifically about how to set this up in CADET-Process: you can add multiple components to a single reference, but then I think all of them have to use the same difference metric. For a fractionation reference, that would look like the following:


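    from CADETProcess.comparison import Comparator
    from CADETProcess.fractionation import Fraction
    from CADETProcess.processModel import ComponentSystem
    from CADETProcess.reference import FractionationReference

    comparator = Comparator(process.name)

    # All components except salt (index 0)
    comps = process.component_system.names[1:]

    # Add reference for fractionation (all components in comps)
    # times holds the fraction boundaries, so it has one more entry than volumes
    fractions = []
    for idx, vol in enumerate(volumes):
        fraction = Fraction(
            start=times[idx], end=times[idx+1],
            mass=moles[idx], volume=vol
        )
        fractions.append(fraction)

    frac_reference = FractionationReference(
        'frac_ref', fractions, component_system=ComponentSystem(comps)
    )

    comparator.add_reference(frac_reference)
    comparator.add_difference_metric(
        'FractionationSSE', frac_reference, 'outlet.inlet',
        components=comps
    )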

Or, if you need a different difference metric for each component, you can add multiple references to the comparator:

    from CADETProcess.comparison import Comparator
    from CADETProcess.fractionation import Fraction
    from CADETProcess.processModel import ComponentSystem
    from CADETProcess.reference import FractionationReference, ReferenceIO

    comparator = Comparator(process.name)

    curve_comp = process.component_system.names[1]
    frac_comp = process.component_system.names[2]

    # Add reference for curve
    curve_reference = ReferenceIO(
        name='curve_ref', time=time_data, solution=solution_data,
        component_system=ComponentSystem([curve_comp])
    )

    comparator.add_reference(curve_reference)
    comparator.add_difference_metric(
        'NRMSE', curve_reference, 'outlet.inlet',
        components=[curve_comp]
    )

    # Add reference for fractionation
    fractions = []
    for idx, vol in enumerate(volumes):
        fraction = Fraction(
            start=times[idx], end=times[idx+1],
            mass=moles[idx], volume=vol
        )
        fractions.append(fraction)

    frac_reference = FractionationReference(
        'frac_ref', fractions, component_system=ComponentSystem([frac_comp])
    )

    comparator.add_reference(frac_reference)
    comparator.add_difference_metric(
        'FractionationSSE', frac_reference, 'outlet.inlet',
        components=[frac_comp]
    )

Then you can add the comparator as the objective, as you were already doing.
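For the three variants in the question, the multiple-references pattern can be written as a loop over a per-component metric table. This is a minimal sketch, assuming one `ReferenceIO` per variant has already been built (as in the commented-out lines of the question). `add_component_metrics` and the `metrics` mapping are hypothetical names; the helper only calls the `add_reference` and `add_difference_metric` methods already used in this thread, so any comparator-like object works.

```python
from typing import Any, Dict, List, Optional


def add_component_metrics(
    comparator: Any,
    references: Dict[str, Any],
    metrics: Dict[str, List[str]],
    solution_path: str = "column.outlet",
    start: Optional[Dict[str, float]] = None,
) -> int:
    """Register one reference per component and attach that component's metrics.

    references: component name -> reference (e.g. a ReferenceIO for that variant)
    metrics:    component name -> difference metric names, e.g. ["PeakHeight", "Shape"]
    start:      optional component name -> start time of the comparison window
    Returns the number of add_difference_metric calls made.
    """
    n_added = 0
    for comp, reference in references.items():
        comparator.add_reference(reference)
        for metric in metrics[comp]:
            kwargs: Dict[str, Any] = {"components": [comp]}
            if start is not None and comp in start:
                kwargs["start"] = start[comp]
            comparator.add_difference_metric(metric, reference, solution_path, **kwargs)
            n_added += 1
    return n_added
```

Calling it with `metrics = {"Acidic": ["PeakHeight", "PeakPosition", "Shape"], "Basic": ["SSE"], "Main": ["SSE"]}` and `start = {"Acidic": 2800, "Basic": 2620, "Main": 2620}` mirrors the metrics from the question in one comparator. Keep passing `n_objectives=comparator.n_metrics` to `add_objective` so the objective count follows whatever metrics were added.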

To set different bounds for each component's fitting parameters, change the component index passed to add_variable; all variables still belong to the same optimization problem. I usually use a loop for this, but you don't have to (here the index starts at idx+1 to skip salt).

for idx, bound in enumerate(Keq_bounds):
    optimization_problem.add_variable(
        name=f'Keq_{idx+1}',
        parameter_path='flow_sheet.column.binding_model.adsorption_rate',
        indices=idx+1,
        lb=bound[0], ub=bound[1],
        transform='auto'
    )
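The same loop extends to all four parameters from the question and all three variants. This is a minimal sketch, assuming the component order Salt, Acidic, Basic, Main (so the variants sit at indices 1 to 3) and reusing the question's parameter paths and bounds. `variable_specs`, `add_variables`, and `PARAMETERS` are hypothetical names; any object with an `add_variable` method, such as an `OptimizationProblem`, can be passed in.

```python
from typing import Any, Dict, List, Tuple

# Parameter paths and bounds taken from the question; adjust per binding model.
PARAMETERS: Dict[str, Tuple[str, float, float]] = {
    "adsorption_rate": ("flow_sheet.column.binding_model.adsorption_rate", 1e7, 1e11),
    "ion_exchange_characteristic": ("flow_sheet.column.binding_model.ion_exchange_characteristic", 1e0, 1.2e1),
    "capacity": ("flow_sheet.column.binding_model.capacity", 1e-1, 1.2e2),
    "pore_diffusion": ("flow_sheet.column.pore_diffusion", 1e-13, 1e-9),
}


def variable_specs(components: List[str], first_index: int = 1) -> List[Dict[str, Any]]:
    """One add_variable() keyword set per (parameter, component) pair.

    Variable names must be unique within a problem, so the component name is
    appended; first_index=1 skips salt at component index 0.
    """
    specs = []
    for offset, comp in enumerate(components):
        for param, (path, lb, ub) in PARAMETERS.items():
            specs.append({
                "name": f"{param}_{comp}",
                "parameter_path": path,
                "lb": lb,
                "ub": ub,
                "transform": "auto",
                "indices": [first_index + offset],
            })
    return specs


def add_variables(optimization_problem: Any, specs: List[Dict[str, Any]]) -> None:
    # Forward each spec unchanged as keyword arguments.
    for spec in specs:
        optimization_problem.add_variable(**spec)
```

For example, `add_variables(optimization_problem, variable_specs(["Acidic", "Basic", "Main"]))` would add twelve variables, four per variant, to one problem.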

There may be other ways, but this is how I have been approaching the problem of multiple components requiring multiple difference metrics.


Thank you for this method. I tried a different approach: adding all three optimization problems in a loop. The only problems I ran into were giving each component its own variable names and giving each optimization problem its own dedicated process. For these, I formatted the loop index into the variable names and followed the case study on fitting binding model parameters.

Thank you for the suggestion. In the end, I modified my code to run the three optimization problems one at a time.