In silico analysis of design of experiment methods for metabolic pathway optimization

Bibliographic Details
Published in: Computational and Structural Biotechnology Journal, 2024-12, Vol. 23, p. 1959-1967
Main Authors: Moreno-Paz, Sara; Schmitz, Joep; Suarez-Diez, Maria
Format: Article
Language: English
Description
Summary: Microbial cell factories allow the production of chemicals, presenting an alternative to traditional fossil fuel-dependent production. However, finding the optimal expression of production pathway genes is crucial for the development of efficient production strains. Unlike sequential experimentation, combinatorial optimization captures the relationships between pathway genes and production, albeit at the cost of conducting multiple experiments. Fractional factorial designs followed by linear modeling and statistical analysis reduce the experimental workload while maximizing the information gained during experimentation. Although tools to perform and analyze these designs are available, guidelines for selecting appropriate factorial designs for pathway optimization are missing. In this study, we leverage a kinetic model of a seven-gene pathway to simulate the performance of a full factorial strain library. We compare this approach to resolution V, IV, III, and Plackett-Burman (PB) designs. Additionally, we evaluate the performance of these designs as training sets for a random forest algorithm aimed at identifying the best-producing strains. Evaluating the robustness of these designs to noise and missing data, traits inherent to biological datasets, we find that while resolution V designs capture most of the information present in full factorial data, they require the construction of a large number of strains. On the other hand, resolution III and PB designs fall short in identifying optimal strains and miss relevant information. Moreover, given the small number of experiments required for the optimization of a pathway with seven genes, linear models outperform random forest. Consequently, we propose the use of resolution IV designs followed by linear modeling in Design-Build-Test-Learn (DBTL) cycles targeting the screening of multiple factors. These designs enable the identification of optimal strains and provide valuable guidance for subsequent optimization cycles.
ISSN:2001-0370
DOI:10.1016/j.csbj.2024.04.062
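
The summary recommends resolution IV designs followed by linear modeling for a seven-gene pathway. As a rough illustration of what that involves, the sketch below builds a 16-run two-level design for seven factors and estimates main effects from it. Several things here are assumptions that do not come from the record: two expression levels per gene coded as -1 and +1, the textbook generators E = ABC, F = BCD, G = ACD, and the function `toy_titer`, which is a hypothetical stand-in and not the kinetic model used in the study.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs, each factor coded -1 (low) or +1 (high)."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def resolution_iv_design():
    """16-run 2^(7-3) design for seven factors (assumed generators).

    Factors A-D form a full factorial; E, F and G are set by the
    textbook generators E = ABC, F = BCD, G = ACD.
    """
    return [[a, b, c, d, a * b * c, b * c * d, a * c * d]
            for a, b, c, d in product((-1, 1), repeat=4)]


def main_effect_coefficients(design, response):
    """Intercept and main-effect coefficients of a linear model.

    For a balanced, orthogonal -1/+1 design the least-squares fit
    reduces to averaging x * y over the runs, column by column.
    """
    n = len(design)
    intercept = sum(response) / n
    slopes = [sum(run[j] * y for run, y in zip(design, response)) / n
              for j in range(len(design[0]))]
    return intercept, slopes


def toy_titer(run):
    """Hypothetical response: two main effects plus one interaction."""
    a, b, c = run[0], run[1], run[2]
    return 10 + 3 * a - 2 * b + 1.5 * a * c


design = resolution_iv_design()
titers = [toy_titer(run) for run in design]
intercept, slopes = main_effect_coefficients(design, titers)

print(f"runs: {len(design)} of {len(full_factorial(7))} possible")
print("intercept:", intercept)
print("main effects:", slopes)
```

Under these assumptions, the 16 runs recover the main effects of the toy response even though it contains a two-factor interaction, because a resolution IV design does not alias main effects with two-factor interactions. The interaction itself cannot be attributed to a specific gene pair from these runs alone, since two-factor interactions are aliased with each other at this resolution.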