Toward Predictive Chemical Deformulation Enabled by Deep Generative Neural Networks
Published in: | Industrial & engineering chemistry research 2021-10, Vol.60 (39), p.14176-14184 |
---|---|
Format: | Article |
Language: | English |
Summary: | The design of chemical formulations is a challenging, high-dimensional problem. In typical formulations, tens of thousands of ingredients are available for use, yet only a tiny fraction end up in a given formulation. Deformulation, the problem of reverse engineering the precise amounts of each ingredient starting from just a list of ingredients, is similarly challenging but is a key capability for staying up to date with industry competitors. Here, we take advantage of a large, curated formulations dataset from CAS, a division of the American Chemical Society, which offers a consistent and highly structured representation of the formulations and the chemical identities of their components, to show that a variational autoencoder neural network learns meaningful representations of formulations in various product classes such as antiperspirants and oral care. Furthermore, it can be used in conjunction with a two-step sampling algorithm to generate accurate ingredient amount suggestions for deformulation. Deformulation using a variational autoencoder produces estimates that are significantly more accurate than those of nearest neighbor methods, extrapolates better to formulations that differ significantly from previously seen formulations, and provides a way to leverage large datasets for industrially relevant capabilities. |
---|---|
ISSN: | 0888-5885, 1520-5045 |
DOI: | 10.1021/acs.iecr.1c00634 |
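
The summary compares the variational autoencoder against nearest neighbor methods but this record does not say how that baseline is built. The following is a minimal sketch of one plausible nearest-neighbor deformulation baseline, not the authors' implementation. It assumes formulations are stored as dictionaries mapping ingredient names to mass fractions, picks the known formulation whose ingredient set has the highest Jaccard similarity to the query list, and renormalizes the shared amounts. The function names and the example ingredients and fractions are hypothetical and are not taken from the CAS dataset.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of ingredient names."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbor_deformulate(query_ingredients, known_formulations):
    """Hypothetical baseline: estimate amounts for a list of ingredients.

    Assumes known_formulations is a non-empty list of dicts mapping
    ingredient name -> mass fraction.
    """
    query = set(query_ingredients)
    # Iterating a dict yields its keys, so jaccard() compares ingredient sets.
    best = max(known_formulations, key=lambda f: jaccard(query, f))
    # Keep only the amounts for ingredients that the query also contains.
    shared = {name: amt for name, amt in best.items() if name in query}
    total = sum(shared.values())
    if total == 0:
        # No usable overlap with the neighbor: fall back to an equal split.
        return {name: 1.0 / len(query) for name in query}
    # Query ingredients missing from the neighbor get 0.0; the rest are
    # rescaled so the estimate sums to 1.
    return {name: shared.get(name, 0.0) / total for name in query}


# Illustrative data only (made-up fractions).
known = [
    {"water": 0.70, "glycerin": 0.20, "sodium fluoride": 0.10},
    {"water": 0.50, "ethanol": 0.50},
]
estimate = nearest_neighbor_deformulate(["water", "glycerin"], known)
print(estimate)
```

A baseline of this kind can only copy amounts from a formulation it has already seen, which is consistent with the summary's remark that the generative model extrapolates better to formulations unlike those in the dataset.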