Loading…
ChemSchematicResolver: A Toolkit to Decode 2D Chemical Diagrams with Labels and R‑Groups into Annotated Chemical Named Entities
The number of journal articles in the scientific domain has grown to the point where it has become impossible for researchers to capitalize on all findings in their relevant discipline. Information is stored in these articles in a number of ways, including figures that describe important results. In...
Saved in:
Published in: | Journal of chemical information and modeling 2020-04, Vol.60 (4), p.2059-2072 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The number of journal articles in the scientific domain has grown to the point where it has become impossible for researchers to capitalize on all findings in their relevant discipline. Information is stored in these articles in a number of ways, including figures that describe important results. In organic chemistry, these figures often present chemical schematic diagrams that graphically define the structures of carbon-based compounds. These diagrams are intuitive for an expert to comprehend, but they are not designed for machines. This work presents ChemSchematicResolver, a software tool that can be used to identify chemical schematic diagrams within the figure of a document, resolve any R-group substituents within them, and convert the resulting diagrams to a machine-readable format in a high-throughput, autonomous fashion. The tool includes a new algorithm that is used to identify relevant diagrams and a mechanism that combines these data with contextual information from the rest of the document for the creation of highly relational databases. It includes support for a variety of general R-group structures, the first time this is available in any open-source chemical schematic diagram extraction tool. It is presented alongside a self-generated evaluation set, on which the most important assessment metric, precision, achieved 83–100% for all assessed areas. The ChemSchematicResolver tool is released under the MIT license and is available to download from www.chemschematicresolver.org. |
---|---|
ISSN: | 1549-9596 1549-960X |
DOI: | 10.1021/acs.jcim.0c00042 |