ChemDataExtractor v2 automatically extracts scientific data from documents, generating databases from primary literature sources.

Give it a journal article, and it will extract the physical properties that interest you, together with the associated chemical names, labels, and roles, and the property values, errors, and units.

ChemDataExtractor v2 builds on its predecessor and retains all of its functionality. It uses state-of-the-art natural language processing and machine-learning algorithms.

Beyond single property extraction

Instead of extracting only one physical property at a time, ChemDataExtractor v2 employs an innovative model-building concept that enables you to extract complete nested hierarchies of many interconnected properties.

For example, you might extract properties of a compound, together with quantities defining the experimental setup used.
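
To make the idea of a nested hierarchy concrete, here is a minimal sketch in plain Python. It does not use the ChemDataExtractor API: the class names (Compound, Quantity, Apparatus, MeltingPointRecord), the field names, and the values are all hypothetical and only illustrate what a record linking a compound, a property, and part of the experimental setup could look like.

```python
from dataclasses import dataclass, asdict
from typing import Optional


# Hypothetical record types for illustration only; not the library's own models.
@dataclass
class Compound:
    name: str
    label: Optional[str] = None


@dataclass
class Quantity:
    value: float
    units: str
    error: Optional[float] = None


@dataclass
class Apparatus:
    name: str
    heating_rate: Optional[Quantity] = None  # a quantity describing the setup


@dataclass
class MeltingPointRecord:
    compound: Compound                      # nested: which compound
    melting_point: Quantity                 # nested: value, units, error
    apparatus: Optional[Apparatus] = None   # nested: experimental setup


# Placeholder values, chosen only to show the shape of the record.
record = MeltingPointRecord(
    compound=Compound(name="benzoic acid", label="1a"),
    melting_point=Quantity(value=395.0, units="K", error=1.0),
    apparatus=Apparatus(name="DSC", heating_rate=Quantity(value=10.0, units="K/min")),
)

# Flatten the hierarchy into nested dictionaries, e.g. for storage as JSON.
nested = asdict(record)
print(nested["compound"]["name"], nested["melting_point"]["value"])
```

The point of the nesting is that the property value, the compound it belongs to, and the conditions under which it was measured stay linked in a single record rather than being extracted as unrelated facts.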

No more manually written rules for parsing

Through integration with TableDataExtractor, table processing is now fully automated. This means that a vast amount of data that was previously locked away is now accessible. The structured tabular format enables data extraction without the need for any manually written parsers.

ChemDataExtractor v2 uses automatic text parsers for the first time, eliminating the need to manually write complex parsers for extraction.

Setting up a functioning data extraction system has never been quicker!

Physical Quantities and Units

ChemDataExtractor v2 understands physical quantities. It extracts values with any meaningful complex or composite units it finds and standardizes the output units to SI.

Definitions for physical quantities are picked up on the go, eliminating the need to hard-code all the symbols manually.
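
As a rough illustration of what standardizing units to SI involves, the sketch below parses a temperature from a sentence and converts it to kelvin. It is plain Python, not the ChemDataExtractor implementation: the regular expression, the TO_KELVIN table, and the to_kelvin function are hypothetical, and only three temperature units are handled.

```python
import re

# Hypothetical conversion table: each supported unit maps to a function
# that converts a value in that unit to kelvin (the SI unit).
TO_KELVIN = {
    "K": lambda v: v,
    "°C": lambda v: v + 273.15,
    "°F": lambda v: (v + 459.67) * 5.0 / 9.0,
}

# A number (optionally negative or decimal) followed by one of the units above.
PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s*(K|°C|°F)")


def to_kelvin(text):
    """Return the first temperature found in text, in kelvin, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2)
    return TO_KELVIN[unit](value)


print(to_kelvin("The sample was held at 25 °C for two hours."))
print(to_kelvin("Measurements were taken at 300 K."))
```

A real system has to cope with composite units, prefixes, and many notations for the same unit, which is why a hand-written table like this one does not scale.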

High Precision

ChemDataExtractor v2 has achieved an overall precision of 92.2 %, as evaluated on a dataset composed of 26 different journals, using a system of 18 simultaneously extracted interrelated properties.

Check out our publication to get a good overview of how ChemDataExtractor v2 works, and what it can do for you.

Open Source

ChemDataExtractor v2 is available as an open-source Python package that you can download and use for free.

Check out the documentation for help getting started.

