Materials Datasets

Databases and benchmark datasets for materials science ML.

Major Databases

| Dataset | Description | Size | Link |
| --- | --- | --- | --- |
| Materials Project | DFT calculations, properties | 500K+ materials | materialsproject.org |
| AFLOW | Crystal structures, properties | 3.5M+ entries | aflowlib.org |
| JARVIS-DFT | DFT data with ML models | 40K+ materials | jarvis.nist.gov |
| OQMD | Open Quantum Materials Database | 1M+ entries | oqmd.org |
| NOMAD | Computational materials data | 19M+ calculations | nomad-lab.eu |
| Crystallography Open Database | Open-access crystal structures | 525K+ | crystallography.net |

Large-Scale Computational Datasets

| Dataset | Description | Size | Link |
| --- | --- | --- | --- |
| OMat24 | DFT for inorganic crystals (Meta) | 110M entries | huggingface |
| Open Catalyst 2020 | Surface relaxations for catalysis | 1.2M relaxations | opencatalystproject.org |
| LeMat-Bulk | Inorganic material structures | 6.7M structures | huggingface |
| LeMat-Traj | Inorganic material trajectories | 113M trajectories | huggingface |
| MatPES | Structures from 300 K MD simulations | ~400K structures | matpes.ai |
| MP-ALOE | r2SCAN DFT for universal MLIPs | ~1M calculations | figshare |

Molecular & QM Datasets

| Dataset | Description | Size | Link |
| --- | --- | --- | --- |
| OMol25 | Molecular chemistry DFT (Meta) | 100M+ calculations | huggingface |
| OMC25 | Molecular crystal structures | 27M+ structures | huggingface |
| QM9 | Organic molecules with quantum properties | 134K molecules | quantum-machine.org |
| ANI-1x/1ccx | DFT + CCSD calculations | 5M + 0.5M | MolSSI |
| PubChemQCR | Relaxation trajectories | 3.5M trajectories | huggingface |
| MSR-ACC/TAE25 | CCSD(T)/CBS atomization energies | 77K energies | zenodo |

Specialized Datasets

| Dataset | Focus | Link |
| --- | --- | --- |
| 2D Materials (C2DB) | 2D materials properties | c2db.fysik.dtu.dk |
| Carbon Data | Carbon trajectories (22.9M atoms) | github |
| CoRE MOF 2024 | Metal-organic frameworks | ccdc |
| Polymer Genome | Polymers with properties | khazana.gatech.edu |
| Hydrogen Storage Materials DB | Hydrogen capacity data | hymarc |
| Porous Materials AI Gym | ML for porous materials | github |

Experimental Structures

| Dataset | Description | Size | Link |
| --- | --- | --- | --- |
| ICSD | Inorganic experimental structures | ~290K structures | fiz-karlsruhe.de |
| CSD | Organic crystal structures (Cambridge) | ~1.3M structures | ccdc.cam.ac.uk |

Benchmark Datasets

| Dataset | Description | Link |
| --- | --- | --- |
| MatBench | Standardized ML benchmarks | matbench.materialsproject.org |
| MatBench-Discovery | ML-guided discovery benchmark | github |
| BOOM | Out-of-distribution molecules (10+ tasks) | github |

Awesome Lists

| Resource | Description |
| --- | --- |
| awesome-matchem-datasets | Materials & chemistry datasets (Blaiszik) |
| awesome-materials-informatics | Materials informatics software and data |

Accessing Data

Materials Project API

    from mp_api.client import MPRester

    with MPRester("YOUR_API_KEY") as mpr:
        # Get structure by ID
        structure = mpr.get_structure_by_material_id("mp-149")

        # Search for materials
        docs = mpr.summary.search(
            elements=["Li", "Fe", "O"],
            num_elements=(3, 3)
        )
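
Hardcoding the key as in the snippet above is convenient for a quick test but easy to leak into version control. A minimal sketch of one alternative is to read the key from an environment variable before constructing the client. The helper name `resolve_mp_api_key` is hypothetical, and the variable name `MP_API_KEY` is an assumption; adjust it to whatever your setup uses.

```python
import os


def resolve_mp_api_key(env_var: str = "MP_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises a RuntimeError with a hint if the variable is missing or empty.
    The default variable name is an assumption, not something this page specifies.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Materials Project API key."
        )
    return key


# Usage (needs mp_api installed and a valid key, so it is left commented out):
# from mp_api.client import MPRester
# with MPRester(resolve_mp_api_key()) as mpr:
#     structure = mpr.get_structure_by_material_id("mp-149")
```

Failing early with a clear message tends to be easier to debug than an authentication error raised from inside the first query.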

pymatgen + matminer

    from matminer.datasets import load_dataset

    # Load benchmark dataset
    df = load_dataset("matbench_expt_gap")

JARVIS

    from jarvis.db.figshare import data

    # Get DFT dataset
    dft_3d = data("dft_3d")
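
Once the records are loaded, a common next step is to filter them by a property. The sketch below assumes that `data("dft_3d")` returns a list of dictionaries, that a band gap is stored under a key such as `optb88vdw_bandgap`, and that missing values appear as a non-numeric placeholder like `"na"`; check these against the actual records before relying on them. The helper `filter_numeric` is hypothetical, and the sample records use made-up identifiers and values purely to show the shape of the data.

```python
from typing import Any, Iterable


def filter_numeric(records: Iterable[dict[str, Any]], key: str,
                   lo: float, hi: float) -> list[dict[str, Any]]:
    """Keep records whose `key` value is a number within [lo, hi]."""
    kept = []
    for rec in records:
        value = rec.get(key)
        # Skip missing entries and non-numeric placeholders such as "na".
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if lo <= value <= hi:
            kept.append(rec)
    return kept


# Placeholder records with made-up values, standing in for data("dft_3d").
sample = [
    {"jid": "JVASP-A", "optb88vdw_bandgap": 1.2},
    {"jid": "JVASP-B", "optb88vdw_bandgap": "na"},
    {"jid": "JVASP-C", "optb88vdw_bandgap": 0.0},
    {"jid": "JVASP-D"},
]

semiconductors = filter_numeric(sample, "optb88vdw_bandgap", 0.5, 3.0)
print([rec["jid"] for rec in semiconductors])
```

With the real dataset, pass `dft_3d` in place of `sample`; the same function works for any numeric field.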