Biology Datasets

Benchmark datasets for drug discovery, protein science, and computational biology.

Drug Discovery

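| Dataset | Task | Size | Link |
|---|---|---|---|
| TDC | Drug discovery benchmarks | Various | tdcommons.ai |
| MoleculeNet | Molecular property prediction (incl. ADMET and toxicity) | 700K+ molecules | Paper |
| ChEMBL | Bioactivity data | 2M+ compounds | ebi.ac.uk/chembl |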
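
ChEMBL reports potency measurements (IC50, Ki, Kd, EC50 and similar) alongside a normalized pChEMBL value, defined as the negative base-10 logarithm of the molar activity. Below is a minimal sketch of that conversion, assuming the input is already in nanomolar units. `nm_to_pchembl` is a hypothetical helper, not part of any ChEMBL client, and it ignores ChEMBL's own eligibility rules (for example, pChEMBL is only assigned to exact "=" measurements of certain activity types).

```python
import math


def nm_to_pchembl(value_nm: float) -> float:
    """Convert a potency in nanomolar (e.g. IC50 or Ki) to a pChEMBL-style value.

    pChEMBL = -log10(molar concentration). Since 1 nM = 1e-9 M, this equals
    9 - log10(value_nm). Hypothetical helper for illustration only.
    """
    if value_nm <= 0:
        raise ValueError("potency must be a positive concentration in nM")
    return 9.0 - math.log10(value_nm)


# 1 nM, 100 nM and 10 uM (10,000 nM) potencies
for ic50_nm in (1.0, 100.0, 10_000.0):
    print(ic50_nm, round(nm_to_pchembl(ic50_nm), 2))
```

Working on the log scale makes potencies spanning several orders of magnitude comparable, which is why regression benchmarks built from bioactivity data usually predict pIC50-style targets rather than raw concentrations.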

Proteins

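| Dataset | Task | Size | Link |
|---|---|---|---|
| AlphaFold DB | Predicted protein structures | 200M+ structures | alphafold.ebi.ac.uk |
| PDB | Experimentally determined structures | 200K+ structures | rcsb.org |
| UniProt | Protein sequences | 250M+ sequences | uniprot.org |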
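
UniProt distributes sequences as FASTA: each record is a `>` header line followed by one or more sequence lines, and UniProtKB headers begin with `db|ACCESSION|ENTRY_NAME`. The sketch below is a minimal standard-library parser, assuming well-formed input such as a FASTA file downloaded from uniprot.org. `parse_uniprot_fasta` is a hypothetical helper, and the two records in the example are made-up toy entries, not real UniProt data.

```python
from typing import Dict, List, Optional


def parse_uniprot_fasta(text: str) -> Dict[str, str]:
    """Map accession -> sequence for FASTA text.

    Assumes UniProtKB-style headers (">db|ACCESSION|ENTRY_NAME description");
    for other headers, the first whitespace-delimited token is used as the key.
    """
    records: Dict[str, str] = {}
    accession: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if accession is not None:
                records[accession] = "".join(chunks)  # close previous record
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            parts = header.split("|")
            # UniProtKB puts the accession in the second pipe-separated field
            accession = parts[1] if len(parts) >= 3 else header
            chunks = []
        else:
            chunks.append(line)  # sequences may wrap across several lines
    if accession is not None:
        records[accession] = "".join(chunks)  # close the final record
    return records


example = """>sp|X00001|TOY1_EXAMPLE Toy protein 1 (made-up)
MKTAYIAKQR
QISFVKSHFS
>sp|X00002|TOY2_EXAMPLE Toy protein 2 (made-up)
MSEQNNTEMT
"""
seqs = parse_uniprot_fasta(example)
print(seqs)  # {'X00001': 'MKTAYIAKQRQISFVKSHFS', 'X00002': 'MSEQNNTEMT'}
```

In practice Biopython's `SeqIO.parse(handle, "fasta")` is the more common choice; this version only illustrates the record structure.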

Dataset Collections

| Resource | Description |
|---|---|
| TDC | Therapeutics Data Commons: comprehensive drug discovery benchmarks |
| awesome-small-molecule-ml | Curated datasets for drug discovery |