Biology Datasets
Benchmark datasets and reference databases for drug discovery, protein science, and computational biology.
Drug Discovery
| Dataset | Task | Size | Link |
|---|---|---|---|
| TDC | Drug discovery benchmarks | Various | tdcommons.ai |
| MoleculeNet | Molecular property prediction (incl. ADMET, toxicity) | 700K+ molecules | Paper |
| ChEMBL | Bioactivity data | 2M+ compounds | ebi.ac.uk/chembl |
Proteins
| Dataset | Task | Size | Link |
|---|---|---|---|
| AlphaFold DB | Predicted protein structures | 200M+ structures | alphafold.ebi.ac.uk |
| PDB | Experimental structures | 200K+ structures | rcsb.org |
| UniProt | Protein sequences | 250M+ sequences | uniprot.org |
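
For scripted access to single entries from the protein resources above, a small helper that builds per-entry URLs can be handy. The sketch below is only illustrative: the function name `entry_url` and the `URL_TEMPLATES` table are hypothetical, and the URL patterns are assumptions based on each service's public endpoints at the time of writing, so check the current documentation before relying on them. It builds the URL only and does not touch the network.

```python
from urllib.parse import quote

# Assumed URL patterns (not taken from this list); verify against each
# service's documentation, since endpoints and file versions can change.
URL_TEMPLATES = {
    "pdb": "https://files.rcsb.org/download/{id}.pdb",          # experimental structure file
    "uniprot": "https://rest.uniprot.org/uniprotkb/{id}.fasta",  # protein sequence (FASTA)
    "alphafold": "https://alphafold.ebi.ac.uk/api/prediction/{id}",  # prediction metadata (JSON)
}


def entry_url(source: str, identifier: str) -> str:
    """Return the URL for one entry; raises ValueError for an unknown source."""
    try:
        template = URL_TEMPLATES[source.lower()]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None
    # Normalise the identifier: trim whitespace, upper-case, URL-escape.
    clean_id = quote(identifier.strip().upper(), safe="")
    return template.format(id=clean_id)


if __name__ == "__main__":
    # 1CRN (crambin) and P69905 (human hemoglobin alpha) are example identifiers.
    print(entry_url("pdb", "1crn"))
    print(entry_url("uniprot", "P69905"))
    print(entry_url("alphafold", "P69905"))
```

The returned URL can be passed to any HTTP client, for example `urllib.request.urlopen`. For bulk downloads, each resource's own bulk-download options are a better fit than per-entry requests.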
Dataset Collections
| Resource | Description |
|---|---|
| TDC | Therapeutics Data Commons: benchmark datasets and tasks for drug discovery |
| awesome-small-molecule-ml | Curated datasets for drug discovery |