Protein Design
Generative models and language models for protein sequence and structure design.
Tools
Generative Design
ProteinMPNN
Message passing for inverse folding
RFdiffusion
Diffusion model for backbone generation
ColabDesign
Notebooks for AfDesign, TrDesign, ProteinMPNN
ESM-IF1
Inverse folding with ESM
EvoDiff
Discrete diffusion for sequences
ProtGPT2
GPT-2 based sequence generation
BindCraft
Binder design
LM-Design / ByProt
Structure-informed language models for sequence design
ProteinSolver
Graph neural network for constraint-based design
ECNet
Fine-tunable fitness/function prediction
Protein Language Models
ESM-2
Protein embeddings (8M–15B params)
ProtTrans
Transformer embeddings for proteins
AMPLIFY
ESM-2 reimplementation with open training
PoET
Variant effect prediction and generation
EvoProtGrad
MCMC-based directed evolution
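To make "MCMC-based directed evolution" concrete, here is a minimal sketch of a Metropolis sampler over single-residue mutations. It is not EvoProtGrad's API or its sampler: it uses plain random proposals, and `toy_fitness` is a hypothetical stand-in for a learned fitness model or protein language model score. The names `mcmc_evolve`, `toy_fitness`, and all parameter values are assumptions made for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_fitness(seq: str, target: str) -> float:
    """Hypothetical fitness: fraction of positions that match `target`.

    A real pipeline would call a trained predictor here instead.
    """
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mcmc_evolve(start, score, steps=2000, temperature=0.05, seed=0):
    """Metropolis sampling over point mutations; returns the best sequence seen."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(steps):
        # Symmetric proposal: uniform position, uniform replacement residue.
        pos = rng.randrange(len(current))
        proposal = current[:pos] + rng.choice(AMINO_ACIDS) + current[pos + 1:]
        proposal_score = score(proposal)
        # Always accept improvements; accept worse moves with prob exp(delta / T).
        delta = proposal_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    target = "MKTAYIAKQR"  # arbitrary illustrative sequence
    best, best_score = mcmc_evolve("A" * len(target), lambda s: toy_fitness(s, target))
    print(best, best_score)
```

Lower `temperature` values make the sampler greedier, while higher values accept more deleterious mutations and explore more broadly.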
Communities
OpenBioML
Decentralized collaborative research community for open source ML and open science to accelerate biotechnology.
- Focus: Biotechnology, drug discovery, protein engineering
- Platform: Discord
Key Research Groups
- Baker Lab (UW) — Protein design
- Coley Group (MIT) — Synthesis planning, ML for chemistry
Awesome Lists
Getting Started
conda create -n protdesign python=3.10
conda activate protdesign
pip install fair-esm biopython
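With the environment above, a first experiment could be extracting ESM-2 embeddings. The sketch below assumes PyTorch is also installed (`fair-esm` expects it as a prerequisite) and that the smallest checkpoint, `esm2_t6_8M_UR50D`, can be downloaded on first use. The helpers `check_sequence` and `embed` are illustrative names, not part of any library, and restricting input to the 20 standard residues is a choice of this sketch rather than a requirement of ESM.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def check_sequence(seq: str) -> str:
    """Normalize a sequence and reject anything outside the 20 standard residues."""
    seq = seq.strip().upper()
    unexpected = sorted(set(seq) - VALID_RESIDUES)
    if not seq or unexpected:
        raise ValueError(f"invalid protein sequence, unexpected symbols: {unexpected}")
    return seq


def embed(sequences):
    """Return one mean-pooled ESM-2 embedding (a torch tensor) per input sequence."""
    # Imported lazily so check_sequence stays usable without PyTorch installed.
    import torch
    import esm

    model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()  # downloads weights on first call
    model.eval()
    batch_converter = alphabet.get_batch_converter()

    data = [(f"seq{i}", check_sequence(s)) for i, s in enumerate(sequences)]
    _, _, tokens = batch_converter(data)
    with torch.no_grad():
        out = model(tokens, repr_layers=[6], return_contacts=False)
    reps = out["representations"][6]  # final layer of the 6-layer model

    # Token 0 is the start token, so residues occupy positions 1..len(seq).
    return [reps[i, 1:len(s) + 1].mean(0) for i, (_, s) in enumerate(data)]


if __name__ == "__main__":
    vectors = embed(["MKTAYIAKQR"])
    print(vectors[0].shape)
```

Larger checkpoints follow the same pattern. Only the loader name and the `repr_layers` index change to match the model's depth.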