Methods Resources
Learning resources for ML methods in science.
Deep Learning Fundamentals
Neural Networks: Zero to Hero
github.com/karpathy/nn-zero-to-hero
Andrej Karpathy’s tutorial series, which builds neural networks from scratch in code. A good starting point for understanding neural networks.
Graph Neural Networks
| Resource | Description |
|---|---|
| DeepChem Tutorials | GNNs for molecular property prediction |
| PyTorch Geometric Tutorials | General GNN tutorials |
Transformers & Language Models
| Resource | Description |
|---|---|
| Transformers for Chemistry | LLMs in chemistry and materials |
| awesome-scientific-language-models | Comprehensive list of scientific LLMs |
Scientific Language Models
General Scientific LLMs
| Model | Description | Size | Links |
|---|---|---|---|
| SciBERT | BERT model for scientific text | Base | Paper / GitHub |
| Galactica | Large Language Model for Science | 125M-120B | Paper |
| DARWIN | Domain-specific models for natural science | 7B | Paper |
| SciGLM | Scientific instruction-tuned model | 6B | Paper |
| INDUS | Efficient language models for science | 38M-125M | Paper |
Document Representation
| Model | Description | Links |
|---|---|---|
| SPECTER | Document representation using citations | Paper |
| SciNCL | Contrastive learning for scientific documents | Paper |
| SciMult | Multi-task contrastive learning (138M) | Paper |
Chemistry & Materials LLMs
| Model | Description | Size | Links |
|---|---|---|---|
| ChemBERT | Chemical reaction extraction | Base | Paper |
| MatSciBERT | Materials domain language model | Base | Paper |
| BatteryBERT | Battery database enhancement | Base | Paper |
| ChemDFM | Chemistry dialogue foundation model | 13B | Paper |
| ChemLLM | Chemical large language model | 7B | Paper |
| LlaSMol | Chemistry LLMs fine-tuned on the SMolInstruct instruction tuning dataset | 6.7B-7B | Paper |
| KALE-LM | Knowledge-enhanced science model | 8B | Paper |
Molecule-Language Models
| Model | Description | Size | Links |
|---|---|---|---|
| Text2Mol | Cross-modal molecule retrieval | - | Paper |
| KV-PLM | Molecule structure-text bridge | - | Paper |
| MolT5 | Molecule-language translation | 60M-770M | Paper |
| MoleculeSTM | Multi-modal structure-text model | - | Paper |
Math & Reasoning Models
| Model | Description | Size | Links |
|---|---|---|---|
| MathBERT | Pre-trained for mathematics education | Base | Paper |
| Minerva | Solving quantitative reasoning problems | - | Paper |
| WizardMath | Mathematical reasoning via Reinforced Evol-Instruct | 7B-70B | Paper |
| MAmmoTH | Math generalist through hybrid tuning | 7B-70B | Paper |
| MetaMath | Bootstrapped mathematical questions for fine-tuning | - | Paper |
| ToRA | Tool-integrated reasoning agent | 7B-70B | Paper |
| Llemma | Open math language model | 7B-34B | Paper |
| DeepSeekMath | Pushing the limits of mathematical reasoning in open models | 7B | Paper |
| InternLM-Math | Verifiable reasoning via Lean4 | 7B-20B | Paper |
Table Understanding
| Model | Description | Size | Links |
|---|---|---|---|
| TAPAS | Weakly supervised table parsing | Base | Paper |
| TaBERT | Table and natural language fusion | Base | Paper |
| TAPEX | Neural SQL executor pre-training | 140M-406M | Paper |
| OmniTab | Few-shot table QA with synthetic data | 406M | Paper |
| TableLlama | Open generalist table models | 7B | Paper |
Physics & Astronomy
| Model | Description | Size | Links |
|---|---|---|---|
| astroBERT | Language model for astronomy | Base | Paper |
| AstroLLaMA | Specialized astronomy foundation model | 7B | Paper |
| PhysBERT | Physics scientific literature embeddings | Base | Paper |
Multimodal (Vision + Language)
| Model | Description | Size | Links |
|---|---|---|---|
| G-LLaVA | Multi-modal geometry problem solving | 7B-13B | Paper |
Generative Models
| Resource | Description |
|---|---|
| AI4Chemistry Course | Includes generative models for chemistry |
Scientific Machine Learning (SciML)
Neural Differential Equations
| Paper/Method | Description | Links |
|---|---|---|
| Neural ODEs | Continuous-depth neural networks | Paper |
| Universal Differential Equations | Combining DEs with ML | Paper / Code |
| Hamiltonian NNs | Physics-preserving neural networks | Paper / Code |
| Neural CDEs | Neural controlled differential equations | Paper / Code |
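
To make "continuous-depth" concrete: a neural ODE treats the hidden state as the solution of dh/dt = f(h, t), where f is a small network, and the forward pass is a numerical ODE solve. The sketch below is a minimal stdlib-only illustration, not the API of any package listed here. It assumes a fixed-step forward Euler solver and a one-layer tanh vector field with hypothetical hand-picked weights; real implementations use adaptive solvers and trained parameters.

```python
import math


def euler_odeint(f, h0, t0, t1, steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step forward Euler."""
    dt = (t1 - t0) / steps
    h, t = list(h0), t0
    for _ in range(steps):
        dh = f(h, t)
        # Each Euler step has the same form as a residual block: h + dt * f(h).
        h = [hi + dt * dhi for hi, dhi in zip(h, dh)]
        t += dt
    return h


def make_vector_field(W, b):
    """Return f(h, t) = tanh(W h + b), a one-layer stand-in for a learned network."""
    def f(h, t):
        return [
            math.tanh(sum(w * hj for w, hj in zip(row, h)) + bi)
            for row, bi in zip(W, b)
        ]
    return f


# Hypothetical fixed weights; in a real neural ODE these would be trained.
W = [[0.0, -1.0], [1.0, 0.0]]
b = [0.0, 0.0]
f = make_vector_field(W, b)

# "Forward pass": evolve the hidden state from t=0 to t=1.
h1 = euler_odeint(f, [0.5, 0.0], 0.0, 1.0, 100)
print(h1)
```

Increasing `steps` plays the role of adding depth: the same vector field is evaluated more often, with no new parameters.
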
Physics-Informed Neural Networks
| Paper/Method | Description | Links |
|---|---|---|
| PINNs | Physics-informed neural networks | Paper / Code |
| DeepONet | Deep operator networks | Paper / Code |
| Fourier Neural Operator | Learning in Fourier space | Paper |
| SINDy | Sparse identification of nonlinear dynamics | Paper |
| NVIDIA PhysicsNeMo | Framework for physics-ML models | github |
| PINA | Physics-informed networks in PyTorch | github |
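
As a rough illustration of the sparse-identification idea behind SINDy: build a library of candidate terms, fit the measured derivative by least squares, zero out small coefficients, and refit on the surviving terms. The sketch below is a stdlib-only toy under stated assumptions, not pysindy's implementation. It assumes a one-dimensional system dx/dt = -2x, a hypothetical library [1, x, x^2], and exact derivatives; in practice derivatives are estimated from noisy data.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(Theta, y, active):
    """Fit y with the active library columns via the normal equations."""
    cols = [[row[j] for row in Theta] for j in active]
    G = [[sum(a * c for a, c in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * c for a, c in zip(ci, y)) for ci in cols]
    return solve_linear(G, rhs)


def stlsq(Theta, y, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: fit, prune small terms, refit."""
    n_terms = len(Theta[0])
    active = list(range(n_terms))
    coeffs = [0.0] * n_terms
    for _ in range(max_iter):
        fitted = least_squares(Theta, y, active)
        coeffs = [0.0] * n_terms
        kept = []
        for j, c in zip(active, fitted):
            if abs(c) >= threshold:
                coeffs[j] = c
                kept.append(j)
        if kept == active or not kept:
            break
        active = kept
    return coeffs


# Toy data from dx/dt = -2x with x(0) = 1 (exact derivatives, an assumption).
ts = [0.02 * k for k in range(100)]
xs = [math.exp(-2.0 * t) for t in ts]
dxs = [-2.0 * x for x in xs]

# Candidate library: constant, linear, and quadratic terms.
Theta = [[1.0, x, x * x] for x in xs]
coeffs = stlsq(Theta, dxs)
print(coeffs)  # coefficients for [1, x, x^2]
```

The threshold is the main tuning knob: set it too low and spurious terms survive, too high and real dynamics are pruned.
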
SciML Software
Julia
| Package | Description | Link |
|---|---|---|
| DifferentialEquations.jl | Comprehensive DE solving | docs |
| DiffEqFlux.jl | Neural DEs in Julia | docs |
| NeuralPDE.jl | Physics-informed neural networks | docs |
Python
| Package | Description | Link |
|---|---|---|
| torchdiffeq | PyTorch neural ODEs | github |
| torchdyn | PyTorch neural DEs library | github |
| diffrax | JAX-based differential equations | github |
| DeepXDE | Deep learning for scientific computing | github |
| pysindy | Sparse identification of nonlinear dynamics (SINDy) in Python | github |
| NeuroMANCER | Neural modules for control | github |
| SciANN | TensorFlow physics-informed NNs | github |
SciML Books & Courses
| Resource | Author | Link |
|---|---|---|
| Parallel Computing and SciML | Chris Rackauckas | book.sciml.ai |
| Data-Driven Science and Engineering | Brunton & Kutz | Cambridge |
Video Channels
| Channel | Focus |
|---|---|
| Steve Brunton | Data-driven methods, SINDy |
| Physics Informed ML | PINNs tutorials |
AI Tools for Research
Document Parsing & Processing
| Tool | Description | Link |
|---|---|---|
| MinerU | Document parsing to Markdown/JSON (1.2B-parameter model) | github |
| Docling | Multi-format conversion with layout reconstruction | IBM |
| Nougat | Academic document understanding | github |
| GROBID | Metadata extraction using ML | github |
| Marker | PDF to Markdown/JSON conversion | github |
| PaperQA2 | High-accuracy RAG for PDFs with citations | github |
Paper-to-Code & Reproducibility
| Tool | Description | Link |
|---|---|---|
| AutoP2C | LLM agent generating repos from papers | arxiv |
| ResearchCodeAgent | Multi-agent codification system | arxiv |
Research Agents
| Agent | Description | Link |
|---|---|---|
| The AI Scientist | Autonomous research system | arxiv |
| ChemCrow | Chemistry research agents | arxiv |
| BioDiscoveryAgent | Biological discovery automation | github |
| ToolUniverse | 600+ scientific tools (Harvard) | github |
Literature & Knowledge Management
| Tool | Description | Link |
|---|---|---|
| Semantic Scholar | AI-powered academic search | semanticscholar.org |
| OpenAlex | Open scholarly papers catalog | openalex.org |
| Research Rabbit | Literature discovery platform | researchrabbit.ai |
| Jupyter AI | JupyterLab AI extension | github |
Awesome Lists
| List | Focus |
|---|---|
| awesome-ai-for-science | AI tools for scientific research |
| awesome-learning-digital-chemistry | General learning resources |
| awesome-scientific-language-models | Scientific LLMs |
| awesome-scientific-machine-learning | SciML resources |
| awesome-pinn | Physics-informed neural networks |