Papers I am reading or have read
Generic
Original MoE (gating sketch below):
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
How to train MoEs (vanilla):
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
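For quick reference, a minimal sketch of the sparsely-gated top-k MoE layer that both papers above build on. The layer sizes, ReLU feed-forward experts, default k=2, and the dense per-expert dispatch loop are my simplifications for illustration, not details taken from either paper (real implementations route tokens sparsely and add a load-balancing loss).

```python
# Minimal top-k sparsely-gated MoE layer (sketch, assumed hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.gate(x)                    # (tokens, n_experts)
        topk_val, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_val, dim=-1)    # renormalise over the chosen k
        out = torch.zeros_like(x)
        # Dense loop over slots/experts for clarity; real systems dispatch sparsely.
        for slot in range(self.k):
            idx = topk_idx[:, slot]
            w = weights[:, slot:slot + 1]
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = TopKMoE()
    tokens = torch.randn(16, 64)
    print(layer(tokens).shape)  # torch.Size([16, 64])
```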
Efficient LLM
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Alignment/Interpretability
NeurIPS 2023 Outstanding Main Track Runner-Up (DPO; loss sketch after the list):
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
RLHF:
Training language models to follow instructions with human feedback
Self-Instruct: Aligning Language Models with Self-Generated Instructions *
* = To Be Read
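Since DPO is listed above, here is a minimal sketch of its loss, written over pre-computed per-response log-probabilities. The function name, beta=0.1, and the assumption that log-probs are already summed over each response are mine for illustration, not fixed by the paper.

```python
# DPO loss sketch (assumed tensor shapes and beta).
import torch
import torch.nn.functional as F


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Each argument: (batch,) summed log-prob of a full response."""
    # Implicit reward of a response is beta * (log pi_theta - log pi_ref).
    chosen_rewards = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_rewards = beta * (policy_logp_rejected - ref_logp_rejected)
    # Logistic loss on the reward margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b),
                    torch.randn(b), torch.randn(b))
    print(loss.item())
```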