Lessons from the trenches

I've written over ten thousand lines of PyTorch by hand, often relying on nothing more than nn.Linear to implement everything from diffusion transformers and multi-latent attention to deep RL techniques like PPO and SAC. I started with an LLM-only background in deep learning and some theoretical training. Here are some of the things I've learned, or wish I had known before I started.