Lessons from the trenches

I've been working on an open-source megarepo that reimplements a range of modern deep learning fundamentals. So far it has involved writing over 20k lines of PyTorch by hand, often relying on nothing more than nn.Linear, to implement everything from diffusion transformers to multi-head latent attention and deep RL techniques like PPO and SAC. I came to this with an LLM-only background in deep learning plus some theoretical training, so here are some of the things I've learned, or wish I had known before I started.