Research

I work on the science of deep learning. Most recently I have been thinking about synthetic data.

I flit between two largely disjoint communities that care about neural networks.

The first sees foundation models as oracles that will automate the enterprise. This community thinks in terms of KV caches and fast CUDA kernels, of prefill and decoding and keeping latency low for billions of users around the world encountering the magic of next-token prediction for the first time. This one might say a neural network is, roughly, a composition of matrix multiplications executed on tensor cores. This community speaks to me because of the year I spent in the Bay Area working on software before college.
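
To make that caricature concrete, here is a minimal sketch of the view, a two-layer network written as nothing but composed matrix multiplications (NumPy, ReLU, and the toy shapes are my own illustrative choices, not anyone's production stack):

```python
import numpy as np

# A two-layer MLP as a composition of matrix multiplications plus a
# pointwise nonlinearity. Shapes are arbitrary toy choices.
rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 32, 4

W1 = rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in)
W2 = rng.standard_normal((d_hidden, d_out)) / np.sqrt(d_hidden)

def mlp(x):
    # On real hardware these matmuls are what get tiled onto tensor cores.
    h = np.maximum(x @ W1, 0.0)  # matrix multiply + ReLU
    return h @ W2                # matrix multiply

x = rng.standard_normal((2, d_in))  # a batch of two inputs
print(mlp(x).shape)                 # (2, 4)
```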

The second is composed mostly of physicists and mathematicians who care about neural and cognitive systems. This community sees neural networks as new Platonic objects. It thinks in terms of mean-field approximations and kernel methods, treating a neural network as coupled units of computation whose high-dimensional loss (energy) landscape is the core object of inquiry. This one would say neural networks are an unexpectedly expressive parametric function class, and it speaks to me because the experience of using simple mathematical models to make non-obvious yet true predictions about systems at scale is tantalizing.
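
To make the second caricature concrete, the canonical object in the mean-field line of work is a two-layer network in the 1/N scaling, where the sum over neurons becomes an integral over a distribution of parameters (this is the standard textbook setup, not anything specific to my own work):

$$
f_N(x) \;=\; \frac{1}{N}\sum_{i=1}^{N} a_i\,\sigma\big(\langle w_i, x\rangle\big)
\;\xrightarrow{\;N\to\infty\;}\;
f_\rho(x) \;=\; \int a\,\sigma\big(\langle w, x\rangle\big)\,\mathrm{d}\rho(a, w),
$$

and gradient descent on the finite network corresponds, in this limit, to a gradient flow on the measure $\rho$: a high-dimensional energy landscape made tractable by a mean-field approximation.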

I feel immensely lucky to have been mentored by people in both.