Research
I work on the science of deep learning. Most recently I have been thinking about synthetic data and statistical models of evaluations. I've had the great privilege of learning to do science from Sam Gershman and Aditi Raghunathan.
I got my start in deep learning only 1.5 years ago, before which I was rather more interested in neuroscience. My research interests in deep learning include:
- What systems centered on foundation models will look like in 5-10 years, and how they will touch our lives in ways we cannot yet imagine
- In my opinion, understanding this requires familiarity with everything from pre- and post-training to inference and deployment, so as to have a good sense of what can improve, in what ways, and where the structural limitations lie. This is what drives me to work at all levels of the stack.
- Two very high-level questions particularly motivate the concrete research questions I study:
- What will we scale after the well of inference-time compute dries up? RAG? Multi-agent systems? Data quality via synthetic data?
- I think the answer to the above question depends a lot on how foundation models are being deployed a year from now, and on how we humans want to use them.
- Will 90% of all forward passes happen inside agentic pipelines that orchestrate the delivery of a pizza I ordered, or inside sales agents responding to my refund request on Amazon?
- Are math and software uniquely verifiable domains, and does this mean foundation models will accelerate those two fields disproportionately?
- Similarities and differences between artificial and natural learning systems
- I have mostly worked at Marr's algorithmic level and on neuroscience (of both brains and artificial networks), but I am increasingly coming to think that the computational level, the cognitive level of abstraction, may be the right one for understanding these systems
- The questions I find interesting often center on universality:
- To what extent are the representations learned by brains and machines universal, and why? My work has touched on this, and the implicit bias literature in ML offers one type of answer, but I have never felt fully satisfied by it.
- When we see our foundation models make mistakes and reason in a similar way to humans, is that a sign of something deeper at play, a trivial consequence of their training data distribution being human-generated, or merely us anthropomorphizing these systems?
I think deep learning moves fast enough that anyone with strong fundamentals in mathematics and good engineering skills can contribute to the frontier. I have been fortunate to be mentored by folks working on topics ranging from foundation model pretraining to high-dimensional probability to mouse olfaction. For exposure to this diversity of experiences and points of view I am immensely grateful.
- Scaling Laws for Precision. Tanishq Kumar*, Zachary Ankner*, Benjamin F. Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Ré, Aditi Raghunathan. In submission.
- Do Mice Grok? Unveiling Hidden Progress in Sensory Cortex During Overtraining. Tanishq Kumar, Blake Bordelon, Cengiz Pehlevan, Venkatesh Murthy, Samuel J. Gershman. In submission.
- Lower Data Diversity Accelerates Training: Case Studies in Synthetic Tasks. Suhas Kotha*, Uzay Girit*, Tanishq Kumar*, Gaurav Ghosal, Aditi Raghunathan. In submission.
- Asymptotic Dynamics for Delayed Feature Learning on a Toy Model. Blake Bordelon, Tanishq Kumar, Samuel J. Gershman, Cengiz Pehlevan. HiLD Workshop at ICML 2024.
- No Free Prune: Information-Theoretic Barriers to Pruning at Initialization. Tanishq Kumar*, Kevin Luo*, Mark Sellke. ICML 2024.
- Grokking as the Transition from Lazy to Rich Training Dynamics. Tanishq Kumar, Blake Bordelon, Samuel J. Gershman*, Cengiz Pehlevan*. ICLR 2024.
- Human or Machine? Turing Tests for Vision and Language. Mengmi Zhang, ..., Tanishq Kumar, ..., Gabriel Kreiman. arXiv.