Sjoerd van Steenkiste

sjoerdvansteenkiste at gmail dot com

I am a Senior Research Scientist at Google Research and currently based in Mountain View. My research focuses on:

Methods for interacting with 4D visual scenes: learning representations that capture meaningful structure (objects, geometry, etc.) [cf. 1,2]; controllable generation for scene editing [cf. 3,4].
Foundations of LLMs: how data mixtures and architecture affect (pre-)training [cf. 5,6]; understanding and improving (probabilistic) reasoning [cf. 7,8]. I also recently became involved in improving thinking in Gemini.

More broadly, I am interested in vision-language models, compositionality, learning 'symbol-like' representations with NNs, and the binding problem.

Previously, I completed my Ph.D. in Computer Science at IDSIA and briefly worked as a Postdoctoral Researcher. I received my M.Sc. (2x) and B.Sc. from Maastricht University. I have also spent time at Google Brain, NNAISENSE, and AtonRâ.

CV / Google Scholar / Twitter / Thesis

What's new?

In our new paper we show how Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models.
New work on evaluating world models in LMs: Can Language Models Perform Implicit Bayesian Inference Over User Preference States? To appear at the NeurIPS System 2 Reasoning At Scale Workshop.
New work investigating How Does Code Pretraining Affect Language Model Task Performance? To appear at the BlackboxNLP Workshop co-located with EMNLP.
Two accepted papers at NeurIPS 2024 proposing new methods for 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models (spotlight) and for learning Scene-Grounded Video Representations by "moving off-the-grid" (spotlight).
One EMNLP 2024 paper where we Benchmark Vision Language Models for Cultural Understanding.
I will serve as a Senior Area Chair for ICLR 2025.
Two accepted papers at NAACL 2024 comparing Syllogistic Reasoning in Humans and Language Models and evaluating The Impact of Depth on Compositional Generalization in Transformer Language Models.
I will serve as an Area Chair for ICML 2024.
Two accepted papers at ICLR 2024 on Diffusion for Object-centric Representations of Scenes and on learning Dynamic Neural Scene Representations on Real-World Videos (spotlight).

Research

For an up-to-date list of publications, please see my Google Scholar page.