Sjoerd van Steenkiste
sjoerdvansteenkiste at gmail dot com

I am a Research Scientist at Google Research interested in fundamental problems in machine learning and artificial intelligence. Previously, I was a Postdoctoral Researcher at the Dalle Molle Institute for Artificial Intelligence (IDSIA), after receiving my PhD in Informatics (Artificial Intelligence) in November 2020 under the guidance of Prof. Jürgen Schmidhuber. I received an MSc in Artificial Intelligence, an MSc in Operations Research, and a BSc in Knowledge Engineering from Maastricht University. I have also spent time at Google Brain, NNAISENSE, and AtonRâ as a research intern.

Currently, my research mainly focuses on compositional generalization in vision/language, learning structured 'symbol-like' representations with neural networks, and the binding problem. I have also worked on (meta) reinforcement learning, neuroevolution, and multiwavelets.

CV  /  Google Scholar  /  GitHub  /  Twitter  /  Thesis (slides)

What's new in 2022?

Selected papers are highlighted.

The Design of Matched Balanced Orthogonal Multiwavelets
Joël M.H. Karel, Sjoerd van Steenkiste, Ralf L.M. Peeters
Frontiers in Applied Mathematics and Statistics, 2022

In this work, we present a full parameterization of the space of all orthogonal multiwavelets with two balanced vanishing moments (of orders 0 and 1), for arbitrary given multiplicity and degree of the polyphase filter. This allows one to search for multiwavelets matched to a given application by optimizing a suitable design criterion. We present such a criterion, which is sparsity-based and useful for detection purposes, and illustrate it with an example from electrocardiographic signal analysis. We also present explicit conditions for building in a third balanced vanishing moment (of order 2), which can be used as a constraint together with the earlier parameterization. We demonstrate this by constructing a balanced orthogonal multiwavelet of multiplicity three, though the approach can easily be employed for arbitrary multiplicity.

Unsupervised Object Keypoint Learning using Local Spatial Predictability
Anand Gopalakrishnan, Sjoerd van Steenkiste, Jürgen Schmidhuber
International Conference on Learning Representations (ICLR), 2021
Spotlight Presentation
pdf / code

We propose PermaKey, a novel approach to representation learning based on object keypoints. It leverages the predictability of local image regions from spatial neighborhoods to identify salient regions that correspond to object parts, which are then converted to keypoints. We demonstrate the efficacy of PermaKey on Atari where it learns keypoints corresponding to the most salient object parts and is robust to certain visual distractors. Further, on downstream RL tasks in the Atari domain we demonstrate how agents equipped with our keypoints outperform those using competing alternatives.

Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks
Róbert Csordás, Sjoerd van Steenkiste, Jürgen Schmidhuber
International Conference on Learning Representations (ICLR), 2021
pdf / code

In this paper, we present a novel method based on learning binary weight masks to identify individual weights and subnets responsible for specific functions. Using this powerful tool, we contribute an extensive study of emerging modularity in NNs that covers several standard architectures and datasets. We demonstrate how common NNs fail to reuse submodules and offer new insights into the related issue of systematic generalization on language tasks.

Hierarchical Relational Inference
Aleksandar Stanić, Sjoerd van Steenkiste, Jürgen Schmidhuber
Proceedings of the AAAI Conference on Artificial Intelligence, 2021
pdf / code

We propose a novel approach to physical reasoning that models objects as hierarchies of parts that may locally behave separately, but also act more globally as a single whole. Unlike prior approaches, our method learns in an unsupervised fashion directly from raw visual images to discover objects, parts, and their relations. It explicitly distinguishes multiple levels of abstraction and improves over a strong baseline at modeling synthetic and real-world videos.

On the Binding Problem in Artificial Neural Networks
Klaus Greff, Sjoerd van Steenkiste, Jürgen Schmidhuber
arXiv pre-print, 2020

Contemporary neural networks still fall short of human-level generalization. In this paper, we argue that this is due to their inability to dynamically and flexibly bind information that is distributed throughout the network. This binding problem affects their capacity to acquire a compositional understanding of the world in terms of symbol-like entities (like objects), which is crucial for generalizing in predictable and systematic ways. To address this issue, we propose a unifying framework that revolves around forming meaningful entities from unstructured sensory inputs (segregation), maintaining this separation of information at a representational level (representation), and using these entities to construct new inferences, predictions, and behaviors (composition). Our analysis draws inspiration from a wealth of research in neuroscience and cognitive psychology, and surveys relevant mechanisms from the machine learning literature, to help identify a combination of inductive biases that allow symbolic information processing to emerge naturally in neural networks.

Investigating object compositionality in Generative Adversarial Networks
Sjoerd van Steenkiste, Karol Kurach, Jürgen Schmidhuber, Sylvain Gelly
Neural Networks, 2020
pdf (arxiv) / code

We present a minimal modification to the generator of a GAN to incorporate object compositionality as an inductive bias and find that it reliably learns to generate images as compositions of objects. Using this general design as a backbone, we then propose two useful extensions to incorporate dependencies among objects and background. We extensively evaluate our approach on several multi-object image datasets and highlight the merits of incorporating structure for representation learning purposes. In particular, we find that our structured GANs are better at generating multi-object images that are more faithful to the reference distribution. Moreover, we demonstrate how, by leveraging the structure of the learned generative process, one can 'invert' the learned generative model to perform unsupervised instance segmentation.

Improving Generalization in Meta Reinforcement Learning using Learned Objectives
Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber
International Conference on Learning Representations (ICLR), 2020
Spotlight Presentation
pdf / code

We introduce MetaGenRL, a novel meta reinforcement learning algorithm that distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that affects how future individuals will learn. Unlike recent meta-RL algorithms, MetaGenRL can generalize to new environments that are entirely different from those used for meta-training. In some cases, it even outperforms human engineered RL algorithms. MetaGenRL uses off-policy second-order gradients during meta-training that greatly increase its sample efficiency.

Are Disentangled Representations Helpful for Abstract Visual Reasoning?
Sjoerd van Steenkiste, Francesco Locatello, Jürgen Schmidhuber, Olivier Bachem
Neural Information Processing Systems (NeurIPS), 2019
pdf / code / poster

We conduct a large-scale study that investigates whether disentangled representations are more suitable for abstract reasoning tasks. Using two new tasks similar to Raven's Progressive Matrices, we evaluate the usefulness of the representations learned by 360 state-of-the-art unsupervised disentanglement models. Based on these representations, we train 3600 abstract reasoning models and observe that disentangled representations do in fact lead to better downstream performance. In particular, they appear to enable quicker learning using fewer samples.

A Perspective on Objects and Systematic Generalization in Model-Based RL
Sjoerd van Steenkiste*, Klaus Greff*, Jürgen Schmidhuber
ICML workshop on Generative Modeling and Model-Based Reasoning for Robotics and AI, 2019

In order to meet the diverse challenges in solving many real-world problems, an intelligent agent has to be able to dynamically construct a model of its environment. Objects facilitate the modular reuse of prior knowledge and the combinatorial construction of such models. In this work, we argue that dynamically bound features (objects) do not simply emerge in connectionist models of the world. We identify several requirements that need to be fulfilled in overcoming this limitation and highlight corresponding inductive biases.

*Both authors contributed equally

Towards Accurate Generative Models of Video: A New Metric & Challenges
Thomas Unterthiner*, Sjoerd van Steenkiste*, Karol Kurach, Raphaël Marinier, Marcin Michalski, Sylvain Gelly
arXiv pre-print, 2018
pdf / code / dataset / blog post

We propose Fréchet Video Distance (FVD), a new metric for generative models of video based on FID, and StarCraft 2 Videos (SCV), a collection of progressively harder datasets that challenge the capabilities of the current iteration of generative models for video. We conduct a large-scale human study, which confirms that FVD correlates well with qualitative human judgment of generated videos, and provide initial benchmark results on SCV.

*Both authors contributed equally
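
At its core, FVD compares Gaussians fitted to feature embeddings of real and generated videos via the Fréchet (2-Wasserstein) distance. A minimal sketch of that distance, assuming NumPy and SciPy are available (the video feature extractor itself is omitted here, and the function names are illustrative, not the released implementation):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2))."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

def fvd_from_features(real_feats, gen_feats):
    """Fit a Gaussian to each (num_videos, dim) array of video feature
    embeddings and return the Frechet distance between the two fits."""
    mu_r, sig_r = real_feats.mean(0), np.cov(real_feats, rowvar=False)
    mu_g, sig_g = gen_feats.mean(0), np.cov(gen_feats, rowvar=False)
    return frechet_distance(mu_r, sig_r, mu_g, sig_g)
```

Identical feature distributions give a distance of zero; the score grows as the generated videos' feature statistics drift from the real ones.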

A Case for Object Compositionality in Deep Generative Models of Images
Sjoerd van Steenkiste, Karol Kurach, Sylvain Gelly
NeurIPS workshop on Modeling the Physical World: Perception, Learning, and Control, 2018
NeurIPS workshop on Relational Representation Learning, 2018
pdf / code

We propose to structure the generator of a GAN to consider objects and their relations explicitly, and generate images by means of composition. On several multi-object image datasets we find that the proposed generator learns to identify and disentangle information corresponding to different objects at a representational level. A human study reveals that the resulting generative model is better at generating images that are more faithful to the reference distribution.

Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions
Sjoerd van Steenkiste, Michael Chang, Klaus Greff, Jürgen Schmidhuber
International Conference on Learning Representations (ICLR), 2018
pdf / code / poster

We present a novel method that learns to discover objects and model their physical interactions from raw visual images in a purely unsupervised fashion. It incorporates prior knowledge about the compositional nature of human perception to factor interactions between object-pairs and learn efficiently. On videos of bouncing balls we show the superior modeling capabilities of our method compared to other unsupervised neural approaches that do not incorporate such prior knowledge.

Relational Neural Expectation Maximization
Sjoerd van Steenkiste, Michael Chang, Klaus Greff, Jürgen Schmidhuber
NIPS workshop on Cognitively Informed Artificial Intelligence, 2017
Oral Presentation, Oculus Outstanding Paper Award
pdf / code / slides

We propose a novel approach to common-sense physical reasoning that learns physical interactions between objects from raw visual images in a purely unsupervised fashion. Our method incorporates prior knowledge about the compositional nature of human perception, enabling it to discover objects, factor interactions between object-pairs to learn efficiently, and generalize to new environments without re-training.

Neural Expectation Maximization
Klaus Greff*, Sjoerd van Steenkiste*, Jürgen Schmidhuber
Neural Information Processing Systems (NIPS), 2017
NVAIL Pioneering Research Award
pdf / code / poster

In this paper, we explicitly formalize the problem of automatically discovering distributed symbol-like representations as inference in a spatial mixture model where each component is parametrized by a neural network. Based on the Expectation Maximization framework we then derive a differentiable clustering method that simultaneously learns how to group and represent individual entities.

*Both authors contributed equally
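
As a classical point of reference (not the method itself: Neural EM parametrizes each mixture component with a neural network and trains through the unrolled E-step), the underlying EM alternation for a pixel-wise mixture looks like this:

```python
import numpy as np

def em_pixel_mixture(x, k, iters=50, sigma=0.1, seed=0):
    """Classical EM for a mixture of k isotropic Gaussians over pixel values
    x of shape (num_pixels,). Illustrative analogue only: in Neural EM the
    per-component statistics are produced by a neural network, and the E-step
    updates are differentiable so the whole procedure is trained end-to-end."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(x.min(), x.max(), size=k)  # component parameters
    for _ in range(iters):
        # E-step: soft-assign each pixel to each component (responsibilities)
        log_p = -0.5 * ((x[None, :] - mu[:, None]) / sigma) ** 2
        gamma = np.exp(log_p - log_p.max(0))
        gamma /= gamma.sum(0, keepdims=True)
        # M-step: re-estimate each component mean from its soft assignments
        mu = (gamma * x[None, :]).sum(1) / gamma.sum(1)
    return mu, gamma
```

The responsibilities `gamma` play the role of the grouping, and the component parameters play the role of the per-entity representations that Neural EM learns jointly.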

A Wavelet-based Encoding for Neuroevolution
Sjoerd van Steenkiste, Jan Koutník, Kurt Driessens, Jürgen Schmidhuber
Genetic and Evolutionary Computation Conference (GECCO), 2016
pdf / code

A new indirect scheme for encoding neural network connection weights as sets of wavelet-domain coefficients is proposed. It exploits spatial regularities in the weight-space to reduce the gene-space dimension by considering the low-frequency wavelet coefficients only. The wavelet-based encoding builds on top of a frequency-domain encoding, but unlike when using a Fourier-type transform, it offers gene locality while preserving continuity of the genotype-phenotype mapping.
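
For intuition, the decoding direction of such an indirect encoding can be sketched with a one-dimensional Haar synthesis step (a hypothetical minimal analogue; the paper's encoding operates on the weight matrices of each layer and is not restricted to the Haar wavelet):

```python
import numpy as np

def decode_genome(genome, levels):
    """Expand a short genome of low-frequency (approximation) wavelet
    coefficients into a longer weight vector by running Haar synthesis with
    all high-frequency (detail) coefficients set to zero. Each synthesis
    step doubles the length: w[2i] = w[2i+1] = c[i] / sqrt(2)."""
    w = np.asarray(genome, dtype=float)
    for _ in range(levels):
        w = np.repeat(w, 2) / np.sqrt(2)
    return w

# A genome of 4 genes decodes to 16 connection weights (2 synthesis levels):
genome = np.array([1.0, -0.5, 0.25, 0.0])
weights = decode_genome(genome, levels=2)
```

Evolution then searches the low-dimensional gene space while fitness is evaluated on the full weight vector; neighbouring genes affect neighbouring weights (gene locality), and because the transform is orthogonal, small genome changes produce correspondingly small weight changes (continuity of the genotype-phenotype mapping).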


Website template credits.