Sjoerd van Steenkiste
sjoerd at idsia dot ch

I am a PhD candidate at the Swiss AI Lab IDSIA with Jürgen Schmidhuber. I received an MSc in Artificial Intelligence, an MSc in Operations Research, and a BSc in Knowledge Engineering from Maastricht University. I have also spent time at Google Brain and NNAISENSE as a research intern.

My research focuses on learning structured neural representations for visual reasoning tasks. I am particularly interested in learning representations that distinguish discrete "symbol-like" primitives, such as objects, and in representations that are disentangled. I have also worked on meta reinforcement learning, neuroevolution, and multiwavelets.

CV  /  Google Scholar  /  GitHub  /  Twitter

What's new in 2020?

My current research focus is on unsupervised representation learning algorithms that learn structured world representations. Selected papers are highlighted.

Investigating object compositionality in Generative Adversarial Networks
Sjoerd van Steenkiste, Karol Kurach, Jürgen Schmidhuber, Sylvain Gelly
Neural Networks, 2020
pdf (arxiv) / code

We present a minimal modification to the generator of a GAN that incorporates object compositionality as an inductive bias, and find that it reliably learns to generate images as compositions of objects. Using this general design as a backbone, we then propose two useful extensions to incorporate dependencies among objects and background. We extensively evaluate our approach on several multi-object image datasets and highlight the merits of incorporating structure for representation learning purposes. In particular, we find that our structured GANs generate multi-object images that are more faithful to the reference distribution. Moreover, we demonstrate how, by leveraging the structure of the learned generative process, one can 'invert' the learned generative model to perform unsupervised instance segmentation.
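
A minimal sketch of the compositing step at the heart of this design (assuming, for illustration, K generated object images with per-object alpha masks and a separately generated background; the actual generator architecture, relational structure, and training procedure are omitted):

```python
import numpy as np

def compose(objects, alphas, background):
    """Composite K generated object images over a background, back to front,
    using per-object alpha masks ('over' compositing)."""
    canvas = background.copy()
    for obj, alpha in zip(objects, alphas):
        canvas = alpha * obj + (1.0 - alpha) * canvas
    return canvas

# Toy example: two 4x4 single-channel "objects" over a black background;
# the second object is fully transparent and leaves the image unchanged.
background = np.zeros((4, 4))
objects = [np.full((4, 4), 0.8), np.full((4, 4), 0.3)]
alphas = [np.ones((4, 4)), np.zeros((4, 4))]
image = compose(objects, alphas, background)
```

Because each object is generated (and composited) separately, information about different objects is kept apart at a representational level, which is what makes the generative process invertible per object.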

Improving Generalization in Meta Reinforcement Learning using Learned Objectives
Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber
International Conference on Learning Representations (ICLR), 2020
Spotlight Presentation
pdf / code

We introduce MetaGenRL, a novel meta reinforcement learning algorithm that distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that affects how future individuals will learn. Unlike recent meta-RL algorithms, MetaGenRL can generalize to new environments that are entirely different from those used for meta-training. In some cases, it even outperforms human-engineered RL algorithms. MetaGenRL uses off-policy second-order gradients during meta-training, which greatly increases its sample efficiency.

Are Disentangled Representations Helpful for Abstract Visual Reasoning?
Sjoerd van Steenkiste, Francesco Locatello, Jürgen Schmidhuber, Olivier Bachem
Neural Information Processing Systems (NeurIPS), 2019
pdf / code / poster

We conduct a large-scale study that investigates whether disentangled representations are more suitable for abstract reasoning tasks. Using two new tasks similar to Raven's Progressive Matrices, we evaluate the usefulness of the representations learned by 360 state-of-the-art unsupervised disentanglement models. Based on these representations, we train 3600 abstract reasoning models and observe that disentangled representations do in fact lead to better down-stream performance. In particular, they appear to enable quicker learning using fewer samples.

A Perspective on Objects and Systematic Generalization in Model-Based RL
Sjoerd van Steenkiste*, Klaus Greff*, Jürgen Schmidhuber
ICML workshop on Generative Modeling and Model-Based Reasoning for Robotics and AI, 2019

To meet the diverse challenges of solving many real-world problems, an intelligent agent must be able to dynamically construct a model of its environment. Objects facilitate the modular reuse of prior knowledge and the combinatorial construction of such models. In this work, we argue that dynamically bound features (objects) do not simply emerge in connectionist models of the world. We identify several requirements that must be fulfilled to overcome this limitation and highlight corresponding inductive biases.

*Both authors contributed equally

Towards Accurate Generative Models of Video: A New Metric & Challenges
Thomas Unterthiner*, Sjoerd van Steenkiste*, Karol Kurach, Raphaël Marinier, Marcin Michalski, Sylvain Gelly
Technical Report, 2018
pdf / code / dataset / blog post

We propose Fréchet Video Distance (FVD), a new metric for generative models of video based on FID, and StarCraft 2 Videos (SCV), a collection of progressively harder datasets that challenge the capabilities of the current iteration of generative models for video. We conduct a large-scale human study, which confirms that FVD correlates well with qualitative human judgment of generated videos, and provide initial benchmark results on SCV.

*Both authors contributed equally
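
Like FID, FVD is the squared 2-Wasserstein (Fréchet) distance between two Gaussians fitted to real and generated examples in a learned feature space (for FVD, features from a pretrained video network). A minimal numpy sketch of that distance, with the feature extractor left out:

```python
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between N(mu1, sigma1) and N(mu2, sigma2).
    Uses Tr((S1^1/2 S2 S1^1/2)^1/2) == Tr((S1 S2)^1/2) to keep the inner
    matrix symmetric PSD."""
    s1_half = _sqrtm_psd(sigma1)
    covmean = _sqrtm_psd(s1_half @ sigma2 @ s1_half)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

def fit_gaussian(features):
    """Fit mean and covariance to a (num_videos, feature_dim) feature array."""
    return features.mean(axis=0), np.cov(features, rowvar=False)
```

The distance is zero only when the two fitted Gaussians coincide, and it grows with both mean shifts and covariance mismatches between the real and generated feature distributions.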

A Case for Object Compositionality in Deep Generative Models of Images
Sjoerd van Steenkiste, Karol Kurach, Sylvain Gelly
NeurIPS workshop on Modeling the Physical World: Perception, Learning, and Control, 2018
NeurIPS workshop on Relational Representation Learning, 2018
pdf / code

We propose to structure the generator of a GAN to consider objects and their relations explicitly, and generate images by means of composition. On several multi-object image datasets we find that the proposed generator learns to identify and disentangle information corresponding to different objects at a representational level. A human study reveals that the resulting generative model generates images that are more faithful to the reference distribution.

Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions
Sjoerd van Steenkiste, Michael Chang, Klaus Greff, Jürgen Schmidhuber
International Conference on Learning Representations (ICLR), 2018
pdf / code / poster

We present a novel method that learns to discover objects and model their physical interactions from raw visual images in a purely unsupervised fashion. It incorporates prior knowledge about the compositional nature of human perception to factor interactions between object-pairs and learn efficiently. On videos of bouncing balls we show the superior modelling capabilities of our method compared to other unsupervised neural approaches that do not incorporate such prior knowledge.

Relational Neural Expectation Maximization
Sjoerd van Steenkiste, Michael Chang, Klaus Greff, Jürgen Schmidhuber
NIPS workshop on Cognitively Informed Artificial Intelligence, 2017
Oral Presentation, Oculus Outstanding Paper Award
pdf / code / slides

We propose a novel approach to common-sense physical reasoning that learns physical interactions between objects from raw visual images in a purely unsupervised fashion. Our method incorporates prior knowledge about the compositional nature of human perception, enabling it to discover objects, factor interactions between object-pairs to learn efficiently, and generalize to new environments without re-training.

Neural Expectation Maximization
Klaus Greff*, Sjoerd van Steenkiste*, Jürgen Schmidhuber
Neural Information Processing Systems (NIPS), 2017
NVAIL Pioneering Research Award
pdf / code / poster

In this paper, we explicitly formalize the problem of automatically discovering distributed symbol-like representations as inference in a spatial mixture model where each component is parametrized by a neural network. Based on the Expectation Maximization framework, we then derive a differentiable clustering method that simultaneously learns how to group and represent individual entities.

*Both authors contributed equally
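
A minimal sketch of the E-step in such a spatial mixture (assuming, for illustration, isotropic Gaussian pixel likelihoods; in the full method each component's pixel means come from a neural network whose representation is refined by backpropagating through the clustering):

```python
import numpy as np

def e_step(x, mus, sigma=0.25):
    """E-step of a spatial mixture: soft-assign each pixel to one of K
    components under isotropic Gaussian likelihoods.

    x:   (num_pixels,) observed image, flattened
    mus: (K, num_pixels) per-component pixel means
    Returns gamma: (K, num_pixels) responsibilities summing to 1 per pixel."""
    log_lik = -0.5 * ((x[None, :] - mus) ** 2) / sigma ** 2
    log_lik -= log_lik.max(axis=0, keepdims=True)  # numerical stability
    gamma = np.exp(log_lik)
    return gamma / gamma.sum(axis=0, keepdims=True)

# Two pixels, two components: each pixel matches one component exactly,
# so the responsibilities segment the "image" into two entities.
x = np.array([0.0, 1.0])
mus = np.array([[0.0, 0.0],
                [1.0, 1.0]])
gamma = e_step(x, mus)
```

The resulting responsibilities act as a soft segmentation: each component ends up representing the pixels it explains best, which is how grouping and representation are learned jointly.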

A Wavelet-based Encoding for Neuroevolution
Sjoerd van Steenkiste, Jan Koutník, Kurt Driessens, Jürgen Schmidhuber
Genetic and Evolutionary Computation Conference (GECCO), 2016
pdf / code

We propose a new indirect scheme for encoding neural network connection weights as sets of wavelet-domain coefficients. It exploits spatial regularities in the weight space to reduce the gene-space dimension by retaining only the low-frequency wavelet coefficients. The wavelet-based encoding builds on a frequency-domain encoding but, unlike a Fourier-type transform, offers gene locality while preserving continuity of the genotype-phenotype mapping.
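
A minimal sketch of the decoding direction (assuming, for simplicity, a Haar wavelet and a power-of-two weight count; the paper's encoding is more general): the short gene vector fills the lowest-frequency coefficients, the high frequencies are zero-padded, and the inverse transform yields the full weight vector.

```python
import numpy as np

def decode_weights(genes, num_weights):
    """Decode a short gene vector into num_weights connection weights:
    place the genes as the lowest-frequency Haar coefficients, zero-pad
    the high frequencies, and apply the inverse Haar transform.
    num_weights must be a power of two for this simple sketch."""
    coeffs = np.zeros(num_weights)
    coeffs[:len(genes)] = genes
    out = coeffs[:1].copy()   # coarsest approximation coefficient
    pos = 1
    while len(out) < num_weights:
        detail = coeffs[pos:pos + len(out)]
        pos += len(out)
        x = np.empty(2 * len(out))
        x[0::2] = (out + detail) / np.sqrt(2.0)  # inverse Haar step
        x[1::2] = (out - detail) / np.sqrt(2.0)
        out = x
    return out

# A single low-frequency gene decodes to spatially constant weights;
# additional genes add progressively finer local structure.
weights = decode_weights(np.array([2.0]), 4)
```

Because each Haar coefficient only influences a contiguous block of weights, mutating one gene changes the phenotype locally, which is the gene-locality property a Fourier-type encoding lacks.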


Website template credits.