Improving Object-centric Learning with Query Optimization

October 17, 2022

Baoxiong Jia, Yu Liu, Siyuan Huang

Computer Science

Computer Vision and Pattern ...

Artificial Intelligence

Machine Learning

The ability to decompose complex natural scenes into meaningful object-centric abstractions lies at the core of human perception and reasoning. In recent advances in unsupervised object-centric learning, the Slot-Attention module has played an important role with its simple yet effective design and has fostered many powerful variants. These methods, however, have been exceedingly difficult to train without supervision and are ambiguous in their notion of an object, especially for complex natural scenes. In this paper, we propose to address these issues by investigating the potential of learnable queries as initializations for Slot-Attention learning, uniting this with existing efforts to improve Slot-Attention learning through bi-level optimization. With simple code adjustments to Slot-Attention, our model, Bi-level Optimized Query Slot Attention, achieves state-of-the-art results on 3 challenging synthetic and 7 complex real-world datasets in unsupervised image segmentation and reconstruction, outperforming previous baselines by a large margin. We provide thorough ablation studies to validate the necessity and effectiveness of our design. Additionally, our model exhibits great potential for concept binding and zero-shot learning. Our work is made publicly available at https://bo-qsa.github.io

Object-Centric Learning with Slot Attention

June 26, 2020

93% Match

Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, ... , Thomas Kipf

Machine Learning

Computer Vision and Pattern ...

Machine Learning

Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not capture the compositional properties of natural scenes. In this paper, we present the Slot Attention module, an architectural component that interfaces with perceptual representations such as the output of a convolutional neural network an...


Attention Normalization Impacts Cardinality Generalization in Slot Attention

July 4, 2024

92% Match

Markus Krimmel, Jan Achterhold, Joerg Stueckler

Computer Vision and Pattern ...

Object-centric scene decompositions are important representations for downstream tasks in fields such as computer vision and robotics. The recently proposed Slot Attention module, already leveraged by several derivative works for image segmentation and object tracking in videos, is a deep learning component which performs unsupervised object-centric scene decomposition on input images. It is based on an attention architecture, in which latent slot vectors, which hold compress...


Learning Global Object-Centric Representations via Disentangled Slot Attention

October 24, 2024

92% Match

Tonglin Chen, Yinxuan Huang, Zhimeng Shen, Jinghao Huang, ... , Xiangyang Xue

Computer Vision and Pattern ...

Humans can discern scene-independent features of objects across various environments, allowing them to swiftly identify objects amidst changing factors such as lighting, perspective, size, and position and imagine the complete images of the same object in diverse settings. Existing object-centric learning methods only extract scene-dependent object-centric representations, lacking the ability to identify the same object across scenes as humans. Moreover, some existing methods...


Slot-VAE: Object-Centric Scene Generation with Slot Attention

June 12, 2023

92% Match

Yanbo Wang, Letao Liu, Justin Dauwels

Computer Vision and Pattern ...

Slot attention has shown remarkable object-centric representation learning performance in computer vision tasks without requiring any supervision. Despite its object-centric binding ability brought by compositional modelling, as a deterministic module, slot attention lacks the ability to generate novel scenes. In this paper, we propose the Slot-VAE, a generative model that integrates slot attention with the hierarchical VAE framework for object-centric structured scene genera...


Bootstrapping Top-down Information for Self-modulating Slot Attention

November 4, 2024

92% Match

Dongwon Kim, Seoyeon Kim, Suha Kwak

Computer Vision and Pattern ...

Machine Learning

Object-centric learning (OCL) aims to learn representations of individual objects within visual scenes without manual supervision, facilitating efficient and effective visual reasoning. Traditional OCL methods primarily employ bottom-up approaches that aggregate homogeneous visual features to represent objects. However, in complex visual environments, these methods often fall short due to the heterogeneous nature of visual features within an object. To address this, we propos...


Object-centric Learning with Cyclic Walks between Parts and Whole

February 16, 2023

92% Match

Ziyu Wang, Mike Zheng Shou, Mengmi Zhang

Computer Vision and Pattern ...

Learning object-centric representations from complex natural environments equips both humans and machines with the ability to reason from low-level perceptual features. To capture compositional entities of the scene, we propose cyclic walks between perceptual features extracted from vision transformers and object entities. First, a slot-attention module interfaces with these perceptual features and produces a finite set of slot representations. These slots can bind to any obj...


Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

June 13, 2024

92% Match

Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, ... , Zheng Zhang

Computer Vision and Pattern ...

Machine Learning

Object-centric learning (OCL) extracts the representation of objects with slots, offering an exceptional blend of flexibility and interpretability for abstracting low-level perceptual features. A widely adopted method within OCL is slot attention, which utilizes attention mechanisms to iteratively refine slot representations. However, a major drawback of most object-centric models, including slot attention, is their reliance on predefining the number of slots. This not only n...


Self-Supervised Visual Representation Learning with Semantic Grouping

May 30, 2022

91% Match

Xin Wen, Bingchen Zhao, Anlin Zheng, ... , Xiaojuan Qi

Computer Vision and Pattern ...

Machine Learning

In this paper, we tackle the problem of learning visual representations from unlabeled scene-centric data. Existing works have demonstrated the potential of utilizing the underlying complex structure within scene-centric data; still, they commonly rely on hand-crafted objectness priors or specialized pretext tasks to build a learning framework, which may harm generalizability. Instead, we propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint ...


Towards Interpretable Controllability in Object-Centric Learning

October 13, 2023

91% Match

Jinwoo Kim, Janghyuk Choi, Jaehyun Kang, Changyeon Lee, ... , Seon Joo Kim

Computer Vision and Pattern ...

The binding problem in artificial neural networks is actively explored with the goal of achieving human-level recognition skills through the comprehension of the world in terms of symbol-like entities. Especially in the field of computer vision, object-centric learning (OCL) is extensively researched to better understand complex scenes by acquiring object representations or slots. While recent studies in OCL have made strides with complex images or videos, the interpretabilit...


Simplified priors for Object-Centric Learning

October 1, 2024

91% Match

Vihang Patil, Andreas Radler, ... , Sepp Hochreiter

Computer Vision and Pattern ...

Machine Learning

Humans excel at abstracting data and constructing reusable concepts, a capability lacking in current continual learning systems. The field of object-centric learning addresses this by developing abstract representations, or slots, from data without human supervision. Different methods have been proposed to tackle this task for images, but most are overly complex, non-differentiable, or poorly scalable. In this paper, we introduce a conceptually simple, fully-differ...
