Learning Robust Representations for Visual Reinforcement Learning via Task-Relevant Mask Sampling

Vedant Dave, Ozan Özdenizci, Elmar Rückert

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Humans excel at isolating relevant information from noisy data to predict the behavior of dynamic systems, effectively disregarding non-informative, temporally correlated noise. In contrast, existing visual reinforcement learning algorithms struggle to generate noise-free predictions in high-dimensional, noise-saturated environments, especially when trained on world models featuring realistic background noise extracted from natural video streams. We propose Task Relevant Mask Sampling (TRMS), a novel approach for identifying task-specific and reward-relevant masks. TRMS utilizes existing segmentation models as a masking prior, followed by a mask selector that dynamically identifies a subset of masks at each timestep, selecting those most likely to contribute to task-specific rewards. To mitigate the high computational cost of these masking priors, a lightweight student network is trained in parallel. This network learns to perform masking independently and replaces the Segment Anything Model (SAM)-based teacher network after a brief initial phase (<10-25% of total training). TRMS enhances the generalization capabilities of Soft Actor-Critic agents under distractions and achieves better performance on the RL-ViGen benchmark, which includes challenging variants of the DeepMind Control Suite, Dexterous Manipulation, and Quadruped Locomotion tasks.
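The mask-selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate masks here are random placeholders standing in for a segmentation prior's output, and the relevance scores are fixed numbers, whereas in TRMS they would be learned from task rewards. The function name `select_task_relevant_masks` and the top-k merging strategy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_task_relevant_masks(masks, relevance_scores, k=2):
    """Pick the k candidate masks with the highest (here hypothetical)
    reward-relevance scores and merge them into one binary foreground mask."""
    top = np.argsort(relevance_scores)[-k:]          # indices of k best masks
    merged = np.any(masks[top], axis=0)              # union of selected masks
    return merged, top

# Toy stand-in for a segmentation prior: 4 candidate masks on a 6x6 frame.
masks = rng.random((4, 6, 6)) > 0.5
# Hypothetical relevance scores (learned from rewards in the actual method).
scores = np.array([0.1, 0.9, 0.3, 0.7])

merged, chosen = select_task_relevant_masks(masks, scores, k=2)
print(sorted(chosen.tolist()))  # → [1, 3]
```

A lightweight student network would then be trained to predict `merged` directly from the raw frame, so the expensive segmentation teacher can be dropped after the initial training phase.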
Original language: English
Article number: 4857
Number of pages: 25
Journal: Transactions on Machine Learning Research
Volume: 2025-September
Publication status: Published - 18 Sept 2025

Bibliographical note

Publisher Copyright:
© 2025, Transactions on Machine Learning Research. All rights reserved.
