Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning

1University of California, Berkeley 2Carnegie Mellon University
*Indicates Equal Contribution. †Indicates Project Lead. ‡Indicates Corresponding Author.
Multitask - Simulation.
Sparsity of SDP. During inference, only a subset of the experts (in orange and pink) is activated.

Abstract

The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, which incurs high computational cost and suffers catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). By adopting a Mixture of Experts (MoE) within a transformer-based diffusion policy, SDP selectively activates experts and skills, enabling task-specific learning without retraining the entire model. This not only reduces the number of active parameters but also facilitates the seamless integration and reuse of experts across tasks. Extensive experiments on diverse tasks in both simulation and the real world show that SDP 1) excels in multitask scenarios with a negligible increase in active parameters, 2) prevents forgetting when continually learning new tasks, and 3) enables efficient task transfer, offering a promising solution for advanced robotic applications.
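To make the mechanism concrete, below is a minimal sketch of a sparse MoE feed-forward layer with top-k routing, of the kind SDP places inside its transformer blocks. It is written in PyTorch purely for illustration; the class name, layer sizes, and the top-k value are our assumptions, not the released implementation.

import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    # Illustrative top-k mixture-of-experts feed-forward layer.
    # Names and hyperparameters are assumptions, not SDP's released code.
    def __init__(self, dim, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)  # lightweight router over experts
        self.top_k = top_k

    def forward(self, x):  # x: (batch, tokens, dim)
        logits = self.router(x)                         # (B, T, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # route each token to top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens assigned to expert e
                if mask.any():                          # only selected experts are run
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

Because each token passes through only top_k of the num_experts expert networks, the number of active parameters per forward pass stays nearly constant as experts are added.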


Overview of Sparse Diffusion Policy (SDP). 1) Multitask Learning: SDP simultaneously acquires experts from different demonstration datasets and activates task-specific experts to accomplish diverse robot tasks. 2) Continual Learning: SDP continually learns new tasks by adding only a few new experts, avoiding catastrophic forgetting by retaining the old experts and routers. 3) Task Transfer: SDP freezes the parameter-rich experts and rapidly transfers to new tasks by tuning only the lightweight routers that select experts. Furthermore, SDP can acquire new skills by building on previously learned knowledge.
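As a hedged illustration of the continual-learning recipe, the sketch below grows an MoE layer with a few fresh experts for a new task while freezing everything learned so far. It assumes a layer like the hypothetical SparseMoELayer above, extended so that routers is an nn.ModuleDict holding one router per task; the per-task-router layout and all names are our illustrative choices, not the authors' procedure.

import torch.nn as nn

def add_task(moe_layer, task_name, num_new_experts=2, dim=256):
    # Freeze all previously learned experts and routers, so earlier
    # tasks are retained without catastrophic forgetting.
    for p in moe_layer.parameters():
        p.requires_grad = False
    # Append a few fresh experts for the new task; only these (and the
    # new router below) receive gradients during training.
    for _ in range(num_new_experts):
        moe_layer.experts.append(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        )
    # A new lightweight router for the new task, scoring the full,
    # enlarged pool of experts so old experts remain reusable.
    moe_layer.routers[task_name] = nn.Linear(dim, len(moe_layer.experts))
    return moe_layer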

Real-Robot Tasks

Multitask - FANUC Robotic Arm.

Multitask - Comparison with TCD. Our SDP surpasses the baseline, Task-Conditioned Diffusion (TCD), in real-robot experiments. The baseline struggles to capture sparse and multimodal action sequence distributions.

Task Transfer

We evaluate the reusability of SDP, emphasizing its capability to leverage previously acquired experts and to learn new skills by composing them. Specifically, we pretrain the experts on Coffee and Mug Cleanup, freeze them, and train only the lightweight router on Coffee Preparation. With SDP, we find that experts learned from Coffee and Mug Cleanup can be reused in Coffee Preparation. Notably, whenever knowledge related to Coffee is required but never appears in Mug Cleanup, the experts learned in Coffee are activated. Moreover, by composing these experts, SDP can learn new skills such as moving the mug to the coffee machine's drip tray. We slow down the video during these pivotal moments.

Task Transfer.
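As a rough sketch of this recipe, the snippet below freezes every expert parameter and returns an optimizer over the routers alone. The convention that router parameters contain "router" in their names, and the learning rate, are assumptions for illustration, not the released code's API.

import torch

def router_only_optimizer(policy, lr=1e-4):
    # Freeze the parameter-rich experts; leave only routers trainable.
    for name, param in policy.named_parameters():
        param.requires_grad = "router" in name
    routers = [p for p in policy.parameters() if p.requires_grad]
    # Only the lightweight routers are updated on the new task
    # (e.g., Coffee Preparation); the pretrained experts are reused intact.
    return torch.optim.AdamW(routers, lr=lr)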

BibTeX

@article{wang2024sparse,
  title={Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning},
  author={Wang, Yixiao and Zhang, Yifei and Huo, Mingxiao and Tian, Ran and Zhang, Xiang and Xie, Yichen and Xu, Chenfeng and Ji, Pengliang and Zhan, Wei and Ding, Mingyu and others},
  journal={arXiv preprint arXiv:2407.01531},
  year={2024}
}