Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning

1University of California, Berkeley 2Carnegie Mellon University
*Indicates Equal Contribution. †Indicates Project Lead. ‡Indicates Corresponding Author.
Multitask - Simulation.
Sparsity of SDP. During inference, only a subset of the experts (in orange and pink) is activated.

Abstract

The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, which incurs high computational cost and suffers catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). By adopting a Mixture of Experts (MoE) within a transformer-based diffusion policy, SDP selectively activates experts and skills, enabling task-specific learning without retraining the entire model. This not only reduces the number of active parameters but also facilitates the seamless integration and reuse of experts across tasks. Extensive experiments on diverse tasks in both simulation and the real world show that SDP 1) excels in multitask scenarios with a negligible increase in active parameters, 2) prevents forgetting when continually learning new tasks, and 3) enables efficient task transfer, offering a promising solution for advanced robotic applications.
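To make the mechanism concrete, below is a minimal sketch of a sparse MoE feed-forward layer with top-k routing, of the kind SDP places inside its transformer blocks. It is written in PyTorch purely for illustration; the class name, layer sizes, and the top-k value are our assumptions, not the released implementation.

import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    # Illustrative top-k mixture-of-experts feed-forward layer.
    # Names and hyperparameters are assumptions, not SDP's released code.
    def __init__(self, dim, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)  # lightweight router over experts
        self.top_k = top_k

    def forward(self, x):  # x: (batch, tokens, dim)
        logits = self.router(x)                         # (B, T, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # route each token to top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                 # tokens assigned to expert e
                if mask.any():                          # only selected experts are run
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

Because each token passes through only top_k of the num_experts expert networks, the number of active parameters per forward pass stays nearly constant as experts are added.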


Overview of Sparse Diffusion Policy (SDP). 1) Multitask Learning: SDP simultaneously acquires experts from different demonstration datasets and activates task-specific experts to accomplish diverse robot tasks. 2) Continual Learning: SDP continually learns new tasks by adding only a few new experts, avoiding catastrophic forgetting by retaining the old experts and routers. 3) Task Transfer: SDP freezes the parameter-rich experts and rapidly transfers to new tasks by tuning only the lightweight routers that select experts. Furthermore, SDP can acquire new skills by building on previously learned knowledge.
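As a hedged illustration of the continual-learning recipe, the sketch below grows an MoE layer with a few fresh experts for a new task while freezing everything learned so far. It assumes a layer like the hypothetical SparseMoELayer above, extended so that routers is an nn.ModuleDict holding one router per task; the per-task-router layout and all names are our illustrative choices, not the authors' procedure.

import torch.nn as nn

def add_task(moe_layer, task_name, num_new_experts=2, dim=256):
    # Freeze all previously learned experts and routers, so earlier
    # tasks are retained without catastrophic forgetting.
    for p in moe_layer.parameters():
        p.requires_grad = False
    # Append a few fresh experts for the new task; only these (and the
    # new router below) receive gradients during training.
    for _ in range(num_new_experts):
        moe_layer.experts.append(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        )
    # A new lightweight router for the new task, scoring the full,
    # enlarged pool of experts so old experts remain reusable.
    moe_layer.routers[task_name] = nn.Linear(dim, len(moe_layer.experts))
    return moe_layer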

Real-Robot Tasks

Multitask - FANUC Robotic Arm.

Multitask - Comparison with TCD. Our SDP surpasses the baseline, Task-Conditioned Diffusion (TCD), in real-robot experiments. The baseline struggles to capture sparse and multimodal action sequence distributions.

Task Transfer

We evaluate the reusability of SDP, emphasizing its capability to leverage previously acquired experts and to learn new skills by composing them. Specifically, we pretrain the experts on Coffee and Mug Cleanup, freeze them, and train only the lightweight router on Coffee Preparation. With SDP, we find that experts learned from Coffee and Mug Cleanup can be reused in Coffee Preparation. Notably, whenever knowledge related to Coffee is required but never appears in Mug Cleanup, the experts learned in Coffee are activated. Moreover, by composing these experts, SDP can learn new skills such as moving the mug to the coffee machine's drip tray. We slow down the video during these pivotal moments.

Task Transfer.
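As a rough sketch of this recipe, the snippet below freezes every expert parameter and returns an optimizer over the routers alone. The convention that router parameters contain "router" in their names, and the learning rate, are assumptions for illustration, not the released code's API.

import torch

def router_only_optimizer(policy, lr=1e-4):
    # Freeze the parameter-rich experts; leave only routers trainable.
    for name, param in policy.named_parameters():
        param.requires_grad = "router" in name
    routers = [p for p in policy.parameters() if p.requires_grad]
    # Only the lightweight routers are updated on the new task
    # (e.g., Coffee Preparation); the pretrained experts are reused intact.
    return torch.optim.AdamW(routers, lr=lr)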

BibTeX

@article{wang2024sparse,
  title={Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning},
  author={Wang, Yixiao and Zhang, Yifei and Huo, Mingxiao and Tian, Ran and Zhang, Xiang and Xie, Yichen and Xu, Chenfeng and Ji, Pengliang and Zhan, Wei and Ding, Mingyu and others},
  journal={arXiv preprint arXiv:2407.01531},
  year={2024}
}