LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence

¹The University of Hong Kong   ²The Chinese University of Hong Kong   ³University of Central Florida

Overall Introduction

Video Demo of Crafting an Enchanted Diamond Sword

     In this video, we show how LARM crafts an enchanted diamond sword in Minecraft starting from an empty inventory. To save time, we present only the key segments of the crafting process.

Abstract

     Recent embodied agents are primarily built on reinforcement learning (RL) or large language models (LLMs). RL agents are efficient to deploy but can perform only a few tasks. By contrast, giant LLM agents (often more than 1000B parameters) generalize well but demand enormous computing resources. In this work, we combine the advantages of both while avoiding their drawbacks by applying our proposed referee RL to our large auto-regressive model (LARM). Specifically, LARM is built upon a lightweight LLM (fewer than 5B parameters) and directly outputs the next action to execute rather than text. We show mathematically that classic RL feedback vanishes in long-horizon embodied exploration, and we introduce a referee based on a giant LLM to handle this vanishing reward while training LARM. In this way, LARM learns to complete diverse open-world tasks without human intervention. In particular, LARM successfully harvests enchanted diamond equipment in Minecraft, which demands significantly longer decision-making chains than the highest achievements of the best prior methods.
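The vanishing-feedback claim can be illustrated with a toy calculation. The sketch below is not the paper's derivation: it assumes a single sparse reward at the end of a long episode, a standard discounted return, and a hypothetical referee that scores every step. The function names, the horizon, the discount factor, and the referee weight are all illustrative choices.

```python
def discounted_returns(rewards, gamma):
    """Return the discounted return G_t for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def add_referee_reward(env_rewards, referee_scores, weight):
    """Add a weighted per-step referee score to the environment reward."""
    return [r + weight * s for r, s in zip(env_rewards, referee_scores)]


horizon = 2000  # assumed episode length, chosen only to show the effect
gamma = 0.99    # assumed discount factor

# Sparse environment feedback: a reward of 1.0 only at the final step.
env_rewards = [0.0] * (horizon - 1) + [1.0]
# Hypothetical referee that approves every step with a score of 1.0.
referee_scores = [1.0] * horizon

sparse = discounted_returns(env_rewards, gamma)
shaped = discounted_returns(
    add_referee_reward(env_rewards, referee_scores, weight=0.1), gamma
)

print(f"first-step return, environment only: {sparse[0]:.2e}")
print(f"first-step return, with referee:     {shaped[0]:.2e}")
```

With these assumed numbers the environment-only return at the first step is gamma raised to the power 1999, which is far too small to provide a useful learning signal, while the per-step referee term keeps the return at a usable magnitude.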

Pipeline

     The overall pipeline of our method. As illustrated, we parameterize the actor and the critic with a single LARM model that has two separate prediction heads: the action head and the critic head. We train LARM with our proposed referee RL algorithm, which uses both environment feedback and a referee-generated auxiliary reward to guide the optimization of LARM.

[Figure: overall pipeline of LARM]
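A minimal sketch of the two-head idea, assuming the shared backbone has already turned the observation into a feature vector. The class name `TwoHeadPolicy`, the linear heads, and the dimensions are hypothetical stand-ins for illustration and are not taken from the LARM implementation.

```python
import math
import random


class TwoHeadPolicy:
    """Toy stand-in: one shared feature vector feeds an action head and a critic head."""

    def __init__(self, feature_dim, num_actions, seed=0):
        rng = random.Random(seed)
        # Hypothetical linear heads; in the real model they would sit on an LLM backbone.
        self.action_weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]
            for _ in range(num_actions)
        ]
        self.critic_weights = [rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]

    def forward(self, features):
        """Return (action probabilities, state value) for one feature vector."""
        logits = [
            sum(w * x for w, x in zip(row, features)) for row in self.action_weights
        ]
        # Numerically stable softmax over the action logits.
        peak = max(logits)
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        action_probs = [e / total for e in exps]
        # The critic head maps the same features to a scalar value estimate.
        value = sum(w * x for w, x in zip(self.critic_weights, features))
        return action_probs, value


policy = TwoHeadPolicy(feature_dim=4, num_actions=3)
probs, value = policy.forward([0.5, -1.0, 0.25, 2.0])
# Greedy choice: index of the most probable action.
action = max(range(len(probs)), key=probs.__getitem__)
print(action, probs, value)
```

Sharing one set of features between the two heads means the actor and the critic reuse the same representation, so only the small heads differ between the policy output and the value estimate.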

More Capability Examples

     More capability examples of LARM, including traveling a long distance to find a village, building a nether portal and entering the Nether, and collaborating with other agents to fight zombies.

[Figure: more capability examples of LARM]

BibTeX

@article{li2024larm,
      title={LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence},
      author={Li, Zhuoling and Xu, Xiaogang and Xu, Zhenhua and Lim, SerNam and Zhao, Hengshuang},
      journal={arXiv preprint arXiv:2405.17424},
      year={2024}
}