20 April 2025

LAPP: Large Language Model Feedback for Preference-Driven Reinforcement Learning

Transactions on Machine Learning Research (2025)

Pingcheng Jian, Duke University (pingcheng-jian.github.io)
Xiao Wei, Duke University
Yanbaihui Liu, Duke University (yanbhliu.github.io)
Samuel A. Moore, Duke University (samavmoore.github.io)
Michael M. Zavlanos, Duke University (www.michaelmzavlanos.org)
Boyuan Chen, Duke University (boyuanchen.com)

Overview

We introduce Large Language Model-Assisted Preference Prediction (LAPP), a novel framework for robot learning that enables efficient, customizable, and expressive behavior acquisition with minimal human effort. Unlike prior approaches that rely heavily on reward engineering, human demonstrations, motion capture, or expensive pairwise preference labels, LAPP leverages large language models (LLMs) to automatically generate preference labels from raw state-action trajectories collected during reinforcement learning (RL). These labels are used to train an online preference predictor, which in turn guides the policy optimization process toward satisfying high-level behavioral specifications provided by humans. Our key technical contribution is the integration of LLMs into the RL feedback loop through trajectory-level preference prediction, enabling robots to acquire complex skills, including subtle control over gait patterns and rhythmic timing. We evaluate LAPP on a diverse set of quadruped locomotion and dexterous manipulation tasks and show that it achieves efficient learning, higher final performance, faster adaptation, and precise control of high-level behaviors. Notably, LAPP enables robots to master highly dynamic and expressive tasks such as quadruped backflips, which remain out of reach for standard LLM-generated or handcrafted rewards. Our results highlight LAPP as a promising direction for scalable preference-driven robot learning.
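To make the feedback loop concrete, below is a minimal, self-contained PyTorch sketch of the trajectory-level preference idea described above. It is an illustration under our own assumptions, not the paper's implementation: PreferencePredictor and query_llm_preference are hypothetical names, the Bradley-Terry loss is the standard objective from preference-based RL, a toy heuristic stands in for the actual LLM call so the sketch runs, and the policy-optimization step is omitted.

import torch
import torch.nn as nn

class PreferencePredictor(nn.Module):
    # Scores a flattened state-action trajectory; a higher score means
    # the trajectory better matches the desired high-level behavior.
    def __init__(self, step_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(step_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, traj):            # traj: (T, step_dim)
        return self.net(traj).sum()     # trajectory score = sum of step scores

def query_llm_preference(traj_a, traj_b):
    # Stand-in for the LLM labeler. LAPP prompts an LLM with the human's
    # behavior specification and trajectory data; here a toy heuristic
    # (prefer smaller actions, i.e., smoother motion) substitutes so the
    # sketch runs. Returns 1.0 if trajectory A is preferred, else 0.0.
    return 1.0 if traj_a.abs().mean() < traj_b.abs().mean() else 0.0

def preference_loss(score_a, score_b, label):
    # Bradley-Terry objective: P(A preferred) = sigmoid(score_a - score_b).
    return nn.functional.binary_cross_entropy_with_logits(
        score_a - score_b, torch.tensor(label))

step_dim, horizon = 8, 50
predictor = PreferencePredictor(step_dim)
opt = torch.optim.Adam(predictor.parameters(), lr=3e-4)

for _ in range(100):                    # pairs would come from RL rollouts
    traj_a = torch.randn(horizon, step_dim)
    traj_b = torch.randn(horizon, step_dim)
    label = query_llm_preference(traj_a, traj_b)
    loss = preference_loss(predictor(traj_a), predictor(traj_b), label)
    opt.zero_grad()
    loss.backward()
    opt.step()

In the full method, the predictor is retrained online as new trajectories arrive during RL, and its predicted preferences shape the reward signal used by the policy optimizer.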

Video (Click to YouTube)

Paper

Check out our paper on OpenReview: https://openreview.net/forum?id=cq76wx7T9F

Codebase

Check out our codebase at https://github.com/generalroboticslab/LAPP for locomotion experiments, and https://github.com/generalroboticslab/LAPP_manipulation for manipulation experiments.

Citation

@article{jian2025lapp,
      title={LAPP: Large Language Model Feedback for Preference-Driven Reinforcement Learning},
      author={Pingcheng Jian and Xiao Wei and Yanbaihui Liu and Samuel A. Moore and Michael M. Zavlanos and Boyuan Chen},
      journal={Transactions on Machine Learning Research},
      issn={2835-8856},
      year={2025},
      url={https://openreview.net/forum?id=cq76wx7T9F}
}

Acknowledgment

This work is supported by the ARL STRONG program under awards W911NF2320182, W911NF2220113, and W911NF2420215; by the DARPA FoundSci program under award HR00112490372; by the DARPA TIAMAT program under award HR00112490419; and by AFOSR under award FA9550-19-1-0169.

Copyright 2025. Duke University. This paper is hereby licensed publicly under the CC BY-NC-ND 4.0 License. Duke University has filed patent rights for the technology associated with this article. For further license rights, including using the patent rights for commercial purposes, please contact Duke’s Office for Translation and Commercialization (otcquestions@duke.edu) and reference OTC File 8724.

Categories

Robot Learning, Transfer Learning, Human-AI Teaming