Robotics

Having a machine learning agent interact with its environment requires true unsupervised learning, skill acquisition, active learning, exploration, and reinforcement: all ingredients of human learning that are still not well understood and remain underexploited by the supervised approaches that dominate deep learning today. Our goal is to improve robotics via machine learning, and to improve machine learning via robotics. We foster close collaborations between machine learning researchers and roboticists to enable learning at scale on real and simulated robotic systems.

Recent Publications

Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Jesse Zhang
Jiahui Zhang
Karl Pertsch
Ziyi Liu
Xiang Ren
Shao-Hua Sun
Joseph Lim
Conference on Robot Learning (CoRL) (2023)
We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by autonomously growing a learned skill library. Prior work in reinforcement learning requires expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. Instead, our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models (LLMs) that inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills. We demonstrate through experiments in realistic household environments that agents trained with our LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping, as well as prior unsupervised skill acquisition methods, on zero-shot execution of unseen, long-horizon tasks in new environments.
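As a rough illustration of the bootstrapping loop described in the abstract, the sketch below chains skills proposed by an LLM and folds successful chains back into the library. The skill names, the `llm_propose_next_skill` stand-in, and the environment interface are hypothetical placeholders, not the authors' code.

```python
import random

# Hypothetical primitive skill library; in BOSS these would be learned policies.
skill_library = {"open drawer", "pick up mug", "place mug in drawer", "close drawer"}

def llm_propose_next_skill(completed_chain, library):
    """Stand-in for the LLM guidance step: given the skills executed so far,
    return a plausible next skill from the library (or None to stop)."""
    candidates = [s for s in library if s not in completed_chain]
    return random.choice(candidates) if candidates else None

def execute_skill(skill):
    """Stand-in for rolling out a skill policy in the environment."""
    return random.random() > 0.2  # pretend execution succeeds 80% of the time

def skill_bootstrapping(library, num_rounds=10, max_chain_len=4):
    """Grow the library by chaining existing skills into longer behaviors,
    without task rewards: the only feedback is whether each skill executed."""
    for _ in range(num_rounds):
        chain = []
        while len(chain) < max_chain_len:
            nxt = llm_propose_next_skill(chain, library)
            if nxt is None or not execute_skill(nxt):
                break
            chain.append(nxt)
        if len(chain) > 1:
            # A successfully executed chain becomes a new, longer skill.
            library.add(" then ".join(chain))
    return library

print(skill_bootstrapping(set(skill_library)))
```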
We address a benchmark task in agile robotics: catching objects thrown at high speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance tradeoffs including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and whole-body multimodality via extensive on-hardware experiments. We conclude with proposals on fusing "classical" and "learning-based" techniques for agile robot control. Videos of our experiments may be found here: https://sites.google.com/view/agile-catching.
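The reinforcement learning branch relies on zeroth-order (gradient-free) optimization, which can be illustrated on a toy objective as below. The quadratic return and all hyperparameters are assumptions made for the sketch and do not reflect the paper's catching controllers.

```python
import numpy as np

# Minimal zeroth-order (gradient-free) policy search on a toy objective.
# The quadratic "return" stands in for an on-robot rollout score.

def rollout_return(theta):
    """Placeholder for evaluating policy parameters `theta` on the task."""
    target = np.array([0.5, -1.2, 2.0])
    return -np.sum((theta - target) ** 2)

def zeroth_order_step(theta, sigma=0.1, num_samples=32, lr=0.05):
    """Estimate an ascent direction purely from perturbed rollouts (no gradients)."""
    eps = np.random.randn(num_samples, theta.size)
    scores = np.array([rollout_return(theta + sigma * e) for e in eps])
    scores -= scores.mean()  # baseline to reduce variance
    grad_est = (scores[:, None] * eps).mean(axis=0) / sigma
    return theta + lr * grad_est

theta = np.zeros(3)
for _ in range(300):
    theta = zeroth_order_step(theta)
print(theta)  # approaches the toy optimum [0.5, -1.2, 2.0]
```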
A Connection between Actor Regularization and Critic Regularization in Reinforcement Learning
Benjamin Eysenbach
Matthieu Geist
Ruslan Salakhutdinov
Sergey Levine
International Conference on Machine Learning (ICML) (2023)
As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting, with most methods regularizing either the actor or the critic. These methods appear distinct. Actor regularization (e.g., behavioral cloning penalties) is simpler and has appealing convergence properties, while critic regularization typically requires significantly more compute because it involves solving a game, but it has appealing lower-bound guarantees. Empirically, prior work alternates between claiming better results with actor regularization and with critic regularization. In this paper, we show that these two regularization techniques can be equivalent under some assumptions: regularizing the critic using a CQL-like objective is equivalent to updating the actor with a BC-like regularizer and with a SARSA Q-value (i.e., "1-step RL"). Our experiments show that this theoretical model makes accurate, testable predictions about the performance of CQL and one-step RL. While our results do not definitively say whether users should prefer actor regularization or critic regularization, they hint that actor regularization methods may be a simpler way to achieve the desirable properties of critic regularization. The results also suggest that the empirically demonstrated benefits of both types of regularization might be more a function of implementation details than of objective superiority.
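To make the two regularizers concrete, the toy discrete-action losses below contrast a CQL-style critic penalty with a BC-regularized, SARSA-based actor loss. The shapes, datasets, and loss forms are schematic assumptions for illustration, not the CQL or one-step RL implementations from the paper.

```python
import numpy as np

# Toy, discrete-action illustration of the two regularizers discussed above.
rng = np.random.default_rng(0)
num_states, num_actions = 5, 3
Q = rng.normal(size=(num_states, num_actions))       # current critic estimates
logits = rng.normal(size=(num_states, num_actions))  # current actor logits
dataset = [(s, rng.integers(num_actions)) for s in range(num_states)]  # (state, action)

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def critic_regularizer(Q, logits):
    """CQL-style penalty: push down Q-values under the current policy,
    push up Q-values of actions actually seen in the data."""
    pi = softmax(logits)
    pushed_down = sum((pi[s] * Q[s]).sum() for s, _ in dataset)
    pushed_up = sum(Q[s, a] for s, a in dataset)
    return (pushed_down - pushed_up) / len(dataset)

def actor_regularizer(Q_sarsa, logits):
    """One-step-RL-style actor loss: maximize a SARSA Q-value while staying
    close to the behavior data via a behavioral-cloning penalty."""
    pi = softmax(logits)
    q_term = sum((pi[s] * Q_sarsa[s]).sum() for s, _ in dataset)
    bc_term = sum(np.log(pi[s, a] + 1e-8) for s, a in dataset)
    return -(q_term + bc_term) / len(dataset)

print(critic_regularizer(Q, logits), actor_regularizer(Q, logits))
```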
In recent years, much progress has been made in learning robotic manipulation policies that can follow natural language instructions. Common approaches involve learning methods that operate on offline datasets, such as task-specific teleoperated demonstrations or hindsight-labeled robotic experience. Such methods work reasonably well but rely strongly on the assumption of clean data: teleoperated demonstrations are collected with specific tasks in mind, while hindsight language descriptions rely on expensive human labeling. Recently, large-scale pretrained language and vision-language models like CLIP have been applied to robotics in the form of learned representations and planners. However, can these pretrained models also be used to cheaply impart internet-scale knowledge onto offline datasets, providing access to skills contained in the offline dataset that weren't necessarily reflected in ground-truth labels? We investigate fine-tuning a reward model on a small dataset of robot interactions with crowd-sourced natural language labels and using the model to relabel instructions of a large offline robot dataset. The resulting dataset with diverse language skills is used to train imitation learning policies, which outperform prior methods by up to 30% when evaluated on a diverse set of novel language instructions that were not contained in the original dataset.
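The relabeling pipeline might look roughly like the sketch below, where the scoring model, candidate instructions, and data structures are hypothetical placeholders rather than the actual system.

```python
# Schematic instruction-relabeling pipeline for the approach described above.

candidate_instructions = ["pick up the sponge", "open the drawer", "push the bowl left"]

def score_instruction(trajectory, instruction):
    """Stand-in for a fine-tuned vision-language reward model that rates how
    well `instruction` describes the robot `trajectory` (higher = better)."""
    return float(len(set(trajectory["hint"]) & set(instruction)))

def relabel(offline_dataset, threshold=3.0):
    """Attach the best-scoring candidate instruction to each unlabeled trajectory."""
    relabeled = []
    for traj in offline_dataset:
        best = max(candidate_instructions, key=lambda ins: score_instruction(traj, ins))
        if score_instruction(traj, best) >= threshold:
            relabeled.append({**traj, "instruction": best})
    return relabeled

offline_dataset = [{"hint": "sponge grasp", "obs": [], "actions": []},
                   {"hint": "drawer pull", "obs": [], "actions": []}]
print([t["instruction"] for t in relabel(offline_dataset)])
```

The relabeled dataset would then feed a standard language-conditioned imitation learning pipeline.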
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents
Jeongeun Park
Seungwon Lim
Joonhyung Lee
Sangbeom Park
Sungjoon Choi
Youngjae Yu
IEEE Robotics and Automation Letters (2023) (to appear)
In this paper, we focus on inferring whether a given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents utilizing large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command is classified as uncertain, we further distinguish between ambiguous and infeasible commands by leveraging LLMs with situationally aware few-shot prompting in a zero-shot manner. For ambiguous commands, we disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands can decrease malfunctions and undesired actions of the robot, enhancing the reliability of interactive robot agents. To evaluate the proposed system, we present a dataset consisting of pairs of high-level commands, scene descriptions, and labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset in a pick-and-place tabletop simulation. Furthermore, we demonstrate the approach in a real-world human-robot interaction environment, i.e., handover scenarios.
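A minimal sketch of the command-handling flow described above might look as follows; the `query_llm` stand-in, the prompts, and the agreement threshold are assumptions, not the paper's implementation.

```python
from collections import Counter

def query_llm(prompt, n_samples=1):
    """Stand-in for sampling one or more completions from an LLM."""
    return ["place the red cup on the shelf"] * n_samples

def classify_command(command, scene, agreement_threshold=0.8, n_samples=8):
    # 1) Uncertainty estimation: sample several interpretations and measure agreement.
    samples = query_llm(f"Scene: {scene}\nCommand: {command}\nInterpretation:", n_samples)
    top_count = Counter(samples).most_common(1)[0][1]
    if top_count / n_samples >= agreement_threshold:
        return "clear", None
    # 2) Uncertain: decide between ambiguous and infeasible via few-shot prompting.
    verdict = query_llm(f"Scene: {scene}\nCommand: {command}\nAmbiguous or infeasible?")[0]
    if "infeasible" in verdict:
        return "infeasible", None
    # 3) Ambiguous: generate a clarifying question for the user.
    question = query_llm(f"Ask one question to disambiguate: {command}")[0]
    return "ambiguous", question

print(classify_command("put the cup away", "a table with two cups and a shelf"))
```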
Mechanical Search on Shelves with Efficient Stacking and Destacking of Objects
Huang Huang
Letian Fu
Michael Danielczuk
Chung Min Kim
Zachary Tam
Jeff Ichnowski
Brian Ichter
Ken Goldberg
The International Symposium of Robotics Research (ISRR) (2023)
Stacking increases storage efficiency in shelves, but the lack of visibility and accessibility makes the mechanical search problem of revealing and extracting target objects difficult for robots. In this paper, we extend the lateral-access mechanical search problem to shelves with stacked items and introduce two novel policies -- Distribution Area Reduction for Stacked Scenes (DARSS) and Monte Carlo Tree Search for Stacked Scenes (MCTSSS) -- that use destacking and restacking actions. MCTSSS improves on prior lookahead policies by considering future states after each potential action. Experiments in 1200 simulated and 18 physical trials with a Fetch robot equipped with a blade and suction cup suggest that destacking and restacking actions can reveal the target object with 82--100% success in simulation and 66--100% in physical experiments, and are critical for searching densely packed shelves. In the simulation experiments, both policies outperform a baseline and achieve similar success rates but take more steps compared with an oracle policy that has full state information. In simulation and physical experiments, DARSS outperforms MCTSSS in median number of steps to reveal the target, but MCTSSS has a higher success rate in physical experiments, suggesting robustness to perception noise.
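A compact Monte Carlo lookahead in the spirit of the tree-search policy above could be sketched as follows; the action set, transition model, and reward are toy assumptions rather than the DARSS or MCTSSS policies themselves.

```python
import random

ACTIONS = ["destack", "restack", "push", "suction-extract"]

def simulate(state, action):
    """Placeholder transition model: returns (next_state, target_revealed)."""
    revealed = random.random() < {"destack": 0.4, "restack": 0.1,
                                  "push": 0.2, "suction-extract": 0.3}[action]
    return state + (action,), revealed

def rollout_value(state, depth=3):
    """Estimate a state's value by random rollouts up to a fixed horizon."""
    for _ in range(depth):
        state, revealed = simulate(state, random.choice(ACTIONS))
        if revealed:
            return 1.0
    return 0.0

def monte_carlo_lookahead(state, num_rollouts=50):
    """Pick the action whose simulated futures most often reveal the target."""
    def value(action):
        total = 0.0
        for _ in range(num_rollouts):
            nxt, revealed = simulate(state, action)
            total += 1.0 if revealed else rollout_value(nxt)
        return total / num_rollouts
    return max(ACTIONS, key=value)

print(monte_carlo_lookahead(state=()))
```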
