PIGINet: A Transformer-based Plan Feasibility Predictor for Robotic Rearrangement in Geometrically Complex Environments

RSS 2023

MIT, NVIDIA

PIGINet predicts the feasibility of a task Plan given Images of objects, Goal description, and Initial state descriptions.
It reduces the planning time of a task and motion planner by 50-80% by eliminating infeasible task plans.

Figure: Input and output of the PIGINet plan feasibility predictor.

Motivations

Task planners and LLMs are good at generating high-level plans, but the pick-and-place actions they propose may be geometrically infeasible when articulated or movable obstacles are present.

PIGINet uses images to decide which sequences of actions are feasible in rearrangement tasks involving storage spaces and cluttered surfaces.

PIGINet is a Transformer-based architecture that fuses features from images, text, and continuous values describing the problem and the plan.

Figure: Architecture of the PIGINet plan feasibility predictor.

Abstract

We present a learning-enabled Task and Motion Planning (TAMP) algorithm for solving mobile manipulation problems in environments with many articulated and movable obstacles. Our idea is to bias the search procedure of a traditional TAMP planner with a learned plan feasibility predictor.

The core of our algorithm is PIGINet, a novel Transformer-based learning method that takes in a task plan, the goal, and the initial state, and predicts the probability of finding motion trajectories associated with the task plan. The elements of each action or relation in the initial state – such as text, object poses, and door joint angles – are processed to produce embeddings of the same dimension and fused together to produce each token in the input sequence to the Transformer encoder. A pre-trained CLIP model generates the corresponding text and image embeddings.
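
As a rough illustration of this encoding scheme (not the authors' implementation: the dimensions, module names, and fusion-by-summation are all assumptions), per-token fusion followed by a Transformer encoder could look like the following PyTorch sketch. The text_emb and image_emb inputs would come from a frozen pre-trained CLIP model, as described above.

import torch
import torch.nn as nn

class PlanFeasibilityPredictor(nn.Module):
    """Minimal PIGINet-style sketch; dimensions and fusion scheme are assumptions."""

    def __init__(self, d_model=512, clip_dim=512, value_dim=8, n_layers=4, n_heads=8):
        super().__init__()
        # Project each modality into a shared d_model-dimensional space.
        self.text_proj = nn.Linear(clip_dim, d_model)    # CLIP text features
        self.image_proj = nn.Linear(clip_dim, d_model)   # CLIP image features
        self.value_proj = nn.Linear(value_dim, d_model)  # poses, joint angles, etc.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))  # pooled summary token
        self.head = nn.Linear(d_model, 1)

    def forward(self, text_emb, image_emb, values):
        # Inputs: (batch, num_tokens, feature_dim), one token per action or
        # initial-state relation. Summation is one possible fusion scheme
        # that keeps every token at the same dimension.
        tokens = (self.text_proj(text_emb)
                  + self.image_proj(image_emb)
                  + self.value_proj(values))
        cls = self.cls.expand(tokens.shape[0], -1, -1)
        encoded = self.encoder(torch.cat([cls, tokens], dim=1))
        # Probability that motion refinement of the task plan will succeed.
        return torch.sigmoid(self.head(encoded[:, 0]))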

We integrate PIGINet within a TAMP planner that generates a diverse set of high-level task plans, sorts them by their predicted likelihood of feasibility, and refines them in that order. We evaluate the runtime of our TAMP algorithm on seven families of kitchen rearrangement problems, comparing its performance to that of non-learning baselines. Our experiments show that PIGINet substantially improves planning efficiency, cutting down runtime by 80% on problems with small state spaces and 10%-50% on larger ones, after being trained on only 150-600 problems. Finally, it also achieves zero-shot generalization to problems with unseen object categories thanks to its visual encoding of objects.
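
The planner-side integration can be pictured as reordering the planner's candidate queue by predicted feasibility. Below is a hypothetical outline of that loop; generate_task_plans and refine_motions are placeholder names for illustration, not the authors' API.

def plan_with_piginet(problem, predictor, max_plans=50):
    # Sketch of feasibility-guided TAMP: score candidate task plans with
    # PIGINet, then attempt motion refinement in order of predicted feasibility.
    candidates = generate_task_plans(problem, n=max_plans)

    # Score every candidate with the learned feasibility predictor.
    scored = [(predictor(plan, problem.goal, problem.initial_state), plan)
              for plan in candidates]

    # Attempt motion refinement in decreasing order of predicted feasibility;
    # return the first plan whose motions can actually be solved.
    for score, plan in sorted(scored, key=lambda sp: sp[0], reverse=True):
        trajectories = refine_motions(problem, plan)  # sampling-based motion planning
        if trajectories is not None:
            return plan, trajectories
    return None  # every candidate plan failed motion refinement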



Video

Results


Note:
  • In the video, furniture that blocks the top-down view of the robot arm is removed (upper panel), while the side view shows all furniture (lower panel).
  • In each table, the cell with a green background indicates the task plan that was successfully refined and is visualized in the video. PIGINet speeds up planning by reducing the number of task plans attempted before that plan is refined.
  • In the tables, the arguments to actions are object instances, e.g., tomato, tomato#1.
  • Hover the cursor over an object name in bold to display the corresponding object instance. Note that these segmented images are rendered in full visibility with a transparent background for easier understanding; they are not the segmented images used by PIGINet.
  • Not all movable objects or articulated doors in the scene are included as manipulable objects in the planning problem.

3D interactive demo

BibTeX

@INPROCEEDINGS{yang2023piginet, 
    AUTHOR    = {Zhutian Yang AND Caelan R Garrett AND Tomas Lozano-Perez AND Leslie Kaelbling AND Dieter Fox}, 
    TITLE     = {{Sequence-Based Plan Feasibility Prediction for Efficient Task and Motion Planning}}, 
    BOOKTITLE = {Proceedings of Robotics: Science and Systems}, 
    YEAR      = {2023}, 
    ADDRESS   = {Daegu, Republic of Korea}, 
    MONTH     = {July}, 
    DOI       = {10.15607/RSS.2023.XIX.061} 
}