
Learning visual policies for building 3D shape categories

Manipulation and assembly tasks require non-trivial planning of actions depending on the environment and the final goal. Previous work in this domain often assembles particular instances of objects from known sets of primitives. In contrast, we aim to handle varying sets of primitives and to construct different objects of the same shape category. Given a single object instance of a category, e.g. an arch, and a binary shape classifier, we learn a visual policy to assemble other instances of the same category. In particular, we propose a disassembly procedure and learn a state policy that discovers new object instances and their assembly plans in state space. We then render simulated states in the observation space and learn a heatmap representation to predict alternative actions from a given input image. To validate our approach, we first demonstrate its efficiency for building object categories in state space. We then show the success of our visual policies for building arches from different primitives. Moreover, we demonstrate (i) the reactive ability of our method to re-assemble objects using additional primitives and (ii) the robust performance of our policy for unseen primitives resembling building blocks used during training. Our visual assembly policies are trained with no real images and reach a success rate of up to 95% when evaluated on a real robot.

Method overview. Given an example object and a shape classifier (left), our method generates new objects with the same shape and discovers action sequences for building these objects in the state space. Using a large set of generated state-action pairs, we render states as realistic observations. We also generate 2D heatmaps encoding source and target locations and orientations of one or several primitives. Heatmaps can represent multiple hypotheses for the next action when several identical primitives are available or when multiple object instances can be assembled. Positions on our 2D heatmaps correspond to positions on the 2D surface of a table, so the local maxima identified on the heatmaps can be used to control the robot. As the last step of our method, we train a behaviour cloning policy to predict pick and place heatmaps from observations. We train policies with an HourGlass CNN and sim2real augmentation. The learned policy is directly transferred to a real robot, which assembles 3D objects from primitives on the table.
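The correspondence between heatmap positions and table positions can be illustrated with a minimal sketch. It is not the authors' implementation: the table extent, the heatmap resolution, the Gaussian encoding, the peak threshold, and all function names below are assumptions chosen for illustration, and orientation is ignored. The sketch encodes several candidate pick positions as one heatmap, then recovers them as local maxima and maps them back to table coordinates.

```python
import math

# Assumed table extent in metres and heatmap resolution (illustrative values only).
TABLE_X = (0.0, 0.6)
TABLE_Y = (-0.3, 0.3)
HEATMAP_SIZE = 64


def table_to_pixel(x, y, size=HEATMAP_SIZE):
    """Map a table position in metres to the nearest heatmap cell (row, col)."""
    col = round((x - TABLE_X[0]) / (TABLE_X[1] - TABLE_X[0]) * (size - 1))
    row = round((y - TABLE_Y[0]) / (TABLE_Y[1] - TABLE_Y[0]) * (size - 1))
    return row, col


def pixel_to_table(row, col, size=HEATMAP_SIZE):
    """Map a heatmap cell (row, col) back to a table position in metres."""
    x = TABLE_X[0] + col / (size - 1) * (TABLE_X[1] - TABLE_X[0])
    y = TABLE_Y[0] + row / (size - 1) * (TABLE_Y[1] - TABLE_Y[0])
    return x, y


def render_heatmap(positions, sigma=2.0, size=HEATMAP_SIZE):
    """Draw one Gaussian blob per candidate position and keep the per-cell maximum."""
    heatmap = [[0.0] * size for _ in range(size)]
    for x, y in positions:
        r0, c0 = table_to_pixel(x, y, size)
        for r in range(size):
            for c in range(size):
                dist_sq = (r - r0) ** 2 + (c - c0) ** 2
                value = math.exp(-dist_sq / (2 * sigma ** 2))
                heatmap[r][c] = max(heatmap[r][c], value)
    return heatmap


def local_maxima(heatmap, threshold=0.5):
    """Return cells above the threshold that are strictly greater than all 8 neighbours."""
    size = len(heatmap)
    peaks = []
    for r in range(size):
        for c in range(size):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(size, r + 2))
                for cc in range(max(0, c - 1), min(size, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


if __name__ == "__main__":
    # Two identical primitives give two equally valid pick hypotheses.
    candidates = [(0.15, -0.1), (0.45, 0.15)]
    pick_heatmap = render_heatmap(candidates)
    for row, col in local_maxima(pick_heatmap):
        print("pick hypothesis at table position", pixel_to_table(row, col))
```

In this sketch each peak is a separate hypothesis for the next action, so a controller could pick any of them. The recovered positions match the originals only up to one heatmap cell, which is the resolution limit of the assumed grid.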



