IROS 2024
Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects
This site complements our paper Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects by Johannes Pitz*, Lennart Röstel*, Leon Sievers, Darius Burschka and Berthold Bäuml.
Abstract
Reorienting diverse objects with a multi-fingered hand is a challenging task. Current methods in robotic in-hand manipulation are either object-specific or require permanent supervision of the object state from visual sensors. This is far from human capabilities and from what is needed in real-world applications. In this work, we address this gap by training shape-conditioned agents to reorient diverse objects in hand, relying purely on tactile feedback (via torque and position measurements of the fingers’ joints). To achieve this, we propose a learning framework that exploits shape information in a reinforcement learning policy and a learned state estimator. We find that representing 3D shapes by vectors from a fixed set of basis points to the shape’s surface, transformed by its predicted 3D pose, is especially helpful for learning dexterous in-hand manipulation. In simulation and real-world experiments, we show the reorientation of many objects with high success rates, on par with state-of-the-art results obtained with specialized single-object agents. Moreover, we show generalization to novel objects, achieving success rates of ∼90% even for non-convex shapes.
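The shape encoding mentioned above (vectors from a fixed set of basis points to the shape's surface) can be sketched as follows. This is a minimal illustration, assuming the shape is given as a point cloud sampled from its surface and expressed in the predicted object pose, and that the basis points form a fixed 4×4×4 grid (matching the "vec bps: 4**3 * 3" input listed under Network details below); the grid extent and any preprocessing are assumptions, not taken from the paper.

```python
import numpy as np

def bps_vectors(surface_points, basis_points):
    """For each basis point, return the vector to its nearest point on the
    object's surface (a 'vector basis point set' shape encoding).

    surface_points: (N, 3) points sampled from the object's surface,
                    expressed in the frame given by the predicted object pose.
    basis_points:   (B, 3) fixed basis points (e.g. a 4x4x4 grid -> B = 64).
    Returns:        (B, 3) vectors; flattened this gives 4**3 * 3 = 192 values.
    """
    # Pairwise differences between every basis point and every surface point.
    diff = surface_points[None, :, :] - basis_points[:, None, :]   # (B, N, 3)
    dists = np.linalg.norm(diff, axis=-1)                          # (B, N)
    nearest = dists.argmin(axis=-1)                                # (B,)
    return diff[np.arange(len(basis_points)), nearest]             # (B, 3)

# Hypothetical usage: a fixed 4x4x4 grid of basis points, random surface samples.
grid = np.linspace(-0.05, 0.05, 4)   # assumed grid extent, not from the paper
basis = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), -1).reshape(-1, 3)
surface = np.random.rand(1000, 3) * 0.08 - 0.04   # stand-in for a sampled mesh surface
vec_bps = bps_vectors(surface, basis)             # (64, 3) -> flattened: 192 values
```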
Cite this paper as:
@inproceedings{Pitz2024,
title={Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
author={Pitz, Johannes and Röstel, Lennart and Sievers, Leon and Burschka, Darius and Bäuml, Berthold},
year={2024}
}
Hyperparameters
Below, we provide the hyperparameters used for training the purely tactile agent.
EcRL Training Parameters
See "Estimator-Coupled Reinforcement Learning for Robust Purely Tactile In-Hand Manipulation" for definitions of these parameters.
| Parameter | Value |
| --- | --- |
| \(\rho_0\) | \(1.0\) |
| \(\delta_{\rho}\) | \(5\times 10^{-4}\) |
| rollout length \(T\) | \(32\) |
| data reusage \(k\) | \(4\) |
Estimator Parameters
| Parameter | Value |
| --- | --- |
| learning rate | \(5\times 10^{-4}\) |
| Adam \(\beta_1, \beta_2\) | \(0.9\), \(0.99\) |
| weight decay | \(1\times 10^{-5}\) |
| hidden layers \(f_{\varphi}\) | [512, 512, 512, 512] |
| hidden layers \(f_{\sigma}\) | [256, 256, 256, 256] |
| minibatch size | \(2^{10}\) |
| latent dimension \(n\) | \(32\) |
| clip gradient by norm | \(1.0\) |
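For concreteness, a sketch of how the estimator optimizer settings above map to PyTorch. The `estimator` network and the loss below are only placeholders for \(f_{\varphi}\), \(f_{\sigma}\), and the actual training objective, which are not specified on this page.

```python
import torch
import torch.nn as nn

# Placeholder network standing in for the estimator (f_phi / f_sigma not shown here).
estimator = nn.Sequential(nn.Linear(144, 32))

# Optimizer settings from the table above.
optimizer = torch.optim.Adam(
    estimator.parameters(),
    lr=5e-4,
    betas=(0.9, 0.99),
    weight_decay=1e-5,
)

# One update step on a minibatch of size 2**10, with gradients clipped by norm.
minibatch = torch.randn(2**10, 144)        # dummy data for illustration
loss = estimator(minibatch).pow(2).mean()  # stand-in for the estimator loss
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(estimator.parameters(), max_norm=1.0)
optimizer.step()
```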
Policy Training Parameters
We train a policy and a value function using PPO with the following parameters:
| Parameter | Value |
| --- | --- |
| learning rate | adaptive, based on KL divergence |
| KL threshold | \(0.016\) |
| weight decay | \(1\times 10^{-5}\) |
| hidden layers | [512, 512, 256, 128] |
| minibatch size | \(2^{15}\) |
| \(\epsilon_{\text{clip}}\) | \(0.2\) |
| entropy coefficient | \(1\times 10^{-3}\) |
| GAE \(\lambda\) | \(0.95\) |
| \(\gamma\) | \(0.99\) |
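The "adaptive, based on KL divergence" learning rate presumably refers to the KL-adaptive rule found in common PPO implementations (e.g. rl_games): a sketch of such a rule using the KL threshold of 0.016 from the table; the scaling factor and learning-rate bounds are assumptions.

```python
def adapt_learning_rate(lr, kl, kl_threshold=0.016,
                        lr_min=1e-6, lr_max=1e-2, factor=1.5):
    """KL-adaptive learning rate heuristic (as used e.g. in rl_games).

    If the measured policy KL divergence overshoots the threshold, the last
    update was too aggressive and the learning rate is reduced; if it stays
    well below, the learning rate is increased. Exact factors/bounds here
    are assumptions, not taken from the paper.
    """
    if kl > 2.0 * kl_threshold:
        lr = max(lr / factor, lr_min)
    elif kl < 0.5 * kl_threshold:
        lr = min(lr * factor, lr_max)
    return lr
```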
Network details
Policy
Network(
(a2c_network): Network(
(actor_mlp): D2RLNet(
(activations): ModuleList(
(0-3): 4 x ELU(alpha=1.0)
)
(linears): ModuleList(
(0): Linear(in_features=353, out_features=512, bias=True)
(1): Linear(in_features=865, out_features=512, bias=True)
(2): Linear(in_features=865, out_features=256, bias=True)
(3): Linear(in_features=609, out_features=128, bias=True)
)
)
(critic_mlp): D2RLNet(
(activations): ModuleList(
(0-3): 4 x ELU(alpha=1.0)
)
(linears): ModuleList(
(0): Linear(in_features=353, out_features=512, bias=True)
(1): Linear(in_features=865, out_features=512, bias=True)
(2): Linear(in_features=865, out_features=256, bias=True)
(3): Linear(in_features=609, out_features=128, bias=True)
)
)
(value): Linear(in_features=128, out_features=1, bias=True)
(mu): Linear(in_features=128, out_features=12, bias=True)
)
)
Inputs
joint pos + control error: 6 * 24
goal delta rot: 6
vec bps: 4**3 * 3
object pose: 9
uncertainty: 2
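The observation size of 353 in the printout is the sum of the inputs above: 6 * 24 + 6 + 4**3 * 3 + 9 + 2 = 144 + 6 + 192 + 9 + 2 = 353. A minimal sketch of the D2RL-style MLP from the printout, in which the observation is re-concatenated to each hidden layer's input (reproducing the in_features of 865 = 512 + 353 and 609 = 256 + 353); this is a reconstruction from the printed layer sizes, not the authors' code.

```python
import torch
import torch.nn as nn

class D2RLNet(nn.Module):
    """MLP with dense skip connections: the observation is re-concatenated to
    every hidden layer's input (D2RL), matching the printed layer sizes."""

    def __init__(self, in_dim=353, hidden=(512, 512, 256, 128)):
        super().__init__()
        self.activations = nn.ModuleList([nn.ELU() for _ in hidden])
        layers, prev = [], in_dim
        for i, h in enumerate(hidden):
            layers.append(nn.Linear(prev, h))
            prev = h + in_dim if i + 1 < len(hidden) else h
        self.linears = nn.ModuleList(layers)

    def forward(self, obs):
        x = obs
        for i, (lin, act) in enumerate(zip(self.linears, self.activations)):
            x = act(lin(x))
            if i + 1 < len(self.linears):
                x = torch.cat([x, obs], dim=-1)
        return x

# Heads as in the printout: value (1 output) and 12-dimensional action mean mu.
actor_mlp = D2RLNet()
mu = nn.Linear(128, 12)
value = nn.Linear(128, 1)

obs = torch.randn(1, 353)
action_mean = mu(actor_mlp(obs))   # (1, 12)
```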
Estimator
Network(
(encoder): MLP(
(model): Sequential(
(0): Linear(in_features=144, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=512, bias=True)
(5): ReLU()
(6): Linear(in_features=512, out_features=512, bias=True)
(7): ReLU()
(8): Linear(in_features=512, out_features=32, bias=True)
)
)
(transition_model): MLP(
(model): Sequential(
(0): Linear(in_features=265, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=512, bias=True)
(5): ReLU()
(6): Linear(in_features=512, out_features=512, bias=True)
(7): ReLU()
(8): Linear(in_features=512, out_features=38, bias=True)
)
)
(sigma_net): MLP(
(model): Sequential(
(0): Linear(in_features=32, out_features=256, bias=True)
(1): ReLU()
(2): Linear(in_features=256, out_features=256, bias=True)
(3): ReLU()
(4): Linear(in_features=256, out_features=256, bias=True)
(5): ReLU()
(6): Linear(in_features=256, out_features=256, bias=True)
(7): ReLU()
(8): Linear(in_features=256, out_features=2, bias=True)
)
)
)
Encoder input
joint pos + control error: 6 * 24
Transition_model input
encoder out: 32
latent: 32
vec bps: 4**3 * 3
object pose: 9
Transition_model output
latent: 32
delta pos: 3
delta rot: 3
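For reference, a sketch of how the printed estimator modules and the dimensions above fit together in a single step. It is reconstructed purely from the layer sizes (265 = 32 + 32 + 192 + 9 at the transition model input, 38 = 32 + 3 + 3 at its output, 2 uncertainty values from the sigma net) and may differ from the actual update rule; in particular, feeding the updated latent into the sigma net is an assumption.

```python
import torch

def estimator_step(encoder, transition_model, sigma_net,
                   joint_history, latent, vec_bps, object_pose):
    """One estimator step, reconstructed from the printed layer sizes.

    joint_history: (B, 144)  joint pos + control error, 6 * 24
    latent:        (B, 32)   latent state carried over from the previous step
    vec_bps:       (B, 192)  4**3 * 3 shape encoding
    object_pose:   (B, 9)    current object pose estimate
    """
    z = encoder(joint_history)                                        # (B, 32)
    out = transition_model(torch.cat([z, latent, vec_bps, object_pose], dim=-1))
    new_latent, delta_pos, delta_rot = out.split([32, 3, 3], dim=-1)  # (B, 38) split
    uncertainty = sigma_net(new_latent)                               # (B, 2)
    return new_latent, delta_pos, delta_rot, uncertainty
```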