IROS 2024
Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects
This site complements our paper Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects by Johannes Pitz*, Lennart Röstel*, Leon Sievers, Darius Burschka and Berthold Bäuml.
Abstract
Reorienting diverse objects with a multi-fingered hand is a challenging task. Current methods in robotic in-hand manipulation are either object-specific or require permanent supervision of the object state from visual sensors. This is far from human capabilities and from what is needed in real-world applications. In this work, we address this gap by training shape-conditioned agents to reorient diverse objects in hand, relying purely on tactile feedback (via torque and position measurements of the fingers’ joints). To achieve this, we propose a learning framework that exploits shape information in a reinforcement learning policy and a learned state estimator. We find that representing 3D shapes by vectors from a fixed set of basis points to the shape’s surface, transformed by its predicted 3D pose, is especially helpful for learning dexterous in-hand manipulation. In simulation and real-world experiments, we show the reorientation of many objects with high success rates, on par with state-of-the-art results obtained with specialized single-object agents. Moreover, we show generalization to novel objects, achieving success rates of ∼90% even for non-convex shapes.
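The shape encoding mentioned above (vectors from a fixed set of basis points to the shape's surface) can be sketched as follows. This is a minimal illustration, assuming the shape is given as a point cloud sampled from its surface and expressed in the predicted object pose, and that the basis points form a fixed 4×4×4 grid (matching the "vec bps: 4**3 * 3" input listed under Network details below); the grid extent and any preprocessing are assumptions, not taken from the paper.

```python
import numpy as np

def bps_vectors(surface_points, basis_points):
    """For each basis point, return the vector to its nearest point on the
    object's surface (a 'vector basis point set' shape encoding).

    surface_points: (N, 3) points sampled from the object's surface,
                    expressed in the frame given by the predicted object pose.
    basis_points:   (B, 3) fixed basis points (e.g. a 4x4x4 grid -> B = 64).
    Returns:        (B, 3) vectors; flattened this gives 4**3 * 3 = 192 values.
    """
    # Pairwise differences between every basis point and every surface point.
    diff = surface_points[None, :, :] - basis_points[:, None, :]   # (B, N, 3)
    dists = np.linalg.norm(diff, axis=-1)                          # (B, N)
    nearest = dists.argmin(axis=-1)                                # (B,)
    return diff[np.arange(len(basis_points)), nearest]             # (B, 3)

# Hypothetical usage: a fixed 4x4x4 grid of basis points, random surface samples.
grid = np.linspace(-0.05, 0.05, 4)   # assumed grid extent, not from the paper
basis = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), -1).reshape(-1, 3)
surface = np.random.rand(1000, 3) * 0.08 - 0.04   # stand-in for a sampled mesh surface
vec_bps = bps_vectors(surface, basis)             # (64, 3) -> flattened: 192 values
```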
Cite this paper as:
@inproceedings{Pitz2024,
title={Learning a Shape-Conditioned Agent for Purely Tactile In-Hand Manipulation of Various Objects},
booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
author={Pitz, Johannes and Röstel, Lennart and Sievers, Leon and Burschka, Darius and Bäuml, Berthold},
year={2024}
}
Hyperparameters
Below, we provide the hyperparameters used for training the purely tactile agent.
EcRL Training Parameters
See "Estimator-Coupled Reinforcement Learning for Robust Purely Tactile In-Hand Manipulation" for definitions of these parameters.
| Parameter | Value |
| --- | --- |
| \(\rho_0\) | \(1.0\) |
| \(\delta_{\rho}\) | \(5\times 10^{-4}\) |
| rollout length \(T\) | \(32\) |
| data reusage \(k\) | \(4\) |
Estimator Parameters
| Parameter | Value |
| --- | --- |
| learning rate | \(5\times 10^{-4}\) |
| Adam \(\beta_1, \beta_2\) | \(0.9\), \(0.99\) |
| weight decay | \(1\times 10^{-5}\) |
| hidden layers \(f_{\varphi}\) | [512, 512, 512, 512] |
| hidden layers \(f_{\sigma}\) | [256, 256, 256, 256] |
| minibatch size | \(2^{10}\) |
| latent dimension \(n\) | \(32\) |
| clip gradient by norm | \(1.0\) |
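For concreteness, a sketch of how the estimator optimizer settings above map to PyTorch. The `estimator` network and the loss below are only placeholders for \(f_{\varphi}\), \(f_{\sigma}\), and the actual training objective, which are not specified on this page.

```python
import torch
import torch.nn as nn

# Placeholder network standing in for the estimator (f_phi / f_sigma not shown here).
estimator = nn.Sequential(nn.Linear(144, 32))

# Optimizer settings from the table above.
optimizer = torch.optim.Adam(
    estimator.parameters(),
    lr=5e-4,
    betas=(0.9, 0.99),
    weight_decay=1e-5,
)

# One update step on a minibatch of size 2**10, with gradients clipped by norm.
minibatch = torch.randn(2**10, 144)        # dummy data for illustration
loss = estimator(minibatch).pow(2).mean()  # stand-in for the estimator loss
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(estimator.parameters(), max_norm=1.0)
optimizer.step()
```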
Policy Training Parameters
We train a policy and a value function using PPO with the following parameters:
| Parameter | Value |
| --- | --- |
| learning rate | adaptive, based on KL divergence |
| KL threshold | \(0.016\) |
| weight decay | \(1\times 10^{-5}\) |
| hidden layers | [512, 512, 256, 128] |
| minibatch size | \(2^{15}\) |
| \(\epsilon_{\text{clip}}\) | \(0.2\) |
| entropy coefficient | \(1\times 10^{-3}\) |
| GAE \(\lambda\) | \(0.95\) |
| \(\gamma\) | \(0.99\) |
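The "adaptive, based on KL divergence" learning rate presumably refers to the KL-adaptive rule found in common PPO implementations (e.g. rl_games): a sketch of such a rule using the KL threshold of 0.016 from the table; the scaling factor and learning-rate bounds are assumptions.

```python
def adapt_learning_rate(lr, kl, kl_threshold=0.016,
                        lr_min=1e-6, lr_max=1e-2, factor=1.5):
    """KL-adaptive learning rate heuristic (as used e.g. in rl_games).

    If the measured policy KL divergence overshoots the threshold, the last
    update was too aggressive and the learning rate is reduced; if it stays
    well below, the learning rate is increased. Exact factors/bounds here
    are assumptions, not taken from the paper.
    """
    if kl > 2.0 * kl_threshold:
        lr = max(lr / factor, lr_min)
    elif kl < 0.5 * kl_threshold:
        lr = min(lr * factor, lr_max)
    return lr
```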
Network details
Policy
Network(
(a2c_network): Network(
(actor_mlp): D2RLNet(
(activations): ModuleList(
(0-3): 4 x ELU(alpha=1.0)
)
(linears): ModuleList(
(0): Linear(in_features=353, out_features=512, bias=True)
(1): Linear(in_features=865, out_features=512, bias=True)
(2): Linear(in_features=865, out_features=256, bias=True)
(3): Linear(in_features=609, out_features=128, bias=True)
)
)
(critic_mlp): D2RLNet(
(activations): ModuleList(
(0-3): 4 x ELU(alpha=1.0)
)
(linears): ModuleList(
(0): Linear(in_features=353, out_features=512, bias=True)
(1): Linear(in_features=865, out_features=512, bias=True)
(2): Linear(in_features=865, out_features=256, bias=True)
(3): Linear(in_features=609, out_features=128, bias=True)
)
)
(value): Linear(in_features=128, out_features=1, bias=True)
(mu): Linear(in_features=128, out_features=12, bias=True)
)
)
Inputs
joint pos + control error: 6 * 24
goal delta rot: 6
vec bps: 4**3 * 3
object pose: 9
uncertainty: 2
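The observation size of 353 in the printout is the sum of the inputs above: 6 * 24 + 6 + 4**3 * 3 + 9 + 2 = 144 + 6 + 192 + 9 + 2 = 353. A minimal sketch of the D2RL-style MLP from the printout, in which the observation is re-concatenated to each hidden layer's input (reproducing the in_features of 865 = 512 + 353 and 609 = 256 + 353); this is a reconstruction from the printed layer sizes, not the authors' code.

```python
import torch
import torch.nn as nn

class D2RLNet(nn.Module):
    """MLP with dense skip connections: the observation is re-concatenated to
    every hidden layer's input (D2RL), matching the printed layer sizes."""

    def __init__(self, in_dim=353, hidden=(512, 512, 256, 128)):
        super().__init__()
        self.activations = nn.ModuleList([nn.ELU() for _ in hidden])
        layers, prev = [], in_dim
        for i, h in enumerate(hidden):
            layers.append(nn.Linear(prev, h))
            prev = h + in_dim if i + 1 < len(hidden) else h
        self.linears = nn.ModuleList(layers)

    def forward(self, obs):
        x = obs
        for i, (lin, act) in enumerate(zip(self.linears, self.activations)):
            x = act(lin(x))
            if i + 1 < len(self.linears):
                x = torch.cat([x, obs], dim=-1)
        return x

# Heads as in the printout: value (1 output) and 12-dimensional action mean mu.
actor_mlp = D2RLNet()
mu = nn.Linear(128, 12)
value = nn.Linear(128, 1)

obs = torch.randn(1, 353)
action_mean = mu(actor_mlp(obs))   # (1, 12)
```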
Estimator
Network(
(encoder): MLP(
(model): Sequential(
(0): Linear(in_features=144, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=512, bias=True)
(5): ReLU()
(6): Linear(in_features=512, out_features=512, bias=True)
(7): ReLU()
(8): Linear(in_features=512, out_features=32, bias=True)
)
)
(transition_model): MLP(
(model): Sequential(
(0): Linear(in_features=265, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=512, bias=True)
(5): ReLU()
(6): Linear(in_features=512, out_features=512, bias=True)
(7): ReLU()
(8): Linear(in_features=512, out_features=38, bias=True)
)
)
(sigma_net): MLP(
(model): Sequential(
(0): Linear(in_features=32, out_features=256, bias=True)
(1): ReLU()
(2): Linear(in_features=256, out_features=256, bias=True)
(3): ReLU()
(4): Linear(in_features=256, out_features=256, bias=True)
(5): ReLU()
(6): Linear(in_features=256, out_features=256, bias=True)
(7): ReLU()
(8): Linear(in_features=256, out_features=2, bias=True)
)
)
)
Encoder input
joint pos + control error: 6 * 24
Transition_model input
encoder out: 32
latent: 32
vec bps: 4**3 * 3
object pose: 9
Transition_model output
latent: 32
delta pos: 3
delta rot: 3
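For reference, a sketch of how the printed estimator modules and the dimensions above fit together in a single step. It is reconstructed purely from the layer sizes (265 = 32 + 32 + 192 + 9 at the transition model input, 38 = 32 + 3 + 3 at its output, 2 uncertainty values from the sigma net) and may differ from the actual update rule; in particular, feeding the updated latent into the sigma net is an assumption.

```python
import torch

def estimator_step(encoder, transition_model, sigma_net,
                   joint_history, latent, vec_bps, object_pose):
    """One estimator step, reconstructed from the printed layer sizes.

    joint_history: (B, 144)  joint pos + control error, 6 * 24
    latent:        (B, 32)   latent state carried over from the previous step
    vec_bps:       (B, 192)  4**3 * 3 shape encoding
    object_pose:   (B, 9)    current object pose estimate
    """
    z = encoder(joint_history)                                        # (B, 32)
    out = transition_model(torch.cat([z, latent, vec_bps, object_pose], dim=-1))
    new_latent, delta_pos, delta_rot = out.split([32, 3, 3], dim=-1)  # (B, 38) split
    uncertainty = sigma_net(new_latent)                               # (B, 2)
    return new_latent, delta_pos, delta_rot, uncertainty
```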