Learning Purely Tactile In-Hand Manipulation with a Torque-Controlled Hand

This site complements our ICRA 2022 paper Learning Purely Tactile In-Hand Manipulation with a Torque-Controlled Hand by Leon Sievers*, Johannes Pitz* and Berthold Bäuml.

Abstract

We show that a purely tactile dexterous in-hand manipulation task with continuous regrasping, requiring permanent force closure, can be learned from scratch and executed robustly on a torque-controlled humanoid robotic hand. The task is rotating a cube without dropping it, but in contrast to OpenAI’s seminal cube manipulation task, the palm faces downwards and no cameras but only the hand’s position and torque sensing are used. Although the task seems simple, it combines for the first time all the challenges in execution as well as learning that are important for using in-hand manipulation in real-world applications. We efficiently train in a precisely modeled and identified rigid body simulation with off-policy deep reinforcement learning, significantly sped up by a domain adapted curriculum, leading to a moderate 600 CPU hours of training time. The resulting policy is robustly transferred to the real humanoid DLR Hand II, e.g., reaching more than 46 full 2π rotations of the cube in a single run and tolerating disturbances like different cube sizes, changed hand orientations, or pulling on a finger.

Cite this paper as:

@inproceedings{Sievers2022,
    author = {Leon Sievers and Johannes Pitz and Berthold B{\"a}uml},
    booktitle = {Proc. IEEE International Conference on Robotics and Automation},
    title = {Learning Purely Tactile In-Hand Manipulation with a Torque-Controlled Hand},
    year = {2022}}

Hand

We use the DLR Hand II.


|  | \(q_1\) | \(q_2\) | \(q_3\) |
| --- | --- | --- | --- |
| \(\Theta_{\text{min}}\) | \(-30^{\circ}\) | \(-20^{\circ}\) | \(-10^{\circ}\) |
| \(\Theta_{\text{max}}\) | \(30^{\circ}\) | \(86^{\circ}\) | \(105^{\circ}\) |
| \(\Theta_{\text{lower}}\) | \(-30^{\circ}\) | \(-10^{\circ}\) | \(-5^{\circ}\) |
| \(\Theta_{\text{upper}}\) | \(30^{\circ}\) | \(30^{\circ}\) | \(30^{\circ}\) |
| \(K_\text{P}\) | \(3.69 \frac{\text{Nm}}{\text{rad}}\) | \(3.69 \frac{\text{Nm}}{\text{rad}}\) | \(3.69 \frac{\text{Nm}}{\text{rad}}\) |
| \(K_\text{D}\) | \(0.25 \frac{\text{Nm s}}{\text{rad}}\) | \(0.25 \frac{\text{Nm s}}{\text{rad}}\) | \(0.25 \frac{\text{Nm s}}{\text{rad}}\) |
| \(K_\text{e}\) | \(25 \frac{\text{Nm}}{\text{rad}}\) | \(25 \frac{\text{Nm}}{\text{rad}}\) | \(25 \frac{\text{Nm}}{\text{rad}}\) |
| \(F_\text{static}\) | \(0.02\,\text{Nm}\) | \(0.02\,\text{Nm}\) | \(0.02\,\text{Nm}\) |

Joint angles, their limits, and the controller parameters. \(\Theta_{\text{min}}\) and \(\Theta_{\text{max}}\) denote the physical limits of the real kinematics, while \(\Theta_{\text{lower}}\) and \(\Theta_{\text{upper}}\) are used for the simulated hand and for scaling the network output. The PD impedance controller is parameterized by \(K_{\text{P}}\) and \(K_{\text{D}}\), the parasitic stiffness is modeled by \(K_{\text{e}}\), and \(F_{\text{static}}\) denotes the static friction torque.
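As an illustration of how these parameters fit together, here is a minimal sketch (assuming a standard per-joint PD impedance law; this is not the authors' controller code) of the action scaling into \([\Theta_{\text{lower}}, \Theta_{\text{upper}}]\) and the roles of \(K_{\text{P}}\), \(K_{\text{D}}\), \(K_{\text{e}}\), and \(F_{\text{static}}\). Modeling \(K_{\text{e}}\) as an elastic transmission and \(F_{\text{static}}\) as a simple deadband are our assumptions.

```python
import numpy as np

# Sketch of a per-joint PD impedance controller with the gains from the table.
# Treating K_e as a series (transmission) stiffness and F_static as a deadband
# is an assumption for illustration, not the authors' implementation.
K_P, K_D = 3.69, 0.25              # Nm/rad and Nm*s/rad
K_E = 25.0                         # Nm/rad, parasitic stiffness
F_STATIC = 0.02                    # Nm, static friction torque

THETA_LOWER = np.deg2rad([-30.0, -10.0, -5.0])   # limits for q1, q2, q3
THETA_UPPER = np.deg2rad([30.0, 30.0, 30.0])

def scale_action(a):
    """Map a network output in [-1, 1] to desired angles in the soft limits."""
    a = np.clip(a, -1.0, 1.0)
    return THETA_LOWER + 0.5 * (a + 1.0) * (THETA_UPPER - THETA_LOWER)

def impedance_torque(q_des, q, q_dot):
    """PD impedance law: stiffness times position error minus damping."""
    return K_P * (q_des - q) - K_D * q_dot

def transmitted_torque(theta_motor, q_link):
    """Parasitic stiffness K_e modeled as an elastic transmission (assumption)."""
    return K_E * (theta_motor - q_link)

def apply_static_friction(tau):
    """Commanded torques below the static friction threshold move nothing."""
    return np.where(np.abs(tau) > F_STATIC, tau, 0.0)
```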


Simulator

We use PyBullet as our rigid-body simulator.

Simulator Configuration

Initial Learning

| Parameter | Value |
| --- | --- |
| earth_acceleration | 9.81 \(\frac{\text{m}}{\text{s}^2}\) |
| simulation_frequency | 500 \(\text{Hz}\) |
| network_frequency | 10 \(\text{Hz}\) |
| steps_before_gravity_onset | 4 |
| friction_lateral_tip | 0.9 |
| friction_lateral_others | 0.09 |
| friction_lateral_tip_random | 0.0 |
| friction_spinning | 0.1 |
| cube_mass | 0.05 \(\text{kg}\) |
| cube_length | 0.08 \(\text{m}\) |
| cube_mass_random | 0.005 \(\text{kg}\) |
| cube_position_random | [0.001, 0.001, 0.005] \(\text{m}\) |
| cube_rotation_random | [0.01, 0.01, 0.1] \(\text{rad}\) |
| motor_noise_std | 0.002 \(\text{rad}\) |
| sticky_action_probability | 0.1 |

Continue Learning

| Parameter | Value |
| --- | --- |
| friction_lateral_tip_random | 0.1 |
| friction_spinning | 0.1 |
| motor_noise_std | 0.02 \(\text{rad}\) |
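These values map directly onto PyBullet calls. Below is a minimal, self-contained sketch, with the hand and policy stubbed out, of how the timing (500 Hz physics, 10 Hz policy), delayed gravity onset, cube randomization, friction, sticky actions, and motor noise could be wired up. It is an illustration under our assumptions, not the environment code used for the paper.

```python
import numpy as np
import pybullet as p

SIM_HZ, NET_HZ = 500, 10
STEPS_PER_ACTION = SIM_HZ // NET_HZ              # 50 physics steps per policy step

p.connect(p.DIRECT)
p.setTimeStep(1.0 / SIM_HZ)
p.setGravity(0, 0, 0)                            # gravity switched on later

half = 0.08 / 2                                  # cube_length = 0.08 m
col = p.createCollisionShape(p.GEOM_BOX, halfExtents=[half] * 3)
mass = 0.05 + np.random.uniform(-0.005, 0.005)   # cube_mass +/- cube_mass_random (assumed uniform)
pos = np.random.uniform(-1, 1, 3) * [0.001, 0.001, 0.005]   # cube_position_random
cube = p.createMultiBody(baseMass=mass, baseCollisionShapeIndex=col,
                         basePosition=pos.tolist())
# Friction set on the cube for brevity; in the real setup the fingertip links
# get friction_lateral_tip = 0.9 and all other links 0.09.
p.changeDynamics(cube, -1, lateralFriction=0.9, spinningFriction=0.1)

prev_action = None
for step in range(100):                          # max_episode_length
    if step == 4:                                # steps_before_gravity_onset (assumed policy steps)
        p.setGravity(0, 0, -9.81)
    action = np.zeros(12)                        # placeholder for the policy output
    if prev_action is not None and np.random.rand() < 0.1:   # sticky_action_probability
        action = prev_action                     # repeat the previous action
    prev_action = action
    for _ in range(STEPS_PER_ACTION):
        p.stepSimulation()                       # impedance torques would be applied here
    joint_noise = np.random.normal(0.0, 0.002, 12)           # motor_noise_std on sensed positions
```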

Learning

We use the Soft Actor-Critic (SAC) algorithm.
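For reference, here is a minimal numpy sketch of the two SAC quantities that the hyperparameters below control most directly: the soft Bellman target (discounted by gamma) and the automatic temperature update that drives the policy entropy toward entropy_target. This is a conceptual illustration, not the training code used for the paper.

```python
import numpy as np

def critic_target(reward, gamma, q1_next, q2_next, log_prob_next, alpha, done):
    """Soft Bellman target: r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    soft_value = np.minimum(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - done) * soft_value

def temperature_step(log_alpha, log_prob, entropy_target, lr=3e-4):
    """One gradient step on log(alpha) so the policy entropy tracks the target
    (entropy_target = -6 for initial learning, -8 when continuing)."""
    grad = -(np.mean(log_prob) + entropy_target) * np.exp(log_alpha)
    return log_alpha - lr * grad
```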

Reward Constants

| Constant | Value |
| --- | --- |
| \(\lambda_{\text{spin}}\) | 16 |
| \(\lambda_{\text{Z}}\) | 100 |
| \(\lambda_{\text{plane}}\) | 300 |
| \(\lambda_{\text{rot}}\) | 50 |
| \(\lambda_{\text{proxy}}\) | 25 |
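The exact reward terms are defined in the paper; purely to illustrate how such weights typically enter, the sketch below combines a clipped rotation-rate reward with weighted penalty terms. All term names, and the clipping with max_reward_rotation, are hypothetical placeholders, not the paper's reward function.

```python
import numpy as np

LAMBDAS = dict(spin=16.0, Z=100.0, plane=300.0, rot=50.0, proxy=25.0)
MAX_REWARD_ROTATION = 1.0        # rad/s, see the learning configuration below

def reward(rot_rate, spin_pen, z_pen, plane_pen, proxy_term):
    """Hypothetical weighted combination of reward and penalty terms."""
    rot = np.clip(rot_rate, -MAX_REWARD_ROTATION, MAX_REWARD_ROTATION)
    return (LAMBDAS["rot"] * rot + LAMBDAS["proxy"] * proxy_term
            - LAMBDAS["spin"] * spin_pen - LAMBDAS["Z"] * z_pen
            - LAMBDAS["plane"] * plane_pen)
```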

Learning Configuration

Initial Learning

| Parameter | Value |
| --- | --- |
| num_worker | 12 |
| local_steps_per_update | 25 |
| gradient_steps_per_update | 100 |
| batch_size | 256 |
| initial_no_update | 2000 |
| initial_random_steps | 10000 |
| curriculum_start | 2e6 |
| curriculum_length | 4e6 |
| num_layers | 2 |
| hidden_size | 512 |
| replay_size | 1.5e5 |
| entropy_target | -6 |
| alpha_initial | 0.2 |
| gamma | 0.9 |
| lambda_weight_decay | 2.5e-4 |
| max_episode_length | 100 |
| num_observation_stack | 5 |
| max_reward_rotation | 1 \(\frac{\text{rad}}{\text{s}}\) |
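As one possible reading of curriculum_start and curriculum_length, the sketch below computes a linear ramp from 0 to 1 over the given range of environment steps. Which difficulty parameter this fraction scales (the abstract mentions a domain adapted curriculum) is not specified on this page and is left as an assumption.

```python
def curriculum_fraction(step, start=2e6, length=4e6):
    """Linear curriculum schedule: 0 before `start` steps, ramping to 1
    over the following `length` steps (this interpretation is an assumption)."""
    return min(max((step - start) / length, 0.0), 1.0)
```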

Continue Learning

| Parameter | Value |
| --- | --- |
| replay_size | 1.5e6 |
| entropy_target | -8 |
| gamma | 0.92 |
| lambda_weight_decay | 2.5e-4 |
| max_episode_length | 200 |
| num_observation_stack | 5 |
| max_reward_rotation | 0.7 \(\frac{\text{rad}}{\text{s}}\) |
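Finally, num_observation_stack = 5 suggests the policy receives the last five observations concatenated. Below is a minimal sketch of such a stacking wrapper; the exact stacking scheme is our assumption.

```python
from collections import deque
import numpy as np

class ObservationStack:
    """Concatenate the last `size` observations (padding with the first one)."""
    def __init__(self, size=5):
        self.buf = deque(maxlen=size)

    def reset(self, obs):
        self.buf.clear()
        for _ in range(self.buf.maxlen):
            self.buf.append(obs)             # pad the stack at episode start
        return np.concatenate(self.buf)

    def step(self, obs):
        self.buf.append(obs)                 # oldest observation drops out
        return np.concatenate(self.buf)

stack = ObservationStack(size=5)             # num_observation_stack
x = stack.reset(np.zeros(24))                # 24-dim observation is a placeholder
```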