rlt_lift_finetuned_pi05

Private backup of the RLT fine-tuning artifacts for the MIML-VLA Lift task.

Base model

This RLT checkpoint was initialized from the behavior-cloning fine-tuned Pi0.5 Lift model:

Base HF model repo: 0xemkey/finetuned_with_lift_pi05
Base local checkpoint path during run: /data/ubuntu/checkpoints/finetuned_with_lift_pi05/step_000100

This repository does not duplicate the full 16 GB Pi0.5 base model. It stores only the RLT adapter/checkpoint artifacts, logs, plots, and run metadata.

Run

Run ID: 20260529_204704
Environment: IsaacLab / MIML-VLA Lift
Policy server: LeRobot Pi0.5 websocket server
Online RL script: train_rlt_online_miml_lift.py
Source checkpoint: /data/ubuntu/checkpoints/finetuned_with_lift_pi05/step_000100

Summary

Episodes: 5
Environment steps: 202
Online updates: 300
Success rate: 0.0000
Gripper open ratio: 1.0000
Gripper closed ratio: 0.0000
Minimum distance observed: about 0.005462 m

Main diagnosis

The RLT infrastructure worked: rollout, updates, checkpoints, CSV logs, and plots were produced.

However, task success was not achieved: the success rate was 0 across all 5 episodes. The main blocker was the gripper command mapping/override:

Some raw/denormalized actions indicated closing behavior.
The environment nevertheless received +1.0 at every step, which it interprets as open.
The gripper therefore never closed during the run.
As a result, the contact, fixed-joint attachment, and lift rewards remained inactive.

Expected convention:

denorm_gripper < 0  -> env_gripper = -1.0 closed
denorm_gripper > 0  -> env_gripper = +1.0 open
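
The convention above can be sketched as a small mapping function. This is a minimal illustration, not the code used in train_rlt_online_miml_lift.py: the function name `map_gripper_command` is hypothetical, and because the convention does not define the case where the value is exactly 0, the sketch assumes it maps to open.

```python
def map_gripper_command(denorm_gripper: float) -> float:
    """Map a denormalized gripper value to the environment gripper command.

    Negative values close the gripper (-1.0); positive values open it (+1.0).
    """
    # Assumption: the documented convention leaves exactly 0.0 undefined;
    # this sketch treats it as open.
    if denorm_gripper < 0.0:
        return -1.0  # closed
    return 1.0  # open


if __name__ == "__main__":
    for value in (-0.3, 0.7):
        print(value, "->", map_gripper_command(value))
```

Whatever form the real mapping takes, it should be the last transformation applied before the action is sent to the environment, so that no later override can reset the command to +1.0.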

Included files

checkpoints/

rlt_final_debug.pt
rlt_episode_0005.pt (if available)

logs/

online_rl_step_metrics.csv
online_rl_episode_metrics.csv
online_rl_update_metrics.csv
plot_summary.txt

plots/

online_rl_dashboard.png
online_rl_reward_curve.png
online_rl_distance_height_curve.png
online_rl_residual_curve.png
online_rl_alignment_curve.png
online_rl_loss_curve.png

Next step

Before running longer RLT training, fix gripper mapping so negative gripper commands are sent to the environment as closed:

-1.0 = closed
+1.0 = open
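
One way to confirm the fix before committing to a longer run is to recompute the open/closed ratios from the gripper commands the environment actually received during a short rollout. The sketch below is only an illustration: the helper name `gripper_ratios` and the idea of passing the commands as a plain list are assumptions, and it does not read the CSV logs because their column names are not documented here.

```python
from typing import Iterable, Tuple


def gripper_ratios(env_gripper_commands: Iterable[float]) -> Tuple[float, float]:
    """Return (open_ratio, closed_ratio) for commands received by the environment."""
    commands = list(env_gripper_commands)
    if not commands:
        raise ValueError("no gripper commands recorded")
    # +1.0 = open, -1.0 = closed; only the sign is checked here.
    n_open = sum(1 for c in commands if c > 0.0)
    n_closed = sum(1 for c in commands if c < 0.0)
    return n_open / len(commands), n_closed / len(commands)


if __name__ == "__main__":
    # The failure mode seen in this run: every command is +1.0.
    print(gripper_ratios([1.0, 1.0, 1.0, 1.0]))
    # A rollout in which the gripper closes for half of the steps.
    print(gripper_ratios([1.0, 1.0, -1.0, -1.0]))
```

A closed ratio that stays at 0.0 after the change would indicate that the command is still being overridden somewhere between the policy output and the environment.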
