rlt_lift_finetuned_pi05

Private backup of the RLT fine-tuning artifacts for the MIML-VLA Lift task.

Base model

This RLT checkpoint was initialized from the behavior-cloning fine-tuned Pi0.5 Lift model:

  • Base HF model repo: 0xemkey/finetuned_with_lift_pi05
  • Base local checkpoint path during run: /data/ubuntu/checkpoints/finetuned_with_lift_pi05/step_000100

This repository does not duplicate the full 16 GB Pi0.5 base model. It stores only the RLT adapter/checkpoint artifacts, logs, plots, and run metadata.

Run

  • Run ID: 20260529_204704
  • Environment: IsaacLab / MIML-VLA Lift
  • Policy server: LeRobot Pi0.5 websocket server
  • Online RL script: train_rlt_online_miml_lift.py
  • Source checkpoint: /data/ubuntu/checkpoints/finetuned_with_lift_pi05/step_000100

Summary

  • Episodes: 5
  • Environment steps: 202
  • Online updates: 300
  • Success rate: 0.0000
  • Gripper open ratio: 1.0000
  • Gripper closed ratio: 0.0000
  • Minimum distance observed: 0.005462 m (about 5.5 mm)

Main diagnosis

The RLT infrastructure worked: rollout, updates, checkpoints, CSV logs, and plots were produced.

However, no episode achieved task success. The main blocker was the gripper command mapping/override:

  • Some raw/denormalized actions indicated closing behavior.
  • The environment still received +1.0, interpreted as open.
  • Therefore, the gripper never closed during the run.
  • As a result, contact, fixed-joint attachment, and lift rewards remained inactive.
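
One way to confirm this kind of override is to compare the sign of each denormalized gripper action with the command the environment actually received. The sketch below is illustrative only: the function name count_gripper_mismatches and the example values are hypothetical and are not taken from the run logs or from train_rlt_online_miml_lift.py.

```python
from typing import Sequence


def count_gripper_mismatches(
    denorm_gripper: Sequence[float],
    env_gripper: Sequence[float],
) -> int:
    """Count steps where the policy asked to close (negative value)
    but the environment received a non-negative (open) command."""
    if len(denorm_gripper) != len(env_gripper):
        raise ValueError("both sequences must have the same length")
    return sum(
        1
        for policy_cmd, env_cmd in zip(denorm_gripper, env_gripper)
        if policy_cmd < 0.0 and env_cmd >= 0.0
    )


# Made-up example values, not data from this run.
example_policy = [0.4, -0.2, -0.7, 0.1]
example_env = [1.0, 1.0, 1.0, 1.0]
mismatches = count_gripper_mismatches(example_policy, example_env)
```

A nonzero count would mean that at least some closing commands were overridden before reaching the environment.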

Expected convention:

denorm_gripper < 0  ->  env_gripper = -1.0  (closed)
denorm_gripper > 0  ->  env_gripper = +1.0  (open)
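
The convention above can be sketched as a small helper. This is a minimal illustration, not the code used in the training script: the function name map_gripper_command is hypothetical, and treating an exactly-zero value as open is an assumption, since the stated convention does not cover that case.

```python
def map_gripper_command(denorm_gripper: float) -> float:
    """Map a denormalized gripper action to the binary environment command.

    Hypothetical helper: negative values close the gripper (-1.0) and
    positive values open it (+1.0). Zero is treated as open here, which
    is an assumption not covered by the stated convention.
    """
    return -1.0 if denorm_gripper < 0.0 else 1.0
```

Applying a mapping like this as the last step before the environment call, after any other action post-processing, would keep a later override from forcing the command back to +1.0.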

Included files

checkpoints/

  • rlt_final_debug.pt
  • rlt_episode_0005.pt (if available)

logs/

  • online_rl_step_metrics.csv
  • online_rl_episode_metrics.csv
  • online_rl_update_metrics.csv
  • plot_summary.txt

plots/

  • online_rl_dashboard.png
  • online_rl_reward_curve.png
  • online_rl_distance_height_curve.png
  • online_rl_residual_curve.png
  • online_rl_alignment_curve.png
  • online_rl_loss_curve.png

Next step

Before running longer RLT training, fix the gripper mapping so that negative gripper commands reach the environment as closed:

-1.0 = closed
+1.0 = open