rlt_lift_finetuned_pi05
Private backup of the RLT fine-tuning artifacts for the MIML-VLA Lift task.
Base model
This RLT checkpoint was initialized from the behavior-cloning fine-tuned Pi0.5 Lift model:
- Base HF model repo: 0xemkey/finetuned_with_lift_pi05
- Base local checkpoint path during run: /data/ubuntu/checkpoints/finetuned_with_lift_pi05/step_000100
This repository does not duplicate the full 16 GB Pi0.5 base model. It stores only the RLT adapter/checkpoint artifacts, logs, plots, and run metadata.
Run
- Run ID: 20260529_204704
- Environment: IsaacLab / MIML-VLA Lift
- Policy server: LeRobot Pi0.5 websocket server
- Online RL script: train_rlt_online_miml_lift.py
- Source checkpoint: /data/ubuntu/checkpoints/finetuned_with_lift_pi05/step_000100
Summary
- Episodes: 5
- Environment steps: 202
- Online updates: 300
- Success rate: 0.0000
- Gripper open ratio: 1.0000
- Gripper closed ratio: 0.0000
- Minimum distance observed: about 0.005462 m
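The two gripper ratios above can be read as the fraction of environment steps on which the gripper command was open or closed. The following is a minimal sketch of that calculation, not the logic used by train_rlt_online_miml_lift.py; the function name `gripper_ratios` and the input format (one environment gripper command per step) are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def gripper_ratios(env_gripper_commands: Sequence[float]) -> Tuple[float, float]:
    """Return (open_ratio, closed_ratio) for a sequence of env gripper commands.

    Assumes the convention used in this card: positive = open, negative = closed.
    """
    total = len(env_gripper_commands)
    if total == 0:
        # No steps recorded: report zero for both ratios instead of dividing by zero.
        return 0.0, 0.0
    open_count = sum(1 for cmd in env_gripper_commands if cmd > 0.0)
    closed_count = sum(1 for cmd in env_gripper_commands if cmd < 0.0)
    return open_count / total, closed_count / total


# Illustrative input only: 202 steps that all carry the "open" command.
open_ratio, closed_ratio = gripper_ratios([1.0] * 202)
print(f"open={open_ratio:.4f} closed={closed_ratio:.4f}")
```

An open ratio of 1.0000 with a closed ratio of 0.0000 therefore means that no closing command reached the environment on any step.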
Main diagnosis
The RLT infrastructure worked: rollout, updates, checkpoints, CSV logs, and plots were produced.
However, task success was not achieved. The main blocker was gripper command mapping/override:
- Some raw/denormalized actions indicated closing behavior.
- The environment still received +1.0, interpreted as open.
- Therefore, the gripper never closed during the run.
- As a result, contact, fixed-joint attachment, and lift rewards remained inactive.
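One way to confirm this kind of override is to compare, step by step, the denormalized gripper action with the command the environment actually received. The sketch below assumes both values are available as per-step sequences; the function name `find_override_steps` and its arguments are hypothetical, and the column names in the CSV logs are not documented here.

```python
from typing import List, Sequence


def find_override_steps(
    denorm_gripper: Sequence[float],
    env_gripper: Sequence[float],
) -> List[int]:
    """Return step indices where the policy asked to close but the env got 'open'."""
    if len(denorm_gripper) != len(env_gripper):
        raise ValueError("Both sequences must contain one value per step.")
    return [
        step
        for step, (denorm, env) in enumerate(zip(denorm_gripper, env_gripper))
        # Negative denormalized action = close request; positive env command = open.
        if denorm < 0.0 and env > 0.0
    ]


# Illustrative values only, not taken from the run logs.
print(find_override_steps([0.4, -0.2, -0.9], [1.0, 1.0, -1.0]))
```

A non-empty result points at the action post-processing between the policy output and the environment as the place to look.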
Expected convention:
denorm_gripper < 0 -> env_gripper = -1.0 closed
denorm_gripper > 0 -> env_gripper = +1.0 open
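The convention above could be expressed as a small mapping function. This is a minimal sketch rather than the fix applied in the training script; the name `map_gripper_command` is hypothetical, and the behavior at exactly 0 is not specified by the convention, so treating it as open is an assumption.

```python
def map_gripper_command(denorm_gripper: float) -> float:
    """Map a denormalized gripper action to the binary environment command."""
    if denorm_gripper < 0.0:
        # Negative action: send -1.0 so the environment closes the gripper.
        return -1.0
    # Positive action: send +1.0 (open).
    # Assumption: an action of exactly 0 also maps to open.
    return 1.0


print(map_gripper_command(-0.3))  # closing request
print(map_gripper_command(0.7))   # opening request
```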
Included files
checkpoints/
- rlt_final_debug.pt
- rlt_episode_0005.pt (if available)
logs/
- online_rl_step_metrics.csv
- online_rl_episode_metrics.csv
- online_rl_update_metrics.csv
- plot_summary.txt
plots/
- online_rl_dashboard.png
- online_rl_reward_curve.png
- online_rl_distance_height_curve.png
- online_rl_residual_curve.png
- online_rl_alignment_curve.png
- online_rl_loss_curve.png
Next step
Before running longer RLT training, fix gripper mapping so negative gripper commands are sent to the environment as closed:
-1.0 = closed
+1.0 = open