Training is still in progress, but preliminary results indicate that the fine-tuned VLM policy is beginning to perform the basic manipulation tasks with increasing proficiency. The parallel environments make data collection fast and efficient, allowing rapid iteration on the policy. Once training is complete, we deploy the policy on a real Franka Panda arm. To do this, we set up communication between the local machine connected to the arm and the remote machine running VLM inference:
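As a minimal sketch of such a setup (the host, port, message fields, and the length-prefixed JSON framing are all illustrative assumptions, not the actual deployment stack, which might instead use ZeroMQ or gRPC), the local machine can send an observation over TCP and block until the remote inference server replies with an action:

```python
import json
import socket
import struct
import threading

HOST, PORT = "127.0.0.1", 5555  # placeholder; in practice, the remote inference machine's address


def send_msg(sock, obj):
    # Length-prefix each JSON message so it survives TCP stream fragmentation.
    payload = json.dumps(obj).encode()
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_msg(sock):
    # Read the 4-byte length header, then the full payload.
    header = b""
    while len(header) < 4:
        header += sock.recv(4 - len(header))
    (length,) = struct.unpack("!I", header)
    data = b""
    while len(data) < length:
        data += sock.recv(length - len(data))
    return json.loads(data.decode())


def inference_server():
    # Stands in for the remote VLM: receive an observation, return an action.
    with socket.socket() as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            obs = recv_msg(conn)
            # Dummy action in place of a real model forward pass.
            action = {"joint_deltas": [0.0] * 7, "gripper_open": obs["gripper_open"]}
            send_msg(conn, action)


# Local machine (connected to the arm): one request/reply round trip.
server = threading.Thread(target=inference_server, daemon=True)
server.start()

with socket.create_connection((HOST, PORT)) as cli:
    send_msg(cli, {"image_shape": [224, 224, 3], "gripper_open": True})
    action = recv_msg(cli)
server.join()
```

A blocking request/reply loop like this keeps the control flow simple; a real deployment would also need to handle inference latency relative to the arm's control rate.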
Figures: reward and loss curves over training.