Could you please provide some guidance on how to perform RL training on VLMs? For example, how would I train on the WebShop dataset?