- [2024] A Survey on Vision-Language-Action Models for Embodied AI [paper]
- [2024] A Survey of Embodied Learning for Object-Centric Robotic Manipulation [paper]
- [2024] Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI [paper]
- [2025] EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation [paper]
- [2025] Shake-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Manipulations and Liquid Mixing [paper]
- [2025] Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding [paper]
- [2025] FAST: Efficient Action Tokenization for Vision-Language-Action Models [paper]
- [2025] GeoManip: Geometric Constraints as General Interfaces for Robot Manipulation [paper]
- [2025] Universal Actions for Enhanced Embodied Foundation Models [paper]
- [2025] SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model [paper]
- [2025] RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation [paper]
- [2025] SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation [paper]
- [2025] Improving Vision-Language-Action Model with Online Reinforcement Learning [paper]
- [2025] Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation [paper]
- [2024] π0: A Vision-Language-Action Flow Model for General Robot Control [paper]
- [2024] RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation [paper]
- [2024] OpenVLA: An Open-Source Vision-Language-Action Model [paper]
- [2024] Octo: An Open-Source Generalist Robot Policy [paper]
- [2024] Open X-Embodiment: Robotic Learning Datasets and RT-X Models [paper]
- [2024] RT-H: Action Hierarchies Using Language [paper]
- [2024] Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models [paper]
- [2024] Baku: An Efficient Transformer for Multi-Task Policy Learning [paper]
- [2024] Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals [paper]
- [2024] TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation [paper]
- [2024] Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression [paper]
- [2024] CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation [paper]
- [2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model [paper]
- [2024] Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations [paper]
- [2024] An Embodied Generalist Agent in 3D World [paper]
- [2024] RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation [paper]
- [2024] SpatialBot: Precise Spatial Understanding with Vision Language Models [paper]
- [2024] Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection [paper]
- [2024] HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers [paper]
- [2024] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [paper]
- [2024] RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation [paper]
- [2024] Robotic Control via Embodied Chain-of-Thought Reasoning [paper]
- [2024] GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation [paper]
- [2024] Latent Action Pretraining from Videos [paper]
- [2024] DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [paper]
- [2024] RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation [paper]
- [2024] Moto: Latent Motion Token as the Bridging Language for Robot Manipulation [paper]
- [2024] TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies [paper]
- [2024] Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments [paper]
- [2023] RT-1: Robotics Transformer for Real-World Control at Scale [paper]
- [2023] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [paper]
- [2023] PaLM-E: An Embodied Multimodal Language Model [paper]
- [2023] Vision-Language Foundation Models as Effective Robot Imitators [paper]
- [2023] Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation [paper]
- [2025] Semantic Mapping in Indoor Embodied AI – A Comprehensive Survey and Future Directions [paper]
- [2024] NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation [paper]
- [2024] NaVILA: Legged Robot Vision-Language-Action Model for Navigation [paper]
- [2024] The One RING: a Robotic Indoor Navigation Generalist [paper]
- [2025] Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [paper]
- [2025] You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations [paper]
- [2024] Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching [paper]
- [2024] 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations [paper]
- [2024] Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [paper]
- [2024] ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation [paper]
- [2024] 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations [paper]
- [2024] Diffusion Policy Policy Optimization [paper]
- [2024] Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation [paper]
- [2024] EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning [paper]
- [2024] Equivariant Diffusion Policy [paper]
- [2024] Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models [paper]
- [2024] Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies [paper]
- [2024] Motion Before Action: Diffusing Object Motion as Manipulation Condition [paper]
- [2024] One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [paper]
- [2024] Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation [paper]
- [2024] SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation [paper]
- [2024] RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins [paper]
- [2024] Few-Shot Task Learning through Inverse Generative Modeling [paper]
- [2024] G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation [paper]
- [2024] Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation [paper]
- [2024] Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies [paper]
- [2024] Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies [paper]
- [2024] Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation [paper]
- [2024] Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation [paper]
- [2024] Learning Universal Policies via Text-Guided Video Generation [paper]
- [2024] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning [paper]
- [2024] Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation [paper]
- [2024] GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy [paper]
- [2024] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation [paper]
- [2024] Prediction with Action: Visual Policy Learning via Joint Denoising Process [paper]
- [2024] Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations [paper]
- [2024] Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [paper]
- [2024] Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models [paper]
- [2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion [paper]
- Awesome-Generalist-Agents [repo]