awesome-embodied-vla/va/vln

Survey

  • [2024] A Survey on Vision-Language-Action Models for Embodied AI [paper]
  • [2024] A Survey of Embodied Learning for Object-Centric Robotic Manipulation [paper]
  • [2024] Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI [paper]

Vision Language Action (VLA) Models

2025

  • [2025] EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation [paper]
  • [2025] Shake-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Manipulations and Liquid Mixing [paper]
  • [2025] Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding [paper]
  • [2025] FAST: Efficient Action Tokenization for Vision-Language-Action Models [paper]
  • [2025] GeoManip: Geometric Constraints as General Interfaces for Robot Manipulation [paper]
  • [2025] Universal Actions for Enhanced Embodied Foundation Models [paper]
  • [2025] SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model [paper]
  • [2025] RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation [paper]
  • [2025] SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation [paper]
  • [2025] Improving Vision-Language-Action Model with Online Reinforcement Learning [paper]
  • [2025] Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation [paper]

2024

  • [2024] π0: A Vision-Language-Action Flow Model for General Robot Control [paper]
  • [2024] RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation [paper]
  • [2024] OpenVLA: An Open-Source Vision-Language-Action Model [paper]
  • [2024] Octo: An Open-Source Generalist Robot Policy [paper]
  • [2024] Open X-Embodiment: Robotic Learning Datasets and RT-X Models [paper]
  • [2024] RT-H: Action Hierarchies Using Language [paper]
  • [2024] Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models [paper]
  • [2024] BAKU: An Efficient Transformer for Multi-Task Policy Learning [paper]
  • [2024] Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals [paper]
  • [2024] TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation [paper]
  • [2024] Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression [paper]
  • [2024] CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation [paper]
  • [2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model [paper]
  • [2024] Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations [paper]
  • [2024] An Embodied Generalist Agent in 3D World [paper]
  • [2024] RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation [paper]
  • [2024] SpatialBot: Precise Spatial Understanding with Vision Language Models [paper]
  • [2024] Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection [paper]
  • [2024] HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers [paper]
  • [2024] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [paper]
  • [2024] RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation [paper]
  • [2024] Robotic Control via Embodied Chain-of-Thought Reasoning [paper]
  • [2024] GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation [paper]
  • [2024] Latent Action Pretraining from Videos [paper]
  • [2024] DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [paper]
  • [2024] RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation [paper]
  • [2024] Moto: Latent Motion Token as the Bridging Language for Robot Manipulation [paper]
  • [2024] TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies [paper]
  • [2024] Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments [paper]

2023

  • [2023] RT-1: Robotics Transformer for Real-World Control at Scale [paper]
  • [2023] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [paper]
  • [2023] PaLM-E: An Embodied Multimodal Language Model [paper]
  • [2023] Vision-Language Foundation Models as Effective Robot Imitators [paper]
  • [2023] Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation [paper]

Vision Language Navigation (VLN) Models

2025

  • [2025] Semantic Mapping in Indoor Embodied AI – A Comprehensive Survey and Future Directions [paper]

2024

  • [2024] NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation [paper]
  • [2024] NaVILA: Legged Robot Vision-Language-Action Model for Navigation [paper]
  • [2024] The One RING: a Robotic Indoor Navigation Generalist [paper]

Vision Action (VA) Models

2025

  • [2025] Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics [paper]
  • [2025] You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations [paper]

2024

  • [2024] Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching [paper]

  • [2024] 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations [paper]

  • [2024] Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [paper]

  • [2024] ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation [paper]

  • [2024] Diffusion Policy Policy Optimization [paper]

  • [2024] Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation [paper]

  • [2024] EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning [paper]

  • [2024] Equivariant Diffusion Policy [paper]

  • [2024] Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models [paper]

  • [2024] Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies [paper]

  • [2024] Motion Before Action: Diffusing Object Motion as Manipulation Condition [paper]

  • [2024] One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [paper]

  • [2024] Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation [paper]

  • [2024] SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation [paper]

  • [2024] RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins [paper]

  • [2024] Few-Shot Task Learning through Inverse Generative Modeling [paper]

  • [2024] G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation [paper]

  • [2024] Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation [paper]

  • [2024] Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies [paper]

  • [2024] Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies [paper]

  • [2024] Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation [paper]

  • [2024] Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation [paper]

  • [2024] Learning Universal Policies via Text-Guided Video Generation [paper]

  • [2024] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning [paper]

  • [2024] 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations [paper]

  • [2024] Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation [paper]

  • [2024] GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy [paper]

  • [2024] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation [paper]

  • [2024] Prediction with Action: Visual Policy Learning via Joint Denoising Process [paper]

  • [2024] Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations [paper]

  • [2024] Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [paper]

  • [2024] Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models [paper]

2023

  • [2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion [paper]

Related Works

  • Awesome-Generalist-Agents [repo]
