Dalian University of Technology
Starred repositories
DaD's a pretty good keypoint detector, probably the best.
[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time
[AAAI-25 Oral] Official Implementation of "FLAME: Learning to Navigate with Multimodal LLM in Urban Environments"
Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
LoRAT_pytracking: reproduction of [ECCV2024] LoRAT
EVE Series: Encoder-Free Vision-Language Models from BAAI
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
The official implementation for the CVPR 2023 paper Joint Visual Grounding and Tracking with Natural Language Specification.
[TPAMI 2024] Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
[Embodied-AI-Survey-2024] Paper list and projects for Embodied AI
[Lumina Embodied AI Community] Embodied-AI-Guide: a technical guide to embodied intelligence
A curated list of visual reinforcement learning resources
An open source implementation of CLIP.
Artificial Intelligence Research for Science (AIRS)
List the AI for Science papers accepted by top conferences
SeqTrackv2: Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking
The official Python toolkit for running experiments and evaluating performance on the VideoCube benchmark @TPAMI2023
[CVPR 24] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloading the trained model checkpoints, and example notebooks / gra…
[CVPRW’24 Best Paper Honorable Mention Award] DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
[ICCV'23] CiteTracker: Correlating Image and Text for Visual Tracking