Stars
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
[EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A list of referring video object segmentation papers
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Deep Interactive Thin Object Selection
A Survey on Vision-Language Geo-Foundation Models (VLGFMs)
Efficient vision foundation models for high-resolution generation and perception.
A modern GUI client based on Tauri, designed to run on Windows, macOS, and Linux for a tailored proxy experience
List of datasets, codes, and contests related to remote sensing change detection
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
SAM (Segment Anything Model) for generating rotated bounding boxes with MMRotate, used as a comparison method for H2RBox-v2.
CCNet: Criss-Cross Attention for Semantic Segmentation (TPAMI 2020 & ICCV 2019).
[CVPR 2022] Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
Implementation for "Context Prior for Scene Segmentation"
The project is an official implementation of our CVPR 2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"
Real-time multi-person tracking (CenterNet-based person detector + Deep SORT algorithm in PyTorch)
This is the source code of deeplabv3-plus-pytorch, which can be used to train your own models.
Pretrained DeepLabv3 and DeepLabv3+ for Pascal VOC & Cityscapes
Training code for "SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation"
Download the Cityscapes dataset using this script
Evaluation framework for the DAVIS 2017 Semi-supervised and Unsupervised tracks, used in the DAVIS Challenges
[WACV 2022] Pixel-Level Bijective Matching for Video Object Segmentation
Code for our CVPR 2021 paper on Coordinate Attention
FEELVOS implementation in PyTorch; FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond