A modular and extensible multi-object tracking system with interchangeable detection and tracking algorithms, providing a complete pipeline for video processing, object detection, tracking, and visualization.
- Features
- Installation
- Quick Start
- Components
- Project Structure
- Configuration
- Key Configuration Parameters
- Detailed Component Documentation
- Contributing
- License
- Citation
- Modular Architecture: Easily swap between different components (preprocessors, detectors, trackers, post-processors)
- Multiple Detection Models:
  - YOLOv8
  - RT-DETR
  - Grounding DINO
- Multiple Tracking Algorithms:
  - ByteTrack
  - DeepSORT
- Advanced Video Processing:
  - Adaptive frame sampling
  - Scene change detection
  - Temporal smoothing
  - Batch processing support
- Visualization Tools: Built-in tools for visualizing detections and tracks
- Configuration System: Flexible configuration system with defaults and easy overrides
- Performance Optimized: Support for GPU acceleration and batch processing
```bash
# Clone the repository
git clone https://github.com/DanBenAmi/tracking_system.git
cd tracking_system

# Install dependencies
pip install -r requirements.txt
```
There are two main ways to run the tracking pipeline:
The easiest way to run the system is through `main.py`, which uses predefined configurations:
```bash
python main.py
```
The pipeline behavior is controlled through the configurations in `configs/run_configs.py`. Here are the key configuration sections:
```python
# Basic video processing settings
VIDEO_CONFIG = {
    "video_path": "path/to/your/video.mp4",
    "start_time": 180,  # Start processing from 3 minutes
    "end_time": 240     # Process until 4 minutes
}

# Visualization settings
VISUALIZATION_CONFIG = {
    "show_detections": False,  # Show detection boxes
    "show_tracks": True,       # Show tracking results
    "tracks_vis_params": {
        "display": True,       # Show visualization window
        "keep_size": False,    # Maintain original video size
        "frame_delay": 1/10,   # Playback speed
        "save_video": True,    # Save output video
        "output_video_suffix": "_visualized.mp4"
    }
}

# Output settings
OUTPUT_CONFIG = {
    "save_tracks": True,
    "tracks_format": "pickle",  # Options: pickle, json, yaml
    "output_dir": "output",
    "save_original_frames": True
}
```
For more control, you can use the tracking system API directly:
```python
from tracking_system import create_tracking_system
from tracking_system.configs.run_configs import CUSTOM_RUN_CONFIG

# Create tracking system with custom configuration
tracking_system = create_tracking_system(CUSTOM_RUN_CONFIG)

# Process video
video_path = "path/to/your/video.mp4"
tracks = tracking_system.process_video(
    video_path,
    start_time=0,   # Start time in seconds
    end_time=None   # Process until the end
)

# Visualize results
tracking_system.visualize(
    tracking_system._last_frames,
    tracking_system._last_tracks,
    display=True,
    keep_size=True,
    output_path="output_video.mp4"
)
```
Preprocessors:
- BasicPreprocessor: Simple frame resizing and batch processing
- OfflinePreprocessor: Advanced features like scene detection and adaptive sampling

Detectors:
- YOLOv8: Fast and accurate object detection
- RT-DETR: Real-time detection transformer
- Grounding DINO: Vision-language detector

Trackers:
- ByteTrack: High-performance multi-object tracker
- DeepSORT: Classic tracking algorithm with deep association metrics

Post-Processors:
- BasicPostProcessor: Simple filtering based on track length and confidence
- AdvancedOfflinePostProcessor: Sophisticated track refinement with interpolation and smoothing
Note: For detailed explanations of each component's architecture, methodology, and parameter effects, see the Detailed Component Documentation section below.
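To illustrate the modular design, a custom component can be slotted in by implementing the corresponding interface from `base.py`. The sketch below assumes a `BaseDetector` base class with a `detect(frames)` method; the actual interface in `base.py` may differ:

```python
from tracking_system.base import BaseDetector  # assumed interface name


class ThresholdWrappedDetector(BaseDetector):
    """Hypothetical wrapper that drops low-confidence detections."""

    def __init__(self, inner_detector, confidence_threshold=0.5):
        self.inner = inner_detector
        self.confidence_threshold = confidence_threshold

    def detect(self, frames):
        # Delegate to the wrapped detector, then filter each frame's boxes
        all_detections = self.inner.detect(frames)
        return [
            [d for d in frame_dets if d["score"] >= self.confidence_threshold]
            for frame_dets in all_detections
        ]
```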
```
tracking_system/
├── base.py           # Core classes and interfaces
├── preprocessors/    # Video preprocessing components
├── detectors/        # Object detection models
├── trackers/         # Tracking algorithms
├── postprocessors/   # Track post-processing and refinement
├── configs/          # Configuration system
└── utils/            # Utility functions
```
The system uses a three-level configuration hierarchy:
- Default Configs: Base configurations for all components
- Run Configs: Specific configurations for different use cases
- Custom Configs: User-defined configurations that override defaults
Example configuration:
```python
CUSTOM_RUN_CONFIG = {
    "preprocessor": {
        "type": "offline",
        "params": {
            "batch_size": 16,
            "frame_sampling": "uniform",
            "fps": 10.0
        }
    },
    "detector": {
        "type": "yolov8",
        "params": {
            "confidence_threshold": 0.5
        }
    },
    "tracker": {
        "type": "bytetrack",
    },
    "post_processor": {
        "type": "basic",
        "params": {
            "min_track_length": 7
        }
    }
}
```
"detector": {
"type": "yolov8", # Options: yolov8, rtdetr, dino
"params": {
"confidence_threshold": 0.5, # Detection confidence threshold
"model_path": "path/to/weights.pt", # Model weights path
"device": None, # Auto-select GPU/CPU
}
}
"tracker": {
"type": "bytetrack", # Options: bytetrack, deepsort
"params": {
"track_thresh": 0.5, # Tracking confidence threshold
"track_buffer": 30, # Frames to keep track alive
"match_thresh": 0.8 # IOU threshold for matching
}
}
"preprocessor": {
"type": "offline", # Options: basic, offline
"params": {
"batch_size": 32,
"frame_sampling": "adaptive", # Options: uniform, adaptive, scene_based
"fps": 30.0, # Target processing FPS
"target_size": [640, 640] # Input resolution [height, width]
}
}
"post_processor": {
"type": "basic", # Options: basic, advanced_offline
"params": {
"min_track_length": 5, # Minimum track length to keep
"min_confidence": 0.3 # Minimum average confidence
}
}
Note: For a complete list of configurable parameters and their default values, see `configs/default_configs.py`. Feel free to experiment with different parameter combinations to optimize for your specific use case.
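Because run configs override defaults, a custom config only needs the keys you want to change. The recursive merge below is a generic sketch of the idea (the `DEFAULT_RUN_CONFIG` name is illustrative, not necessarily what `default_configs.py` exports):

```python
def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` onto `defaults` without mutating either."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Example: raise only the detector threshold, keeping every other default
custom = deep_merge(
    DEFAULT_RUN_CONFIG,  # illustrative name for the base config
    {"detector": {"params": {"confidence_threshold": 0.6}}},
)
```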
BasicPreprocessor: Simple frame preprocessing with minimal overhead.
Key Parameters:
- `target_size`: Controls input resolution
  - Larger sizes improve detection of small objects but increase processing time
  - `None` keeps the original video resolution
- `batch_size`: Number of frames processed together
  - Larger batches improve GPU utilization but require more memory
  - Recommended: 16-32 for a 4 GB GPU, 32-64 for an 8 GB+ GPU
OfflinePreprocessor: Advanced preprocessing with scene analysis and adaptive sampling.
Key Parameters:
- `frame_sampling`: Controls frame selection strategy
  - `"uniform"`: Regular intervals, good for stable scenes
  - `"adaptive"`: More frames in high-motion scenes
  - `"scene_based"`: Focuses on scene changes
- `scene_threshold`: Sensitivity for scene change detection (0-100)
  - Higher values detect subtle changes
  - Lower values only detect major scene changes
- `temporal_smooth`: Applies temporal smoothing
  - Reduces noise but may blur fast motion
- `min_scene_length`: Minimum frames between scene changes
  - Prevents over-segmentation of scenes
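As a rough illustration of scene-based sampling, the sketch below flags a cut when the mean frame difference exceeds a sensitivity-derived score and enough frames have passed since the last cut. This is a stand-in under stated assumptions, not the OfflinePreprocessor's actual algorithm:

```python
import numpy as np


def detect_scene_changes(frames, scene_threshold=50.0, min_scene_length=15):
    """Return indices where the scene appears to change.

    frames: list of HxWx3 uint8 arrays; scene_threshold: sensitivity in 0-100,
    interpreted here so that higher values flag subtler changes (an assumption
    matching the parameter description above).
    """
    cuts, last_cut = [], -min_scene_length
    # Higher sensitivity lowers the difference score required to declare a cut
    required_score = 100.0 - scene_threshold
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.int16) - frames[i - 1].astype(np.int16))
        score = diff.mean() / 255.0 * 100.0  # mean pixel change, scaled to 0-100
        if score > required_score and i - last_cut >= min_scene_length:
            cuts.append(i)
            last_cut = i
    return cuts
```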
YOLOv8 is a single-stage object detector that processes the entire image in one forward pass, making it extremely fast. It uses a CSP-Darknet backbone with multiple detection heads at different scales. The architecture employs anchor-free detection with objectness prediction and integrates advanced training techniques like mosaic augmentation and adaptive image scaling. YOLOv8 is particularly good at real-time applications and maintains a good balance between speed and accuracy.
Pros:
- Excellent speed-accuracy trade-off
- Good performance on small objects
- Easy to deploy with many optimized backends
Cons:
- May struggle with densely packed objects
- Less accurate than two-stage detectors in some scenarios
Key Parameters:
- `confidence_threshold`: Minimum detection confidence
  - Higher values (e.g., 0.7): Fewer false positives but might miss objects
  - Lower values (e.g., 0.3): Better recall but more false positives
- `iou_threshold`: NMS overlap threshold
  - Higher values keep more overlapping boxes
  - Lower values aggressively remove overlaps
- `input_size`: Input resolution [height, width]
  - Larger sizes: Better for small objects but slower
  - Smaller sizes: Faster but might miss small objects
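For reference, running YOLOv8 standalone with these parameters via the `ultralytics` package looks roughly like this (the checkpoint name is a placeholder; use whatever weights `model_path` points to):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # placeholder checkpoint
results = model.predict(
    "path/to/frame.jpg",
    conf=0.5,   # confidence_threshold
    iou=0.45,   # iou_threshold for NMS
    imgsz=640,  # input_size
)
for result in results:
    # Boxes as (x1, y1, x2, y2), confidence scores, and class ids
    print(result.boxes.xyxy, result.boxes.conf, result.boxes.cls)
```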
RT-DETR (Real-Time Detection Transformer) combines the efficiency of YOLO-style architectures with the power of transformers. It uses a hybrid architecture with a CNN backbone for feature extraction and a lightweight transformer decoder for object detection. The model employs deformable attention and iterative refinement to achieve high accuracy while maintaining real-time performance. RT-DETR is designed to handle complex scenes with varying object scales and occlusions.
Pros:
- Better handling of occlusions and complex scenes
- Strong performance on varying object scales
- More accurate than traditional CNN-only detectors
Cons:
- Slightly slower than pure CNN approaches
- Higher memory requirements
Key Parameters:
- `max_det`: Maximum detections per frame
  - Higher values catch more objects but slow down post-processing
  - Lower values are faster but might miss objects in crowded scenes
- `half`: Use FP16 precision
  - `True`: Faster and uses less memory on supported GPUs
  - `False`: More accurate but slower
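Ultralytics also distributes RT-DETR weights, so a standalone call with the parameters above might look like this (the checkpoint name is illustrative):

```python
from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")  # illustrative checkpoint
results = model.predict(
    "path/to/frame.jpg",
    max_det=300,  # cap on detections per frame
    half=True,    # FP16 on supported GPUs
)
```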
Grounding DINO is a vision-language object detector that can detect objects based on natural language descriptions. It uses a transformer-based architecture that jointly processes visual and textual inputs, allowing for zero-shot detection of new object categories. The model employs cross-attention mechanisms to ground language descriptions to visual features and can handle both open-vocabulary and closed-set detection scenarios.
Pros:
- Flexible object category definition through text
- Zero-shot detection capabilities
- Strong semantic understanding
Cons:
- Slower than pure object detectors
- May require careful prompt engineering
- Higher computational requirements
Key Parameters:
- `text_prompt`: Text description of objects to detect
- `box_threshold`: Minimum box confidence
- `text_threshold`: Minimum text-grounding confidence
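A standalone Grounding DINO call using the upstream repository's inference helpers shows how the three parameters fit together (config and weight paths are placeholders):

```python
from groundingdino.util.inference import load_model, load_image, predict

# Placeholder config/weight paths; download from the Grounding DINO repository
model = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
image_source, image = load_image("path/to/frame.jpg")

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="person . car . bicycle",  # text_prompt: categories separated by " . "
    box_threshold=0.35,
    text_threshold=0.25,
)
```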
ByteTrack is a simple yet effective tracking-by-detection approach that utilizes all detection boxes instead of just high-confidence ones. It employs a two-stage association strategy: first matching high-confidence detections with existing tracks, then using low-confidence detections to recover occluded objects. This approach significantly improves tracking performance in crowded scenes and during occlusions. ByteTrack maintains high efficiency by using simple motion models and IoU-based matching.
Pros:
- State-of-the-art tracking performance
- Robust to occlusions and crowded scenes
- Computationally efficient
Cons:
- May need careful threshold tuning
- Can be sensitive to detection quality
- Limited appearance modeling
Key Parameters:
- `track_thresh`: High-confidence threshold
  - Above this: Create new tracks
  - Below this: Used for track association only
- `match_thresh`: IOU matching threshold
  - Higher values: Stricter matching, fewer ID switches
  - Lower values: More lenient matching, better track continuity
- `track_buffer`: Frames to keep inactive tracks
  - Larger buffer: Better recovery from occlusions
  - Smaller buffer: Less memory usage
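The two-stage association can be sketched as follows. This is a simplified stand-in: greedy IoU matching replaces the Hungarian assignment and Kalman-predicted boxes the real tracker uses, and the data layout is assumed:

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def byte_associate(tracks, detections, track_thresh=0.5, match_thresh=0.8):
    """One frame of ByteTrack-style two-stage association.

    tracks: dicts with a "box" key; detections: dicts with "box" and "score".
    Returns (matched track-detection pairs, unmatched high-confidence detections).
    """
    high = [d for d in detections if d["score"] >= track_thresh]
    low = [d for d in detections if d["score"] < track_thresh]

    def greedy_match(open_tracks, dets):
        matches, leftovers, remaining = [], [], list(dets)
        for t in open_tracks:
            best = max(remaining, key=lambda d: iou(t["box"], d["box"]), default=None)
            if best is not None and iou(t["box"], best["box"]) >= match_thresh:
                matches.append((t, best))
                remaining.remove(best)
            else:
                leftovers.append(t)
        return matches, leftovers, remaining

    # Stage 1: match existing tracks to high-confidence detections
    matches, unmatched_tracks, unmatched_high = greedy_match(tracks, high)
    # Stage 2: try to recover still-unmatched tracks with low-confidence detections
    recovered, _, _ = greedy_match(unmatched_tracks, low)
    return matches + recovered, unmatched_high  # unmatched high boxes seed new tracks
```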
DeepSORT extends the traditional SORT algorithm with deep learning-based appearance features. It combines Kalman filtering for motion prediction with a deep association metric learned from a large-scale person re-identification dataset. The algorithm maintains appearance feature galleries for each track and uses both motion and appearance information for data association. This makes it particularly effective at handling long-term occlusions and ID switches.
Pros:
- Robust to long-term occlusions
- Good identity preservation
- Well-suited for person tracking
Cons:
- Higher computational overhead
- Requires feature extraction
- May struggle with dense crowds
Key Parameters:
- `max_cosine_distance`: Feature similarity threshold
  - Lower values: Stricter feature matching
  - Higher values: More lenient matching
- `nn_budget`: Maximum size of appearance descriptor gallery
  - Larger values: Better reidentification but more memory
- `max_iou_distance`: Maximum IOU distance for matching
  - Controls spatial association strictness
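As an illustration of the deep association metric, the sketch below gates a detection against one track's appearance gallery by cosine distance (the feature layout and gallery handling are assumptions, not DeepSORT's exact code):

```python
import numpy as np


def appearance_cost(track_gallery, detection_feature, max_cosine_distance=0.2):
    """Smallest cosine distance between a detection and one track's gallery.

    track_gallery: (N, D) array of the track's past appearance features,
    typically capped at the last nn_budget entries. Returns the cost,
    or None when it exceeds the gate.
    """
    gallery = track_gallery / np.linalg.norm(track_gallery, axis=1, keepdims=True)
    feature = detection_feature / np.linalg.norm(detection_feature)
    cost = float(np.min(1.0 - gallery @ feature))  # 1 - best cosine similarity
    return cost if cost <= max_cosine_distance else None
```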
BasicPostProcessor: Simple filtering based on track statistics.
Key Parameters:
- `min_track_length`: Minimum frames for a valid track
  - Higher values: More stable tracks but might miss short interactions
  - Lower values: Catches brief appearances but more false tracks
- `min_confidence`: Minimum average confidence
  - Filters out uncertain tracks
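The filtering reduces to something like this sketch (the per-track data layout is an assumption):

```python
def filter_tracks(tracks, min_track_length=5, min_confidence=0.3):
    """Keep tracks that are long enough and confident enough on average.

    tracks: list of tracks, each a list of {"box", "score"} detections.
    """
    kept = []
    for track in tracks:
        if len(track) < min_track_length:
            continue  # too short to be reliable
        mean_conf = sum(d["score"] for d in track) / len(track)
        if mean_conf >= min_confidence:
            kept.append(track)
    return kept
```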
AdvancedOfflinePostProcessor: Sophisticated track refinement with interpolation and smoothing.
Key Parameters:
- `max_frame_gap`: Maximum frames to interpolate
  - Larger gaps: Better track continuity but might create false connections
- `velocity_threshold`: Maximum allowed object velocity
  - Filters out physically impossible movements
- `smooth_window`: Temporal smoothing window size
  - Larger window: Smoother tracks but might lag behind fast motion
- `interpolate_gaps`: Whether to fill tracking gaps
  - `True`: More complete tracks but might create false trajectories
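Gap filling can be pictured as linear interpolation between the last box before a gap and the first box after it, skipped when the gap exceeds `max_frame_gap` (a sketch under an assumed box layout, not the actual implementation):

```python
import numpy as np


def interpolate_gap(box_before, box_after, gap_length, max_frame_gap=10):
    """Linearly interpolate [x1, y1, x2, y2] boxes across a tracking gap.

    Returns gap_length interpolated boxes, or None when the gap is too
    long to bridge without risking a false trajectory.
    """
    if gap_length > max_frame_gap:
        return None
    a = np.asarray(box_before, dtype=float)
    b = np.asarray(box_after, dtype=float)
    fractions = np.linspace(0.0, 1.0, gap_length + 2)[1:-1]  # interior points only
    return [(1.0 - t) * a + t * b for t in fractions]
```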
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this project in your research, please cite:
```bibtex
@misc{tracking_system,
  author    = {Dan Ben Ami},
  title     = {Multi-Object Tracking System},
  year      = {2024},
  publisher = {GitHub},
  url       = {https://github.com/DanBenAmi/tracking_system}
}
```