
Multi-Object Tracking System

A modular and extensible multi-object tracking system that supports various detection and tracking algorithms. This system provides a complete pipeline for video processing, object detection, tracking, and visualization.

Features

  • Modular Architecture: Easily swap between different components (preprocessors, detectors, trackers, post-processors)
  • Multiple Detection Models:
    • YOLOv8
    • RT-DETR
    • Grounding DINO
  • Multiple Tracking Algorithms:
    • ByteTrack
    • DeepSORT
  • Advanced Video Processing:
    • Adaptive frame sampling
    • Scene change detection
    • Temporal smoothing
    • Batch processing support
  • Visualization Tools: Built-in tools for visualizing detections and tracks
  • Configuration System: Flexible configuration system with defaults and easy overrides
  • Performance Optimized: Support for GPU acceleration and batch processing

Installation

# Clone the repository
git clone https://github.com/DanBenAmi/tracking_system.git
cd tracking_system

# Install dependencies
pip install -r requirements.txt

Quick Start

There are two main ways to run the tracking pipeline:

1. Using main.py (Recommended)

The easiest way to run the system is through main.py, which uses predefined configurations:

python main.py

The pipeline behavior is controlled through configurations in configs/run_configs.py. Here are the key configuration sections:

# Basic video processing settings
VIDEO_CONFIG = {
    "video_path": "path/to/your/video.mp4",
    "start_time": 180,  # Start processing from 3 minutes
    "end_time": 240    # Process until 4 minutes
}

# Visualization settings
VISUALIZATION_CONFIG = {
    "show_detections": False,  # Show detection boxes
    "show_tracks": True,      # Show tracking results
    "tracks_vis_params": {
        "display": True,      # Show visualization window
        "keep_size": False,   # Maintain original video size
        "frame_delay": 1/10,  # Playback speed
        "save_video": True,   # Save output video
        "output_video_suffix": "_visualized.mp4"
    }
}

# Output settings
OUTPUT_CONFIG = {
    "save_tracks": True,
    "tracks_format": "pickle",  # Options: pickle, json, yaml
    "output_dir": "output",
    "save_original_frames": True
}

2. Using the API (For Custom Integration)

For more control, you can use the tracking system API directly:

from tracking_system import create_tracking_system
from tracking_system.configs.run_configs import CUSTOM_RUN_CONFIG

# Create tracking system with custom configuration
tracking_system = create_tracking_system(CUSTOM_RUN_CONFIG)

# Process video
video_path = "path/to/your/video.mp4"
tracks = tracking_system.process_video(
    video_path,
    start_time=0,  # Start time in seconds
    end_time=None  # Process until the end
)

# Visualize results
tracking_system.visualize(
    tracking_system._last_frames,
    tracking_system._last_tracks,
    display=True,
    keep_size=True,
    output_path="output_video.mp4"
)

Components

Preprocessors

  • BasicPreprocessor: Simple frame resizing and batch processing
  • OfflinePreprocessor: Advanced features like scene detection and adaptive sampling

Detectors

  • YOLOv8: Fast and accurate object detection
  • RT-DETR: Real-time detection transformer
  • Grounding DINO: Vision-language detector

Trackers

  • ByteTrack: High-performance multi-object tracker
  • DeepSORT: Classic tracking algorithm with deep association metrics

Post-processors

  • BasicPostProcessor: Simple filtering based on track length and confidence
  • AdvancedOfflinePostProcessor: Sophisticated track refinement with interpolation and smoothing

Note: For detailed explanations of each component's architecture, methodology, and parameter effects, see the Detailed Component Documentation section below.

Project Structure

tracking_system/
├── base.py             # Core classes and interfaces
├── preprocessors/      # Video preprocessing components
├── detectors/          # Object detection models
├── trackers/           # Tracking algorithms
├── postprocessors/     # Track post-processing and refinement
├── configs/            # Configuration system
└── utils/              # Utility functions

Configuration

The system uses a hierarchical configuration system:

  1. Default Configs: Base configurations for all components
  2. Run Configs: Specific configurations for different use cases
  3. Custom Configs: User-defined configurations that override defaults

Example configuration:

CUSTOM_RUN_CONFIG = {
    "preprocessor": {
        "type": "offline",
        "params": {
            "batch_size": 16,
            "frame_sampling": "uniform",
            "fps": 10.0
        }
    },
    "detector": {
        "type": "yolov8",
        "params": {
            "confidence_threshold": 0.5
        }
    },
    "tracker": {
        "type": "bytetrack",
    },
    "post_processor": {
        "type": "basic",
        "params": {
            "min_track_length": 7
        }
    }
}
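
Custom configs only need to declare the values they change; conceptually they are overlaid on the defaults with a recursive merge along these lines (an illustrative sketch, not the repo's actual merge logic):

def deep_merge(default, override):
    """Overlay an override config onto a default config, recursing into
    nested dicts so that unspecified defaults are preserved."""
    merged = dict(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged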

Key Configuration Parameters

Detector Settings

"detector": {
    "type": "yolov8",  # Options: yolov8, rtdetr, dino
    "params": {
        "confidence_threshold": 0.5,  # Detection confidence threshold
        "model_path": "path/to/weights.pt",  # Model weights path
        "device": None,  # Auto-select GPU/CPU
    }
}

Tracker Settings

"tracker": {
    "type": "bytetrack",  # Options: bytetrack, deepsort
    "params": {
        "track_thresh": 0.5,  # Tracking confidence threshold
        "track_buffer": 30,   # Frames to keep track alive
        "match_thresh": 0.8   # IOU threshold for matching
    }
}

Preprocessor Settings

"preprocessor": {
    "type": "offline",  # Options: basic, offline
    "params": {
        "batch_size": 32,
        "frame_sampling": "adaptive",  # Options: uniform, adaptive, scene_based
        "fps": 30.0,  # Target processing FPS
        "target_size": [640, 640]  # Input resolution [height, width]
    }
}

Post-processor Settings

"post_processor": {
    "type": "basic",  # Options: basic, advanced_offline
    "params": {
        "min_track_length": 5,  # Minimum track length to keep
        "min_confidence": 0.3   # Minimum average confidence
    }
}

Note: For a complete list of configurable parameters and their default values, check configs/default_configs.py. Feel free to experiment with different parameter combinations to optimize for your specific use case.

Detailed Component Documentation

Preprocessors

Basic Preprocessor

Simple frame preprocessing with minimal overhead.

Key Parameters:

  • target_size: Controls input resolution
    • Larger sizes improve detection of small objects but increase processing time
    • None keeps original video resolution
  • batch_size: Number of frames processed together
    • Larger batches improve GPU utilization but require more memory
    • Recommended: 16-32 for 4GB GPU, 32-64 for 8GB+ GPU
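
As a rough sketch of what these two parameters control (illustrative only; the real BasicPreprocessor lives in preprocessors/):

import cv2
import numpy as np

def batch_frames(frames, target_size=(640, 640), batch_size=16):
    """Resize frames to target_size ([height, width]) and yield batches."""
    resized = [
        cv2.resize(f, (target_size[1], target_size[0])) if target_size else f
        for f in frames
    ]
    for i in range(0, len(resized), batch_size):
        yield np.stack(resized[i:i + batch_size])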

Offline Preprocessor

Advanced preprocessing with scene analysis and adaptive sampling.

Key Parameters:

  • frame_sampling: Controls frame selection strategy
    • "uniform": Regular intervals, good for stable scenes
    • "adaptive": More frames in high-motion scenes
    • "scene_based": Focuses on scene changes
  • scene_threshold: Sensitivity for scene change detection (0-100)
    • Higher values detect subtle changes
    • Lower values only detect major scene changes
  • temporal_smooth: Applies temporal smoothing
    • Reduces noise but may blur fast motion
  • min_scene_length: Minimum frames between scene changes
    • Prevents over-segmentation of scenes
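
To make the "adaptive" strategy concrete, a minimal motion-based frame selector might look like this (the names base_stride and motion_thresh are illustrative, not this repo's API):

import cv2
import numpy as np

def adaptive_sample(frames, base_stride=5, motion_thresh=12.0):
    """Keep every frame in high-motion stretches; subsample stable ones."""
    kept, prev_gray = [], None
    for i, frame in enumerate(frames):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        motion = None if prev_gray is None else float(
            np.mean(cv2.absdiff(gray, prev_gray))
        )
        if motion is None or motion > motion_thresh or i % base_stride == 0:
            kept.append(i)
        prev_gray = gray
    return kept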

Detectors

YOLOv8 Detector

YOLOv8 is a single-stage object detector that processes the entire image in one forward pass, making it extremely fast. It uses a CSP-Darknet-style backbone with detection heads at multiple scales. The architecture employs anchor-free detection with a decoupled head and integrates training techniques such as mosaic augmentation and adaptive image scaling. YOLOv8 is particularly well suited to real-time applications and maintains a good balance between speed and accuracy.

Pros:

  • Excellent speed-accuracy trade-off
  • Good performance on small objects
  • Easy to deploy with many optimized backends

Cons:

  • May struggle with densely packed objects
  • Less accurate than two-stage detectors in some scenarios

Key Parameters:

  • confidence_threshold: Minimum detection confidence
    • Higher values (e.g., 0.7): Fewer false positives but might miss objects
    • Lower values (e.g., 0.3): Better recall but more false positives
  • iou_threshold: NMS overlap threshold
    • Higher values keep more overlapping boxes
    • Lower values aggressively remove overlaps
  • input_size: Input resolution [height, width]
    • Larger sizes: Better for small objects but slower
    • Smaller sizes: Faster but might miss small objects
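
For reference, standalone YOLOv8 inference with the ultralytics package maps directly onto these parameters (the weights file here is a placeholder):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # substitute your own weights
results = model.predict(
    "path/to/frame.jpg",
    conf=0.5,    # confidence_threshold
    iou=0.7,     # NMS iou_threshold
    imgsz=640,   # input_size (square)
)
for box in results[0].boxes:
    print(box.xyxy, box.conf, box.cls)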

RT-DETR Detector

RT-DETR (Real-Time Detection Transformer) combines the efficiency of YOLO-style architectures with the power of transformers. It uses a hybrid architecture with a CNN backbone for feature extraction and a lightweight transformer decoder for object detection. The model employs deformable attention and iterative refinement to achieve high accuracy while maintaining real-time performance. RT-DETR is designed to handle complex scenes with varying object scales and occlusions.

Pros:

  • Better handling of occlusions and complex scenes
  • Strong performance on varying object scales
  • More accurate than traditional CNN-only detectors

Cons:

  • Slightly slower than pure CNN approaches
  • Higher memory requirements

Key Parameters:

  • max_det: Maximum detections per frame
    • Higher values: catch more objects but slow down post-processing
    • Lower values: faster, but might miss objects in crowded scenes
  • half: Use FP16 precision
    • True: Faster and less memory on supported GPUs
    • False: More accurate but slower
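
The ultralytics package also ships RT-DETR, so both parameters can be exercised directly (the checkpoint name is a placeholder):

from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")
results = model.predict(
    "path/to/frame.jpg",
    max_det=300,  # cap on detections per frame
    half=True,    # FP16; only takes effect on supported GPUs
)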

Grounding DINO

Grounding DINO is a vision-language object detector that can detect objects based on natural language descriptions. It uses a transformer-based architecture that jointly processes visual and textual inputs, allowing for zero-shot detection of new object categories. The model employs cross-attention mechanisms to ground language descriptions to visual features and can handle both open-vocabulary and closed-set detection scenarios.

Pros:

  • Flexible object category definition through text
  • Zero-shot detection capabilities
  • Strong semantic understanding

Cons:

  • Slower than pure object detectors
  • May require careful prompt engineering
  • Higher computational requirements

Key Parameters:

  • text_prompt: Text description of objects to detect
  • box_threshold: Minimum box confidence
  • text_threshold: Minimum text-grounding confidence
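
As one concrete example, the Hugging Face transformers port of Grounding DINO exposes all three parameters; this repo's own wrapper may differ (checkpoint name and thresholds are illustrative):

import torch
from PIL import Image
from transformers import AutoProcessor, GroundingDinoForObjectDetection

processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
model = GroundingDinoForObjectDetection.from_pretrained(
    "IDEA-Research/grounding-dino-tiny"
)

image = Image.open("path/to/frame.jpg")
text_prompt = "a person. a car."  # lowercase phrases separated by periods

inputs = processor(images=image, text=text_prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.35,   # minimum box confidence
    text_threshold=0.25,  # minimum text-grounding confidence
    target_sizes=[image.size[::-1]],
)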

Trackers

ByteTrack

ByteTrack is a simple yet effective tracking-by-detection approach that utilizes all detection boxes instead of just high-confidence ones. It employs a two-stage association strategy: first matching high-confidence detections with existing tracks, then using low-confidence detections to recover occluded objects. This approach significantly improves tracking performance in crowded scenes and during occlusions. ByteTrack maintains high efficiency by using simple motion models and IoU-based matching.
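
A heavily simplified sketch of that two-stage association, using greedy IoU matching in place of the Hungarian assignment and omitting Kalman prediction entirely (all names are illustrative, not this repo's API):

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def greedy_match(tracks, dets, thresh):
    """Greedily pair each track with its best unused detection above thresh."""
    matches, used = [], set()
    for ti, t in enumerate(tracks):
        best, best_iou = None, thresh
        for di, d in enumerate(dets):
            if di not in used and iou(t["box"], d["box"]) > best_iou:
                best, best_iou = di, iou(t["box"], d["box"])
        if best is not None:
            matches.append((ti, best))
            used.add(best)
    return matches

def bytetrack_step(tracks, detections, track_thresh=0.5, match_thresh=0.3):
    """Stage 1: high-confidence detections vs. all tracks.
    Stage 2: low-confidence detections vs. the tracks left unmatched."""
    high = [d for d in detections if d["score"] >= track_thresh]
    low = [d for d in detections if d["score"] < track_thresh]
    first = greedy_match(tracks, high, match_thresh)
    matched = {ti for ti, _ in first}
    leftover = [t for i, t in enumerate(tracks) if i not in matched]
    second = greedy_match(leftover, low, match_thresh)  # indices into leftover
    return first, second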

Pros:

  • State-of-the-art tracking performance
  • Robust to occlusions and crowded scenes
  • Computationally efficient

Cons:

  • May need careful threshold tuning
  • Can be sensitive to detection quality
  • Limited appearance modeling

Key Parameters:

  • track_thresh: High-confidence threshold
    • Above this: Create new tracks
    • Below this: Used for track association only
  • match_thresh: IOU matching threshold
    • Higher values: Stricter matching, fewer ID switches
    • Lower values: More lenient matching, better track continuity
  • track_buffer: Frames to keep inactive tracks
    • Larger buffer: Better recovery from occlusions
    • Smaller buffer: Less memory usage

DeepSORT

DeepSORT extends the traditional SORT algorithm with deep learning-based appearance features. It combines Kalman filtering for motion prediction with a deep association metric learned from a large-scale person re-identification dataset. The algorithm maintains appearance feature galleries for each track and uses both motion and appearance information for data association. This makes it particularly effective at handling long-term occlusions and ID switches.

Pros:

  • Robust to long-term occlusions
  • Good identity preservation
  • Well-suited for person tracking

Cons:

  • Higher computational overhead
  • Requires feature extraction
  • May struggle with dense crowds

Key Parameters:

  • max_cosine_distance: Feature similarity threshold
    • Lower values: Stricter feature matching
    • Higher values: More lenient matching
  • nn_budget: Maximum size of appearance descriptor gallery
    • Larger values: Better re-identification but more memory
  • max_iou_distance: Maximum IOU distance for matching
    • Controls spatial association strictness
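
A minimal sketch of the appearance side of this association, assuming detections come with embedding vectors (illustrative only, not DeepSORT's actual code):

import numpy as np

def min_cosine_distance(gallery, feature):
    """Smallest cosine distance between a track's feature gallery and a
    new detection embedding; below max_cosine_distance counts as a match."""
    g = np.asarray(gallery, dtype=float)
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    f = np.asarray(feature, dtype=float)
    f /= np.linalg.norm(f)
    return float(np.min(1.0 - g @ f))

def update_gallery(gallery, feature, nn_budget=100):
    """Append the newest embedding and cap the gallery at nn_budget entries."""
    gallery.append(feature)
    return gallery[-nn_budget:]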

Post-processors

Basic Post-processor

Simple filtering based on track statistics.

Key Parameters:

  • min_track_length: Minimum frames for valid track
    • Higher values: More stable tracks but might miss short interactions
    • Lower values: Catches brief appearances but more false tracks
  • min_confidence: Minimum average confidence
    • Filters out uncertain tracks
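
In spirit the filtering amounts to the following (a sketch that assumes each track records its frames and per-frame scores; the field names are illustrative):

def filter_tracks(tracks, min_track_length=5, min_confidence=0.3):
    """Keep tracks that are long enough and confident enough on average."""
    return [
        t for t in tracks
        if len(t["frames"]) >= min_track_length
        and sum(t["scores"]) / len(t["scores"]) >= min_confidence
    ]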

Advanced Offline Post-processor

Sophisticated track refinement with interpolation and smoothing.

Key Parameters:

  • max_frame_gap: Maximum frames to interpolate
    • Larger gaps: Better track continuity but might create false connections
  • velocity_threshold: Maximum allowed object velocity
    • Filters out physically impossible movements
  • smooth_window: Temporal smoothing window size
    • Larger window: Smoother tracks but might lag behind fast motion
  • interpolate_gaps: Whether to fill tracking gaps
    • True: More complete tracks but might create false trajectories
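
A sketch of the two core operations, gap interpolation and temporal smoothing, on [x1, y1, x2, y2] boxes (illustrative only, not this repo's implementation):

import numpy as np

def interpolate_gap(box_a, box_b, n_missing):
    """Linearly interpolate n_missing boxes between two observed boxes."""
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    steps = np.linspace(0.0, 1.0, n_missing + 2)[1:-1]  # interior points only
    return [a + s * (b - a) for s in steps]

def smooth_boxes(boxes, smooth_window=5):
    """Moving-average smoothing of a track's box sequence.
    Zero-padding attenuates values near the ends; a real implementation
    would handle the boundaries explicitly."""
    arr = np.asarray(boxes, float)
    k = min(smooth_window, len(arr))
    kernel = np.ones(k) / k
    return np.stack(
        [np.convolve(arr[:, c], kernel, mode="same") for c in range(4)],
        axis=1,
    )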

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this project in your research, please cite:

@misc{tracking_system,
  author = {Dan Ben Ami},
  title = {Multi-Object Tracking System},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/DanBenAmi/tracking_system}
}
