ScreenGlass/florence-tool-trimmed

 
 

Florence Tool CLI

Overview

The Florence Tool CLI is a command-line interface for processing images with the Florence-2 model. It runs vision and vision-language tasks, such as object detection, captioning, and OCR, on individual images, entire folders of images, or WebDataset shards.

Features

  • Model Loading: Load and run the Florence-2 model from a local path or the Hugging Face Hub.
  • Task Variety: Supports a wide range of tasks, including captioning, object detection, dense region captioning, OCR, and more.
  • Batch Processing: Efficiently process images in batches from a folder.
  • Recursive Search: Optionally process images within subdirectories.
  • Customizable Output: Save results in JSON, CSV, or plain text formats with optional suffixes and overwrite modes.
  • Flexible Image Handling: Specify the image file extensions to process, allowing for flexibility in file types.

Installation

  1. Clone the repository:

    git clone https://github.com/bigdata-pw/florence-tool.git
    cd florence-tool
  2. Install dependencies:

    pip install -r requirements.txt
  3. Install the tool:

    pip install -e .

Usage

Command-Line Interface

You can use the tool directly from the command line by running the following command:

florence-tool run [OPTIONS]

Options

  • --hf-hub-or-path (required): Path or Hugging Face hub model identifier for Florence-2.
  • --device: Device to run the model on (e.g., "cuda:0" or "cpu"). Default is "cuda:0".
  • --dtype: Torch dtype to use (e.g., "float16", "float32", "bfloat16"). Default is "float16".
  • --task (required): Task to run (e.g., "<CAPTION>", "<OD>", etc.).
  • --image: Path to an image file.
  • --folder: Path to a folder containing images.
  • --wds: WebDataset shard path or pattern (e.g., "shard-{00000..00069}.tar"), or a "pipe:" command for streaming.
  • --output-dir: Directory to save the results.
  • --text-input: Optional text input for tasks that require it.
  • --max-new-tokens: Maximum number of new tokens to generate. Default is 1024.
  • --num-beams: Number of beams for beam search. Default is 3.
  • --output-format: Format to save the results (json, csv, or txt). Default is json.
  • --recursive: Process subdirectories if specified.
  • --suffix: Suffix to use for the output file.
  • --overwrite: Overwrite existing output files. If not specified, existing files are appended to or updated.
  • --image-extensions: Comma-separated list of image file extensions to include (e.g., "jpg,png,jpeg"). Default is "jpg,png".
  • --batch-size: Number of images to process in a batch. Default is 1.
  • --num-workers: Number of DataLoader workers. Default is 4, overridden to 0 on Windows.
  • --prefetch-factor: Prefetch factor for DataLoader workers. Default is 4.
  • --image-key: Key under which the image is stored in each WebDataset sample.
  • --no-check: Skips the task type check for models with added task types; the task is then processed as the default pure_text type. This is intended for testing. New task types can be added to the check: just open an issue or PR.
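
When the CLI is driven from a script, it can help to assemble the argument list programmatically rather than by string concatenation. The following is a minimal sketch, not part of the tool itself: `build_command` is a hypothetical helper, and it only uses flags from the list above. It assumes that keyword names map to flags by replacing underscores with hyphens.

```python
import shlex


def build_command(model, task, **options):
    """Return the argv list for one `florence-tool run` invocation.

    Hypothetical helper: keyword names are turned into flags by
    replacing underscores with hyphens (output_dir -> --output-dir).
    """
    argv = ["florence-tool", "run", "--hf-hub-or-path", model, "--task", task]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # boolean flags such as --recursive
        elif value is not False and value is not None:
            argv.extend([flag, str(value)])
    return argv


argv = build_command(
    "microsoft/Florence-2-large",
    "<OD>",
    folder="/path/to/folder/",
    output_dir="/path/to/output/",
    batch_size=4,
    recursive=True,
)
# shlex.join quotes arguments such as "<OD>" so the line is shell-safe.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting problems with task tokens such as "<OD>", which contain characters the shell would otherwise interpret as redirections.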

Examples

Processing a Single Image

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --image /path/to/image.jpg --output-dir /path/to/output/

Processing Images in a Folder

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<OD>" --folder /path/to/folder/ --output-dir /path/to/output/

Processing a WebDataset

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --wds "shard-{00000..00069}.tar" --output-dir /path/to/output/
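
The `{00000..00069}` part of the shard pattern is brace notation for a numeric range, which the WebDataset library expands into individual shard names. The sketch below illustrates what that expansion produces, in plain Python. It is an illustration only: `expand_shards` is a hypothetical name, it handles a single numeric range, and it is not how WebDataset or this tool implements the expansion.

```python
import re

_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_shards(pattern):
    """Expand the first {start..end} numeric range in a shard pattern."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # nothing to expand
    start, end = match.group(1), match.group(2)
    width = len(start)  # preserve zero padding, e.g. 00000
    return [
        pattern[: match.start()] + str(i).zfill(width) + pattern[match.end():]
        for i in range(int(start), int(end) + 1)
    ]


shards = expand_shards("shard-{00000..00069}.tar")
print(len(shards), shards[0], shards[-1])
```

The range is inclusive at both ends, so the pattern in the example above names 70 shards.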

Processing a WebDataset (streaming)

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --wds "pipe:aws s3 cp s3://data/shard-{00000..00069}.tar -" --output-dir /path/to/output/
florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --wds "pipe:aws s3 cp s3://data/shard-{00000..00069}.tar --endpoint-url https://00000000000000000000000000000000.r2.cloudflarestorage.com -" --output-dir /path/to/output/

Processing Images Recursively

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<OCR>" --folder /path/to/folder/ --output-dir /path/to/output/ --recursive

Custom Image Extensions

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<DENSE_REGION_CAPTION>" --folder /path/to/folder/ --image-extensions jpg,png,jpeg --output-dir /path/to/output/
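
To preview which files a given combination of --image-extensions and --recursive would be expected to pick up, a rough equivalent can be written with `pathlib`. This is a sketch under assumptions, not the tool's own discovery code: `find_images` is a hypothetical helper, and the case-insensitive extension match is an assumption that the tool may not share.

```python
import tempfile
from pathlib import Path


def find_images(folder, extensions="jpg,png", recursive=False):
    """List files under `folder` whose extension is in `extensions`.

    `extensions` is a comma-separated string such as "jpg,png,jpeg".
    Matching is case-insensitive here, which is an assumption.
    """
    wanted = {
        "." + ext.strip().lower().lstrip(".")
        for ext in extensions.split(",")
        if ext.strip()
    }
    root = Path(folder)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    for name in ("a.jpg", "b.PNG", "notes.txt", "nested/c.jpeg"):
        (root / name).touch()
    flat = [p.name for p in find_images(root)]
    deep = [p.name for p in find_images(root, "jpg,png,jpeg", recursive=True)]
    print(flat)
    print(deep)
```

With the default "jpg,png", files ending in .jpeg are skipped, which is why the example above lists jpeg explicitly.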

Saving Results in CSV Format

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<REGION_PROPOSAL>" --folder /path/to/folder/ --output-dir /path/to/output/ --output-format csv

Overwriting Existing Files

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --folder /path/to/folder/ --output-dir /path/to/output/ --suffix captions --overwrite

Processing in Batches

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --folder /path/to/folder/ --output-dir /path/to/output/ --batch-size 4
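
Conceptually, --batch-size groups the input images into chunks that are passed through the model together, with a smaller final chunk when the image count is not a multiple of the batch size. The options above suggest the tool delegates this to a PyTorch DataLoader; the sketch below only illustrates the grouping in plain Python, and `batched` is a hypothetical helper rather than the tool's code.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


paths = [f"img_{i}.jpg" for i in range(10)]
sizes = [len(batch) for batch in batched(paths, 4)]
print(sizes)  # [4, 4, 2]
```

Larger batches generally improve throughput at the cost of more device memory, so the usable batch size depends on the model variant, dtype, and --max-new-tokens.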

Running on CPU

florence-tool run --hf-hub-or-path microsoft/Florence-2-large --task "<CAPTION>" --folder /path/to/folder/ --output-dir /path/to/output/ --device "cpu"

Tasks

<OCR>
<OCR_WITH_REGION>
<CAPTION>
<DETAILED_CAPTION>
<MORE_DETAILED_CAPTION>
<OD>
<DENSE_REGION_CAPTION>
<CAPTION_TO_PHRASE_GROUNDING>
<REFERRING_EXPRESSION_SEGMENTATION>
<REGION_TO_SEGMENTATION>
<OPEN_VOCABULARY_DETECTION>
<REGION_TO_CATEGORY>
<REGION_TO_DESCRIPTION>
<REGION_TO_OCR>
<REGION_PROPOSAL>
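
Some of these tasks take only the image, while others also need --text-input (a phrase, a caption, or a region). The sketch below groups the tasks accordingly. The grouping is an assumption based on the prompt templates of the upstream Florence-2 processor, and `requires_text_input` is a hypothetical helper; the tool's own task check may be organized differently.

```python
# Assumption: grouping follows the upstream Florence-2 processor's
# prompt templates, not necessarily florence-tool's own task check.
TASKS_WITHOUT_TEXT = {
    "<OCR>",
    "<OCR_WITH_REGION>",
    "<CAPTION>",
    "<DETAILED_CAPTION>",
    "<MORE_DETAILED_CAPTION>",
    "<OD>",
    "<DENSE_REGION_CAPTION>",
    "<REGION_PROPOSAL>",
}
TASKS_WITH_TEXT = {
    "<CAPTION_TO_PHRASE_GROUNDING>",
    "<REFERRING_EXPRESSION_SEGMENTATION>",
    "<REGION_TO_SEGMENTATION>",
    "<OPEN_VOCABULARY_DETECTION>",
    "<REGION_TO_CATEGORY>",
    "<REGION_TO_DESCRIPTION>",
    "<REGION_TO_OCR>",
}


def requires_text_input(task):
    """Return True if `task` is expected to need --text-input."""
    if task in TASKS_WITH_TEXT:
        return True
    if task in TASKS_WITHOUT_TEXT:
        return False
    raise ValueError(f"unknown task: {task}")


for task in ("<CAPTION>", "<OPEN_VOCABULARY_DETECTION>"):
    print(task, requires_text_input(task))
```

For example, "<OPEN_VOCABULARY_DETECTION>" would be run with something like --text-input "a red car", whereas "<CAPTION>" needs no text input.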

Third Party Task Types

Supported by MiaoshouAI/Florence-2-base-PromptGen-v1.5

<GENERATE_TAGS>
<MIXED_CAPTION>

Supported by MiaoshouAI/Florence-2-base-PromptGen

<GENERATE_PROMPT>

Development

Code Structure

  • florence_tool.py: Main class that implements the Florence-2 model handling and processing logic.
  • cli.py: Command-line interface built with Click.
  • modeling: Directory containing model configuration and processing scripts.

Running Locally

To run the CLI locally without installing:

python -m florence_tool.cli run [OPTIONS]

Contributing

Contributions are welcome! Please submit a pull request or open an issue if you have ideas or find a bug.

License

This project is licensed under the Apache 2.0 License.
