CR Renderer

This Python package renders documents from the Crello dataset. It is a standalone renderer package from the OpenCOLE project.

Install

pip install git+https://github.com/CyberAgentAILab/cr-renderer

See pyproject.toml for the full list of dependency requirements.

Usage

For Crello dataset 5.0.0:

import datasets
import huggingface_hub
from cr_renderer import CrelloV5Renderer

dataset = datasets.load_dataset(
    "cyberagent/crello", revision="5.0.0", split="train")
fonts_path = huggingface_hub.hf_hub_download(
    repo_id="cyberagent/crello",
    filename="resources/fonts.pickle",
    repo_type="dataset",
    revision="5.0.0",
)
renderer = CrelloV5Renderer(dataset.features, fonts_path)
for example in dataset:
    image_bytes = renderer.render(example)

For Crello dataset 4.0.0:

import datasets
from cr_renderer import CrelloV4Renderer

dataset = datasets.load_dataset(
    "cyberagent/crello", revision="4.0.0", split="train")
renderer = CrelloV4Renderer(dataset.features)
for example in dataset:
    image_bytes = renderer.render(example)
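
The loops above only keep the result of `renderer.render(example)` in memory. The following is a minimal sketch of writing each result to disk. It assumes the return value is a `bytes` object holding an encoded image, as the variable name `image_bytes` suggests. The encoding is not documented here, so the file extension is guessed from the leading magic bytes. `guess_extension` and `save_rendered` are hypothetical helpers for illustration, not part of `cr_renderer`.

```python
from pathlib import Path

# Leading byte signatures of two common image formats.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
}


def guess_extension(image_bytes: bytes) -> str:
    """Return a file extension based on the leading bytes, or ".bin" if unknown."""
    for signature, extension in _SIGNATURES.items():
        if image_bytes.startswith(signature):
            return extension
    return ".bin"


def save_rendered(image_bytes: bytes, out_dir, stem: str) -> Path:
    """Write rendered bytes to out_dir/<stem><ext> and return the resulting path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stem}{guess_extension(image_bytes)}"
    path.write_bytes(image_bytes)
    return path


# Hypothetical use inside either rendering loop shown above:
# for index, example in enumerate(dataset):
#     save_rendered(renderer.render(example), "renders", f"{index:06d}")
```

Falling back to `.bin` keeps the data intact when the format is not recognized, so nothing is lost if the assumption about the encoding turns out to be wrong.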

Development

The package is managed by uv. To start development:

git clone https://github.com/CyberAgentAILab/cr-renderer.git
cd cr-renderer
uv sync