
OCR-D workspace with the GBN dataset

Lucas Sulzbach edited this page Feb 5, 2021 · 1 revision
  • Download the GBN dataset
  • Copy it to constantine with rsync (see the first step)
  • Modify the script below to copy the GBN files and initialize a workspace in your own area (remember to always run it with the conda environment activated)
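#!/usr/bin/env bash
# Copies the GBN page images into an OCR-D workspace and registers them in mets.xml.
# Adjust the two paths below to your own area before running.

workspace="/home/ls17/ocrd/sbb"
img_dir="OCR-D-IMG"
src_dir="/home/ls17/ocrd/workspace2/OCR-D-IMG-RESIZE-PAGE"

# Collect the source images into an array so paths with spaces are not split.
mapfile -t imgs < <(find "$src_dir" -name "*.png" | sort)

mkdir -p "$workspace"
cd "$workspace" || exit 1

# Initialize the workspace only if it has no mets.xml yet.
if ! [[ -f "$workspace/mets.xml" ]]
then
	ocrd workspace init "$workspace"
	ocrd workspace set-id "gbn"
fi

mkdir -p "$workspace/$img_dir"

for img in "${imgs[@]}"
do
	# Rename the file-group prefix in the file name (was done with sed).
	fname=$(basename "$img")
	fname=${fname//OCR-D-IMG-RESIZE-PAGE/OCR-D-IMG}

	cp "$img" "$workspace/$img_dir/$fname"

	# Page and file ID: the file name up to the first dot (was done with cut).
	name=${fname%%.*}

	ocrd workspace add -g "$name" -G "$img_dir" -i "$name" -m image/png "$img_dir/$fname"
done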
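
Before running the script on the full dataset, it can help to check which file name and ID the loop would derive for a given image. The following is a minimal sketch of that renaming step in isolation; `derive_names` is a hypothetical helper, not part of the original script, and the example path is an assumption about how the GBN images are named.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): prints the target
# file name and the ID that the loop would derive for one source image.
derive_names() {
	local src=$1
	local fname
	fname=$(basename "$src")
	# Same substitution as the sed call in the script.
	fname=${fname//OCR-D-IMG-RESIZE-PAGE/OCR-D-IMG}
	# Same result as `cut -d "." -f 1`: keep everything before the first dot.
	local name=${fname%%.*}
	printf '%s %s\n' "$fname" "$name"
}

# Assumed example file name; the real GBN names may differ.
derive_names "/tmp/example/OCR-D-IMG-RESIZE-PAGE_0001.png"
# prints: OCR-D-IMG_0001.png OCR-D-IMG_0001
```

Since the ID is cut at the first dot, file names containing more than one dot would lose everything after that first dot; checking a few real names this way shows whether the resulting IDs stay unique.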