
Add "file" counts for a few datasets #87

Open
alix-tz opened this issue Oct 31, 2022 · 1 comment

Comments

@alix-tz
Member

alix-tz commented Oct 31, 2022

@PonteIneptique, do you have any objection to adding the following information to the catalog?

| Dataset name | (XML) file count |
|---|---|
| Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X | 201 |
| Charters and Records of Königsfelden Abbey and Bailiwick (1308-1662) | 283 |
| The POPP datasets | 235 |
| Eutyches | 129 |
| FoNDUE-GasparoSardiToponomasia-Dataset | 49 |
| FoNDUE Spanish chapbooks 19th c. Dataset | 198 |
| Éditer la correspondance de Constance de Salm (1767-1845) | 45 |
| Jeu de données OCR - Incunables sévillans 1494-1500 | 62 |
| Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés de Paris et du département de la Seine (1898-1923) | 169 |

I went through each of these repositories and counted the XML files corresponding to ground truth. Note that for "Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X", I only counted the PAGE files (every ALTO file has a PAGE equivalent, but not the other way around). The same applies to "Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés de Paris et du département de la Seine (1898-1923)".
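A minimal sketch of the counting rule described above, in Python, assuming that PAGE and ALTO files can be told apart by the namespace on their root element (`primaresearch.org/PAGE` vs. `loc.gov/standards/alto`). The helpers `detect_format`, `count_ground_truth` and `file_count` are hypothetical and not part of any existing catalog tooling; `file_count` prefers the PAGE count when a repository contains both formats, mirroring the choice made for the StABS and Annuaire datasets.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: ground-truth files declare the standard PAGE or ALTO namespace
# on their root element; these substrings are enough to recognise each format.
FORMAT_MARKERS = {
    "page": "primaresearch.org/PAGE",
    "alto": "loc.gov/standards/alto",
}


def detect_format(path):
    """Return 'page', 'alto', or None based on the root element's namespace."""
    try:
        tag = ET.parse(path).getroot().tag
    except (ET.ParseError, OSError):
        return None  # unreadable or malformed XML is not counted
    if not tag.startswith("{"):
        return None  # no namespace: neither PAGE nor ALTO
    namespace = tag[1:].split("}", 1)[0]
    for fmt, marker in FORMAT_MARKERS.items():
        if marker in namespace:
            return fmt
    return None


def count_ground_truth(root_dir):
    """Walk a repository and count PAGE and ALTO files (hypothetical helper)."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(".xml"):
                fmt = detect_format(os.path.join(dirpath, name))
                if fmt:
                    counts[fmt] += 1
    return counts


def file_count(counts):
    """Single 'file' value: PAGE if present, otherwise ALTO (assumed rule)."""
    # Counter returns 0 for missing keys, so this also works for ALTO-only sets.
    return counts["page"] if counts["page"] else counts["alto"]
```

For example, `file_count(count_ground_truth("path/to/dataset"))` (placeholder path) would give the single number to enter in the catalog's "file" field.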

If we add these metrics, we would have the "file" metric available for every dataset currently listed in the catalog.

@PonteIneptique
Member

PonteIneptique commented Oct 31, 2022 via email
