Add "file" counts for a few datasets #87
Comments
No objection.
You can also run humg if you want. I usually do that for external datasets
when I have time.
On Mon, 31 Oct 2022 at 11:58 PM, Alix Chagué ***@***.***> wrote:
@PonteIneptique <https://github.com/PonteIneptique> do you have any
objection to adding the following information to the catalog:
| Dataset name | (XML) file count |
| --- | --- |
| Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X | 201 |
| Charters and Records of Königsfelden Abbey and Bailiwick (1308-1662) | 283 |
| The POPP datasets | 235 |
| Eutyches | 129 |
| FoNDUE-GasparoSardiToponomasia-Dataset | 49 |
| FoNDUE Spanish chapbooks 19th c. Dataset | 198 |
| Éditer la correspondance de Constance de Salm (1767-1845) | 45 |
| Jeu de données OCR - Incunables sévillans 1494-1500 | 62 |
| Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés de Paris et du département de la Seine (1898-1923) | 169 |
I went through each of these repositories to count the number of XML files
corresponding to ground truth. Note that for "Handwritten Text Recognition
Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X", I only counted the
PAGE files (all the ALTO files have a PAGE equivalent, which is not true
the other way around). Same for "Données vérité de terrain HTR+ Annuaire
des propriétaires et des propriétés de Paris et du département de la Seine
(1898-1923)".
If we add these metrics, we would have the "file" metric available for
every dataset currently listed in the catalog.
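
For anyone who wants to reproduce this kind of count, here is a minimal sketch, not the method actually used for the numbers above. It assumes a local clone of a dataset repository and tells PAGE from ALTO by the root element name (`PcGts` for PAGE, `alto` for ALTO). The function names `detect_format`, `count_xml_by_format` and `file_metric` are hypothetical, and the script does not check that an XML file really is ground truth.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def detect_format(xml_path):
    """Return 'PAGE', 'ALTO' or 'other' based on the root element name."""
    try:
        root = ET.parse(xml_path).getroot()
    except ET.ParseError:
        # Not well-formed XML: count it separately rather than crash.
        return "other"
    # ElementTree tags look like '{namespace}localname'; keep the local name.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "PcGts":
        return "PAGE"
    if local_name == "alto":
        return "ALTO"
    return "other"


def count_xml_by_format(repo_dir):
    """Count every *.xml file under repo_dir (recursively), grouped by format."""
    return Counter(detect_format(path) for path in Path(repo_dir).rglob("*.xml"))


def file_metric(counts):
    """Pick a single 'file' count from the per-format counts.

    Assumption: when a repository holds both PAGE and ALTO files, only the
    PAGE files are counted, mirroring the choice described above. This is
    only valid if every ALTO file has a PAGE equivalent, which is NOT
    verified here.
    """
    if counts["PAGE"]:
        return counts["PAGE"]
    return counts["ALTO"]
```

Calling `file_metric(count_xml_by_format("path/to/clone"))` gives one number per repository. Inspecting the full `Counter` first is a quick way to spot repositories that mix both formats.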