ocrd resmgr download '*' weird behavior #1044
Comments
What does the resmgr log say?
Nothing interesting: it only logs what it is downloading, not what it is supposed to be downloading or how it decided which processors should be included. I'll add such a log statement when debugging.
Here is a snippet from my sbatch script that downloads all models:
singularity exec --bind "${SCRATCH_OCRD_MODELS_BASE}:/usr/local/share" "${SIF_PATH}" ocrd resmgr download '*'
singularity exec --bind "${SCRATCH_OCRD_MODELS_BASE}:/usr/local/share" "${SIF_PATH}" ocrd resmgr download ocrd-tesserocr-recognize '*'
In the scratch storage of the HPC environment:
gwdu101:127 16:11:22 /scratch1/users/mmustaf/ocrd_models > du -ha
512 ./tessdata/configs/digits
512 ./tessdata/configs/box.train
512 ./tessdata/configs/unlv
512 ./tessdata/configs/hocr
512 ./tessdata/configs/pdf
512 ./tessdata/configs/ambigs.train
512 ./tessdata/configs/kannada
512 ./tessdata/configs/get.images
512 ./tessdata/configs/makebox
512 ./tessdata/configs/alto
512 ./tessdata/configs/linebox
512 ./tessdata/configs/api_config
512 ./tessdata/configs/bigram
512 ./tessdata/configs/bazaar
512 ./tessdata/configs/txt
512 ./tessdata/configs/lstmbox
512 ./tessdata/configs/tsv
512 ./tessdata/configs/logfile
512 ./tessdata/configs/box.train.stderr
512 ./tessdata/configs/quiet
512 ./tessdata/configs/wordstrbox
512 ./tessdata/configs/lstm.train
512 ./tessdata/configs/rebox
512 ./tessdata/configs/Makefile.am
512 ./tessdata/configs/inter
512 ./tessdata/configs/strokewidth
512 ./tessdata/configs/lstmdebug
14K ./tessdata/configs
2,2M ./tessdata/equ.traineddata
1,1M ./tessdata/Fraktur_GT4HistOCR.traineddata
11M ./tessdata/Fraktur.traineddata
4,2M ./tessdata/ONB.traineddata
4,0M ./tessdata/eng.traineddata
11M ./tessdata/osd.traineddata
6,2M ./tessdata/frk.traineddata
1,5M ./tessdata/deu.traineddata
3,3M ./tessdata/frak2021.traineddata
86M ./tessdata/Latin.traineddata
128M ./tessdata
80M ./ocrd-resources/ocrd-cis-ocropy-recognize/en-default.pyrnn.gz
17M ./ocrd-resources/ocrd-cis-ocropy-recognize/LatinHist.pyrnn.gz
42M ./ocrd-resources/ocrd-cis-ocropy-recognize/fraktur.pyrnn.gz
2,9M ./ocrd-resources/ocrd-cis-ocropy-recognize/fraktur-jze.pyrnn.gz
141M ./ocrd-resources/ocrd-cis-ocropy-recognize
18M ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.h5
29K ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.json
18M ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.h5
29K ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.json
29K ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.json
29K ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.json
18M ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.h5
29K ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.json
18M ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.h5
18M ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.h5
89M ./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19
19M ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.h5
47K ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.json
19M ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.h5
47K ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.json
47K ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.json
47K ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.json
19M ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.h5
47K ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.json
19M ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.h5
19M ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.h5
92M ./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3
19M ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.h5
24K ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.json
24K ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.json
24K ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.json
19M ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.h5
19M ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.h5
19M ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.h5
24K ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.json
24K ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.json
19M ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.h5
92M ./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0
272M ./ocrd-resources/ocrd-calamari-recognize
147M ./ocrd-resources/ocrd-sbb-binarize/default-2021-03-09/model_bin_sbb_ens.h5
147M ./ocrd-resources/ocrd-sbb-binarize/default-2021-03-09
147M ./ocrd-resources/ocrd-sbb-binarize
560M ./ocrd-resources
687M .
For comparison, check the models downloaded with an older version (not sure which one, the latest one in January) of:
docker run --rm -v "/home/cloud/ocrd_models/:/usr/local/share/ocrd-resources" -- ocrd/all:maximum ocrd resmgr download '*'
In the Operandi live VM:
cloud@operandi-live:~/ocrd_models$ du -ha
2,8M ./ocrd-kraken-recognize/en_best.mlmodel
2,9M ./ocrd-kraken-recognize
438M ./ocrd-sbb-textline-detector/default/model_page_mixed_best.h5
438M ./ocrd-sbb-textline-detector/default/model_textline_new.h5
438M ./ocrd-sbb-textline-detector/default/model_strukturerkennung.h5
1,3G ./ocrd-sbb-textline-detector/default
1,3G ./ocrd-sbb-textline-detector
4,0K ./ocrd-anybaseocr-dewarp/latest_net_G.pth
8,0K ./ocrd-anybaseocr-dewarp
1,5M ./ocrd-tesserocr-recognize/deu.traineddata
2,2M ./ocrd-tesserocr-recognize/equ.traineddata
11M ./ocrd-tesserocr-recognize/Fraktur.traineddata
4,0M ./ocrd-tesserocr-recognize/eng.traineddata
6,2M ./ocrd-tesserocr-recognize/frk.traineddata
11M ./ocrd-tesserocr-recognize/osd.traineddata
3,3M ./ocrd-tesserocr-recognize/frak2021.traineddata
1,1M ./ocrd-tesserocr-recognize/Fraktur_GT4HistOCR.traineddata
4,2M ./ocrd-tesserocr-recognize/ONB.traineddata
4,0K ./ocrd-tesserocr-recognize/configs/get.images
4,0K ./ocrd-tesserocr-recognize/configs/lstmdebug
4,0K ./ocrd-tesserocr-recognize/configs/box.train
4,0K ./ocrd-tesserocr-recognize/configs/Makefile.am
4,0K ./ocrd-tesserocr-recognize/configs/lstmbox
4,0K ./ocrd-tesserocr-recognize/configs/api_config
4,0K ./ocrd-tesserocr-recognize/configs/kannada
4,0K ./ocrd-tesserocr-recognize/configs/wordstrbox
4,0K ./ocrd-tesserocr-recognize/configs/bazaar
4,0K ./ocrd-tesserocr-recognize/configs/box.train.stderr
4,0K ./ocrd-tesserocr-recognize/configs/strokewidth
4,0K ./ocrd-tesserocr-recognize/configs/txt
4,0K ./ocrd-tesserocr-recognize/configs/linebox
4,0K ./ocrd-tesserocr-recognize/configs/unlv
4,0K ./ocrd-tesserocr-recognize/configs/lstm.train
4,0K ./ocrd-tesserocr-recognize/configs/hocr
4,0K ./ocrd-tesserocr-recognize/configs/digits
4,0K ./ocrd-tesserocr-recognize/configs/logfile
4,0K ./ocrd-tesserocr-recognize/configs/inter
4,0K ./ocrd-tesserocr-recognize/configs/pdf
4,0K ./ocrd-tesserocr-recognize/configs/bigram
4,0K ./ocrd-tesserocr-recognize/configs/quiet
4,0K ./ocrd-tesserocr-recognize/configs/alto
4,0K ./ocrd-tesserocr-recognize/configs/tsv
4,0K ./ocrd-tesserocr-recognize/configs/makebox
4,0K ./ocrd-tesserocr-recognize/configs/rebox
4,0K ./ocrd-tesserocr-recognize/configs/ambigs.train
112K ./ocrd-tesserocr-recognize/configs
86M ./ocrd-tesserocr-recognize/Latin.traineddata
128M ./ocrd-tesserocr-recognize
4,9M ./ocrd-kraken-segment/blla.mlmodel
4,9M ./ocrd-kraken-segment
438M ./ocrd-sbb-binarize/default/model_bin3.h5
438M ./ocrd-sbb-binarize/default/model_bin2.h5
438M ./ocrd-sbb-binarize/default/model_bin1.h5
438M ./ocrd-sbb-binarize/default/model_bin4.h5
1,8G ./ocrd-sbb-binarize/default
147M ./ocrd-sbb-binarize/default-2021-03-09/model_bin_sbb_ens.h5
147M ./ocrd-sbb-binarize/default-2021-03-09
1,9G ./ocrd-sbb-binarize
4,0K ./ocrd-anybaseocr-tiseg/seg_model/assets
4,1M ./ocrd-anybaseocr-tiseg/seg_model/saved_model.pb
63M ./ocrd-anybaseocr-tiseg/seg_model/variables/variables.data-00001-of-00002
100K ./ocrd-anybaseocr-tiseg/seg_model/variables/variables.data-00000-of-00002
20K ./ocrd-anybaseocr-tiseg/seg_model/variables/variables.index
63M ./ocrd-anybaseocr-tiseg/seg_model/variables
67M ./ocrd-anybaseocr-tiseg/seg_model
67M ./ocrd-anybaseocr-tiseg
2,9M ./ocrd-cis-ocropy-recognize/fraktur-jze.pyrnn.gz
17M ./ocrd-cis-ocropy-recognize/LatinHist.pyrnn.gz
42M ./ocrd-cis-ocropy-recognize/fraktur.pyrnn.gz
80M ./ocrd-cis-ocropy-recognize/en-default.pyrnn.gz
141M ./ocrd-cis-ocropy-recognize
18M ./ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.h5
32K ./ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.json
18M ./ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.h5
18M ./ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.h5
32K ./ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.json
32K ./ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.json
18M ./ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.h5
32K ./ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.json
18M ./ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.h5
32K ./ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.json
89M ./ocrd-calamari-recognize/zpd-fraktur19
19M ./ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.h5
24K ./ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.json
19M ./ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.h5
19M ./ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.h5
24K ./ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.json
24K ./ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.json
19M ./ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.h5
24K ./ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.json
19M ./ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.h5
24K ./ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.json
92M ./ocrd-calamari-recognize/qurator-gt4histocr-1.0
19M ./ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.h5
48K ./ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.json
19M ./ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.h5
19M ./ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.h5
48K ./ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.json
48K ./ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.json
19M ./ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.h5
48K ./ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.json
19M ./ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.h5
48K ./ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.json
92M ./ocrd-calamari-recognize/zpd-latin-script-hist-3
272M ./ocrd-calamari-recognize
4,0K ./ocrd-anybaseocr-block-segmentation/block_segmentation_weights.h5
8,0K ./ocrd-anybaseocr-block-segmentation
28M ./ocrd-typegroups-classifier/densenet121.tgc
28M ./ocrd-typegroups-classifier
147M ./ocrd-eynollah-segment/default/model_tables_ens_mixed_new_2.h5
147M ./ocrd-eynollah-segment/default/model_textline_newspapers.h5
147M ./ocrd-eynollah-segment/default/model_main_covid19_lr5-5_scale_1_1_great.h5
147M ./ocrd-eynollah-segment/default/model_page_mixed_best.h5
127M ./ocrd-eynollah-segment/default/model_enhancement.h5
147M ./ocrd-eynollah-segment/default/model_bin_sbb_ens.h5
147M ./ocrd-eynollah-segment/default/model_3up_new_good_no_augmentation.h5
99M ./ocrd-eynollah-segment/default/model_scale_classifier.h5
147M ./ocrd-eynollah-segment/default/model_no_patches_class0_30eopch.h5
147M ./ocrd-eynollah-segment/default/model_main_home_corona3_rot.h5
147M ./ocrd-eynollah-segment/default/model_ensemble_s.h5
1,6G ./ocrd-eynollah-segment/default
1,6G ./ocrd-eynollah-segment
4,0K ./ocrd-anybaseocr-layout-analysis/structure_analysis/assets
14M ./ocrd-anybaseocr-layout-analysis/structure_analysis/saved_model.pb
29M ./ocrd-anybaseocr-layout-analysis/structure_analysis/variables/variables.data-00001-of-00002
248K ./ocrd-anybaseocr-layout-analysis/structure_analysis/variables/variables.data-00000-of-00002
44K ./ocrd-anybaseocr-layout-analysis/structure_analysis/variables/variables.index
30M ./ocrd-anybaseocr-layout-analysis/structure_analysis/variables
43M ./ocrd-anybaseocr-layout-analysis/structure_analysis
4,0K ./ocrd-anybaseocr-layout-analysis/mapping_densenet.pickle
43M ./ocrd-anybaseocr-layout-analysis
5,4G .
Far fewer models are downloaded than before. The total size of the downloaded models is just 687 MB; it used to be around 5.4 GB. Also, the models of some processors are now missing completely, i.e. not downloaded at all.
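To see at a glance which processors lost their models, the two listings can be compared by directory name. This is only a rough sketch: it assumes both `du -ha` outputs were saved to files (the names `old.txt` and `new.txt` and both function names are made up here), and it only looks at directory names of the form `ocrd-*`, nothing else.

```shell
#!/usr/bin/env bash
# Print the processor directory names (ocrd-*) found in a saved `du -ha` listing.
# "ocrd-resources" is a container directory, not a processor, so it is skipped.
processors_in_listing() {
    grep -oE '/ocrd-[A-Za-z0-9-]+' "$1" | tr -d '/' | grep -vx 'ocrd-resources' | sort -u
}

# Print processors that appear in the old listing ($1) but not in the new one ($2).
missing_processors() {
    comm -23 <(processors_in_listing "$1") <(processors_in_listing "$2")
}

# Hypothetical usage, with both listings saved beforehand:
#   missing_processors old.txt new.txt
```

Note that in the newer listing the Tesseract models sit under ./tessdata rather than in a directory named after the processor, so a name-based comparison like this one would list ocrd-tesserocr-recognize even though its files exist.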
It's clear the reason for this is that ResourceManager.list_available only returns database results – it does not look up all I cannot find when exactly this broke, but this change looks somewhat fishy. Since we never know when the user installs (additional) processor modules, and the database files can be out of date (as is currently the case with the distributed resource_list.yml which still contains
Speaking of short-circuiting with
I've opened a separate issue for the
When running
ocrd resmgr download '*'
in the latest ocrd_all Docker image, only some models are installed. E.g. the ocrd-tesserocr-recognize models are missing entirely.
ocrd resmgr download ocrd-tesserocr-recognize '*'
works as expected.
So something is wrong with iterating over the processors in the wildcard case.
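Since the per-processor call works as expected, one possible stopgap is to do the iteration by hand. The following is a minimal sketch and not part of ocrd: it assumes that every executable on PATH whose name starts with `ocrd-` is a processor (some may not be, and how resmgr reacts to those is not covered here). The function names and the `OCRD_CMD` variable are made up for illustration.

```shell
#!/usr/bin/env bash
# List the names of all executables on PATH that start with "ocrd-".
list_ocrd_processors() {
    local dir exe
    local IFS=:   # split $PATH on colons, only within this function
    for dir in $PATH; do
        [ -d "$dir" ] || continue
        for exe in "$dir"/ocrd-*; do
            # An unmatched glob stays literal and fails the -f test.
            if [ -f "$exe" ] && [ -x "$exe" ]; then
                basename "$exe"
            fi
        done
    done | sort -u
}

# Call the per-processor form of the download for each name found.
# OCRD_CMD is a hypothetical override so the loop can be dry-run with `echo`.
download_all() {
    local proc
    list_ocrd_processors | while read -r proc; do
        # </dev/null keeps the command from consuming the list of names on stdin.
        ${OCRD_CMD:-ocrd} resmgr download "$proc" '*' </dev/null
    done
}
```

Calling `download_all` has to happen in the same environment (e.g. inside the container) where the processors are installed; `OCRD_CMD=echo download_all` only prints the commands that would be run.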