ocrd resmgr download '*' weird behavior #1044

Open
kba opened this issue Apr 24, 2023 · 6 comments

@kba
Member

kba commented Apr 24, 2023

When running ocrd resmgr download '*' in the latest ocrd_all Docker image, only some models are installed:

find / |grep ocrd-resources
/usr/local/share/ocrd-resources/ocrd-cis-ocropy-recognize
/usr/local/share/ocrd-resources/ocrd-cis-ocropy-recognize/en-default.pyrnn.gz
/usr/local/share/ocrd-resources/ocrd-cis-ocropy-recognize/LatinHist.pyrnn.gz
/usr/local/share/ocrd-resources/ocrd-cis-ocropy-recognize/fraktur.pyrnn.gz
/usr/local/share/ocrd-resources/ocrd-cis-ocropy-recognize/fraktur-jze.pyrnn.gz
/usr/local/share/ocrd-resources/ocrd-kraken-segment
/usr/local/share/ocrd-resources/ocrd-kraken-segment/blla.mlmodel
/usr/local/share/ocrd-resources/ocrd-calamari-recognize
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.json
/usr/local/share/ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.h5
/usr/local/share/ocrd-resources/ocrd-sbb-textline-detector
/usr/local/share/ocrd-resources/ocrd-sbb-textline-detector/default
/usr/local/share/ocrd-resources/ocrd-sbb-textline-detector/default/model_strukturerkennung.h5
/usr/local/share/ocrd-resources/ocrd-sbb-textline-detector/default/model_textline_new.h5
/usr/local/share/ocrd-resources/ocrd-sbb-textline-detector/default/model_page_mixed_best.h5
/usr/local/share/ocrd-resources/ocrd-kraken-recognize
/usr/local/share/ocrd-resources/ocrd-kraken-recognize/en_best.mlmodel
/usr/local/share/ocrd-resources/ocrd-sbb-binarize
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default-2021-03-09
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default-2021-03-09/model_bin_sbb_ens.h5
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default/model_bin3.h5
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default/model_bin4.h5
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default/model_bin1.h5
/usr/local/share/ocrd-resources/ocrd-sbb-binarize/default/model_bin2.h5

E.g. the ocrd-tesserocr-recognize models are missing entirely, whereas ocrd resmgr download ocrd-tesserocr-recognize '*' works as expected.

So something is wrong with iterating over the processors in the wildcard case.

@kba kba added the bug label Apr 24, 2023
@kba kba self-assigned this Apr 24, 2023
@bertsky
Collaborator

bertsky commented Apr 24, 2023

What does the resmgr log say?

@kba
Member Author

kba commented Apr 24, 2023

> What does the resmgr log say?

Nothing interesting: it only logs what it is downloading, not what it's supposed to be downloading or how it decided which processors should be included. I'll add such a log statement when debugging.

@MehmedGIT
Contributor

Here is a snippet from my sbatch script that downloads all models:

singularity exec --bind "${SCRATCH_OCRD_MODELS_BASE}:/usr/local/share" "${SIF_PATH}" ocrd resmgr download '*'
singularity exec --bind "${SCRATCH_OCRD_MODELS_BASE}:/usr/local/share" "${SIF_PATH}" ocrd resmgr download ocrd-tesserocr-recognize '*'
In the scratch storage of the HPC environment

${SCRATCH_OCRD_MODELS_BASE} = /scratch1/users/mmustaf/ocrd_models

gwdu101:127 16:11:22 /scratch1/users/mmustaf/ocrd_models > du -ha
512	./tessdata/configs/digits
512	./tessdata/configs/box.train
512	./tessdata/configs/unlv
512	./tessdata/configs/hocr
512	./tessdata/configs/pdf
512	./tessdata/configs/ambigs.train
512	./tessdata/configs/kannada
512	./tessdata/configs/get.images
512	./tessdata/configs/makebox
512	./tessdata/configs/alto
512	./tessdata/configs/linebox
512	./tessdata/configs/api_config
512	./tessdata/configs/bigram
512	./tessdata/configs/bazaar
512	./tessdata/configs/txt
512	./tessdata/configs/lstmbox
512	./tessdata/configs/tsv
512	./tessdata/configs/logfile
512	./tessdata/configs/box.train.stderr
512	./tessdata/configs/quiet
512	./tessdata/configs/wordstrbox
512	./tessdata/configs/lstm.train
512	./tessdata/configs/rebox
512	./tessdata/configs/Makefile.am
512	./tessdata/configs/inter
512	./tessdata/configs/strokewidth
512	./tessdata/configs/lstmdebug
14K	./tessdata/configs
2,2M	./tessdata/equ.traineddata
1,1M	./tessdata/Fraktur_GT4HistOCR.traineddata
11M	./tessdata/Fraktur.traineddata
4,2M	./tessdata/ONB.traineddata
4,0M	./tessdata/eng.traineddata
11M	./tessdata/osd.traineddata
6,2M	./tessdata/frk.traineddata
1,5M	./tessdata/deu.traineddata
3,3M	./tessdata/frak2021.traineddata
86M	./tessdata/Latin.traineddata
128M	./tessdata
80M	./ocrd-resources/ocrd-cis-ocropy-recognize/en-default.pyrnn.gz
17M	./ocrd-resources/ocrd-cis-ocropy-recognize/LatinHist.pyrnn.gz
42M	./ocrd-resources/ocrd-cis-ocropy-recognize/fraktur.pyrnn.gz
2,9M	./ocrd-resources/ocrd-cis-ocropy-recognize/fraktur-jze.pyrnn.gz
141M	./ocrd-resources/ocrd-cis-ocropy-recognize
18M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.h5
29K	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.json
18M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.h5
29K	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.json
29K	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.json
29K	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.json
18M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.h5
29K	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.json
18M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.h5
18M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.h5
89M	./ocrd-resources/ocrd-calamari-recognize/zpd-fraktur19
19M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.h5
47K	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.json
19M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.h5
47K	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.json
47K	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.json
47K	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.json
19M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.h5
47K	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.json
19M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.h5
19M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.h5
92M	./ocrd-resources/ocrd-calamari-recognize/zpd-latin-script-hist-3
19M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.h5
24K	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.json
24K	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.json
24K	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.json
19M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.h5
19M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.h5
19M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.h5
24K	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.json
24K	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.json
19M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.h5
92M	./ocrd-resources/ocrd-calamari-recognize/qurator-gt4histocr-1.0
272M	./ocrd-resources/ocrd-calamari-recognize
147M	./ocrd-resources/ocrd-sbb-binarize/default-2021-03-09/model_bin_sbb_ens.h5
147M	./ocrd-resources/ocrd-sbb-binarize/default-2021-03-09
147M	./ocrd-resources/ocrd-sbb-binarize
560M	./ocrd-resources
687M	.

For comparison, here are the models downloaded with an older version of ocrd/all:maximum (not sure which one; the latest one available in January), when the ocrd-tesserocr-recognize models were still located under the ocrd-resources folder:

docker run --rm -v "/home/cloud/ocrd_models/:/usr/local/share/ocrd-resources" -- ocrd/all:maximum ocrd resmgr download '*'
In the Operandi live VM:
cloud@operandi-live:~/ocrd_models$ du -ha
2,8M	./ocrd-kraken-recognize/en_best.mlmodel
2,9M	./ocrd-kraken-recognize
438M	./ocrd-sbb-textline-detector/default/model_page_mixed_best.h5
438M	./ocrd-sbb-textline-detector/default/model_textline_new.h5
438M	./ocrd-sbb-textline-detector/default/model_strukturerkennung.h5
1,3G	./ocrd-sbb-textline-detector/default
1,3G	./ocrd-sbb-textline-detector
4,0K	./ocrd-anybaseocr-dewarp/latest_net_G.pth
8,0K	./ocrd-anybaseocr-dewarp
1,5M	./ocrd-tesserocr-recognize/deu.traineddata
2,2M	./ocrd-tesserocr-recognize/equ.traineddata
11M	./ocrd-tesserocr-recognize/Fraktur.traineddata
4,0M	./ocrd-tesserocr-recognize/eng.traineddata
6,2M	./ocrd-tesserocr-recognize/frk.traineddata
11M	./ocrd-tesserocr-recognize/osd.traineddata
3,3M	./ocrd-tesserocr-recognize/frak2021.traineddata
1,1M	./ocrd-tesserocr-recognize/Fraktur_GT4HistOCR.traineddata
4,2M	./ocrd-tesserocr-recognize/ONB.traineddata
4,0K	./ocrd-tesserocr-recognize/configs/get.images
4,0K	./ocrd-tesserocr-recognize/configs/lstmdebug
4,0K	./ocrd-tesserocr-recognize/configs/box.train
4,0K	./ocrd-tesserocr-recognize/configs/Makefile.am
4,0K	./ocrd-tesserocr-recognize/configs/lstmbox
4,0K	./ocrd-tesserocr-recognize/configs/api_config
4,0K	./ocrd-tesserocr-recognize/configs/kannada
4,0K	./ocrd-tesserocr-recognize/configs/wordstrbox
4,0K	./ocrd-tesserocr-recognize/configs/bazaar
4,0K	./ocrd-tesserocr-recognize/configs/box.train.stderr
4,0K	./ocrd-tesserocr-recognize/configs/strokewidth
4,0K	./ocrd-tesserocr-recognize/configs/txt
4,0K	./ocrd-tesserocr-recognize/configs/linebox
4,0K	./ocrd-tesserocr-recognize/configs/unlv
4,0K	./ocrd-tesserocr-recognize/configs/lstm.train
4,0K	./ocrd-tesserocr-recognize/configs/hocr
4,0K	./ocrd-tesserocr-recognize/configs/digits
4,0K	./ocrd-tesserocr-recognize/configs/logfile
4,0K	./ocrd-tesserocr-recognize/configs/inter
4,0K	./ocrd-tesserocr-recognize/configs/pdf
4,0K	./ocrd-tesserocr-recognize/configs/bigram
4,0K	./ocrd-tesserocr-recognize/configs/quiet
4,0K	./ocrd-tesserocr-recognize/configs/alto
4,0K	./ocrd-tesserocr-recognize/configs/tsv
4,0K	./ocrd-tesserocr-recognize/configs/makebox
4,0K	./ocrd-tesserocr-recognize/configs/rebox
4,0K	./ocrd-tesserocr-recognize/configs/ambigs.train
112K	./ocrd-tesserocr-recognize/configs
86M	./ocrd-tesserocr-recognize/Latin.traineddata
128M	./ocrd-tesserocr-recognize
4,9M	./ocrd-kraken-segment/blla.mlmodel
4,9M	./ocrd-kraken-segment
438M	./ocrd-sbb-binarize/default/model_bin3.h5
438M	./ocrd-sbb-binarize/default/model_bin2.h5
438M	./ocrd-sbb-binarize/default/model_bin1.h5
438M	./ocrd-sbb-binarize/default/model_bin4.h5
1,8G	./ocrd-sbb-binarize/default
147M	./ocrd-sbb-binarize/default-2021-03-09/model_bin_sbb_ens.h5
147M	./ocrd-sbb-binarize/default-2021-03-09
1,9G	./ocrd-sbb-binarize
4,0K	./ocrd-anybaseocr-tiseg/seg_model/assets
4,1M	./ocrd-anybaseocr-tiseg/seg_model/saved_model.pb
63M	./ocrd-anybaseocr-tiseg/seg_model/variables/variables.data-00001-of-00002
100K	./ocrd-anybaseocr-tiseg/seg_model/variables/variables.data-00000-of-00002
20K	./ocrd-anybaseocr-tiseg/seg_model/variables/variables.index
63M	./ocrd-anybaseocr-tiseg/seg_model/variables
67M	./ocrd-anybaseocr-tiseg/seg_model
67M	./ocrd-anybaseocr-tiseg
2,9M	./ocrd-cis-ocropy-recognize/fraktur-jze.pyrnn.gz
17M	./ocrd-cis-ocropy-recognize/LatinHist.pyrnn.gz
42M	./ocrd-cis-ocropy-recognize/fraktur.pyrnn.gz
80M	./ocrd-cis-ocropy-recognize/en-default.pyrnn.gz
141M	./ocrd-cis-ocropy-recognize
18M	./ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.h5
32K	./ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.json
18M	./ocrd-calamari-recognize/zpd-fraktur19/3.ckpt.h5
18M	./ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.h5
32K	./ocrd-calamari-recognize/zpd-fraktur19/1.ckpt.json
32K	./ocrd-calamari-recognize/zpd-fraktur19/0.ckpt.json
18M	./ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.h5
32K	./ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.json
18M	./ocrd-calamari-recognize/zpd-fraktur19/2.ckpt.h5
32K	./ocrd-calamari-recognize/zpd-fraktur19/4.ckpt.json
89M	./ocrd-calamari-recognize/zpd-fraktur19
19M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.h5
24K	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.json
19M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/3.ckpt.h5
19M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.h5
24K	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/1.ckpt.json
24K	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/0.ckpt.json
19M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.h5
24K	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.json
19M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/2.ckpt.h5
24K	./ocrd-calamari-recognize/qurator-gt4histocr-1.0/4.ckpt.json
92M	./ocrd-calamari-recognize/qurator-gt4histocr-1.0
19M	./ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.h5
48K	./ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.json
19M	./ocrd-calamari-recognize/zpd-latin-script-hist-3/3.ckpt.h5
19M	./ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.h5
48K	./ocrd-calamari-recognize/zpd-latin-script-hist-3/1.ckpt.json
48K	./ocrd-calamari-recognize/zpd-latin-script-hist-3/0.ckpt.json
19M	./ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.h5
48K	./ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.json
19M	./ocrd-calamari-recognize/zpd-latin-script-hist-3/2.ckpt.h5
48K	./ocrd-calamari-recognize/zpd-latin-script-hist-3/4.ckpt.json
92M	./ocrd-calamari-recognize/zpd-latin-script-hist-3
272M	./ocrd-calamari-recognize
4,0K	./ocrd-anybaseocr-block-segmentation/block_segmentation_weights.h5
8,0K	./ocrd-anybaseocr-block-segmentation
28M	./ocrd-typegroups-classifier/densenet121.tgc
28M	./ocrd-typegroups-classifier
147M	./ocrd-eynollah-segment/default/model_tables_ens_mixed_new_2.h5
147M	./ocrd-eynollah-segment/default/model_textline_newspapers.h5
147M	./ocrd-eynollah-segment/default/model_main_covid19_lr5-5_scale_1_1_great.h5
147M	./ocrd-eynollah-segment/default/model_page_mixed_best.h5
127M	./ocrd-eynollah-segment/default/model_enhancement.h5
147M	./ocrd-eynollah-segment/default/model_bin_sbb_ens.h5
147M	./ocrd-eynollah-segment/default/model_3up_new_good_no_augmentation.h5
99M	./ocrd-eynollah-segment/default/model_scale_classifier.h5
147M	./ocrd-eynollah-segment/default/model_no_patches_class0_30eopch.h5
147M	./ocrd-eynollah-segment/default/model_main_home_corona3_rot.h5
147M	./ocrd-eynollah-segment/default/model_ensemble_s.h5
1,6G	./ocrd-eynollah-segment/default
1,6G	./ocrd-eynollah-segment
4,0K	./ocrd-anybaseocr-layout-analysis/structure_analysis/assets
14M	./ocrd-anybaseocr-layout-analysis/structure_analysis/saved_model.pb
29M	./ocrd-anybaseocr-layout-analysis/structure_analysis/variables/variables.data-00001-of-00002
248K	./ocrd-anybaseocr-layout-analysis/structure_analysis/variables/variables.data-00000-of-00002
44K	./ocrd-anybaseocr-layout-analysis/structure_analysis/variables/variables.index
30M	./ocrd-anybaseocr-layout-analysis/structure_analysis/variables
43M	./ocrd-anybaseocr-layout-analysis/structure_analysis
4,0K	./ocrd-anybaseocr-layout-analysis/mapping_densenet.pickle
43M	./ocrd-anybaseocr-layout-analysis
5,4G	.

Far fewer models are downloaded than before: the total size is now just 687 MB, where it used to be around 5.4 GB. The models of some processors are now missing entirely.
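To make the difference between the two listings easier to see, here is a minimal sketch that extracts the processor directory names from `du -a` style output and reports which ones only appear in the older listing. The function name `processors_in_du_listing` and the two sample strings are made up for illustration; the sample lines are a small excerpt of the directory totals shown above, not the full output.

```python
def processors_in_du_listing(listing):
    """Return the set of 'ocrd-*' processor directory names found in du output.

    Each line is expected to look like '<size><whitespace><path>'. The first
    path component starting with 'ocrd-' is taken as the processor name; the
    'ocrd-resources' container directory itself is skipped.
    """
    names = set()
    for line in listing.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        for component in parts[1].strip().split("/"):
            if component.startswith("ocrd-") and component != "ocrd-resources":
                names.add(component)
                break
    return names


# Excerpt of the directory totals from the new (HPC) listing.
NEW_LISTING = """\
128M\t./tessdata
141M\t./ocrd-resources/ocrd-cis-ocropy-recognize
272M\t./ocrd-resources/ocrd-calamari-recognize
147M\t./ocrd-resources/ocrd-sbb-binarize
"""

# Excerpt of the directory totals from the older (Operandi VM) listing.
OLD_LISTING = """\
2,9M\t./ocrd-kraken-recognize
1,3G\t./ocrd-sbb-textline-detector
128M\t./ocrd-tesserocr-recognize
4,9M\t./ocrd-kraken-segment
1,9G\t./ocrd-sbb-binarize
141M\t./ocrd-cis-ocropy-recognize
272M\t./ocrd-calamari-recognize
1,6G\t./ocrd-eynollah-segment
"""

missing = processors_in_du_listing(OLD_LISTING) - processors_in_du_listing(NEW_LISTING)
for name in sorted(missing):
    print("only in old listing:", name)
```

Note that ocrd-tesserocr-recognize shows up as missing here only because its models now live under ./tessdata rather than under ocrd-resources; the sketch compares directory names, not model files.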

@bertsky
Collaborator

bertsky commented May 1, 2023

It's clear the reason for this is that ResourceManager.list_available only returns database results – it does not look up all ocrd- executables in PATH. (For comparison, ResourceManager.list_installed returns database results and all resource location paths with ocrd- prefix, which is somewhat better, but still misses out on processors' module locations, as in ocrd_tesserocr.) The database then is simply the distributed resource_list.yml plus any user resources.yml. At no time do we guarantee that the latter gets filled from PATH dynamically!

I cannot find when exactly this broke, but this change looks somewhat fishy.

Since we never know when the user installs (additional) processor modules, and the database files can be out of date (as is currently the case with the distributed resource_list.yml which still contains sbb-textline-detector), IMO the correct behaviour would be:

  • list-available *: unless short-circuited with ocrd-all-tool.json, and unless dynamic=False, look up all ocrd- executables in PATH via --dump-json, add their resource specs to the user database, and then output all known resources
  • list-installed *: unless short-circuited with ocrd-all-tool.json, and unless dynamic=False, look up all ocrd- executables in PATH via --dump-json, add their resource specs to the user database, and then look up all known resource locations
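The dynamic lookup common to both bullets could be sketched roughly as follows. This is not the actual ResourceManager code: the function names are hypothetical, and it assumes that each processor prints its tool description as JSON on `--dump-json` and that this description carries its resource specs under a `resources` key. Merging the result into the user database is left out.

```python
import json
import os
import subprocess


def find_ocrd_executables(path=None):
    """Return {name: full_path} for executables in PATH named 'ocrd-*'.

    Earlier PATH entries win, mirroring normal shell lookup.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    found = {}
    for directory in path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            full = os.path.join(directory, name)
            if (name.startswith("ocrd-") and name not in found
                    and os.path.isfile(full) and os.access(full, os.X_OK)):
                found[name] = full
    return found


def dump_json(executable):
    """Run '<executable> --dump-json' and parse its stdout as JSON."""
    result = subprocess.run([executable, "--dump-json"], capture_output=True,
                            text=True, check=True, timeout=60)
    return json.loads(result.stdout)


def collect_resource_specs(executables, dump=dump_json):
    """Map each executable name to its list of resource specs.

    Executables that fail or print invalid JSON are skipped with a message
    instead of aborting the whole lookup.
    """
    specs = {}
    for name, full in executables.items():
        try:
            tool = dump(full)
        except (OSError, subprocess.SubprocessError, ValueError) as exc:
            print(f"skipping {name}: {exc}")
            continue
        # Assumption: resource specs live under the 'resources' key.
        specs[name] = tool.get("resources", [])
    return specs
```

Calling `collect_resource_specs(find_ocrd_executables())` spawns one process per executable, which is the cost that short-circuiting with ocrd-all-tool.json is meant to avoid.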

@bertsky
Collaborator

bertsky commented Jun 21, 2023

Speaking of short-circuiting with ocrd-all-tool.json: we do not have a dedicated issue for that, but since it's probably tied to the solution here, anyway: The idea would be to have a lookup mechanism like for ocrd_logging.conf (i.e. system location, XDG-based user location, CWD) as an opt-in for ocrd-all-tool.json. If that file can be found, then replace all dynamic lookups with queries into the list of all tools and their resources. (Of course, relying on that file creates new problems like keeping ocrd-all-tool.json up to date if you install more tools, but let's first concentrate on the substantial performance gains that this will yield.)
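As a rough illustration of such an opt-in lookup, here is a sketch that searches a few candidate locations for ocrd-all-tool.json and, if the file is found, answers resource queries from it. The concrete directories (`/etc`, `$XDG_CONFIG_HOME` or `~/.config`, the CWD), their precedence, and the function names are assumptions for illustration, as is the file layout (a JSON object keyed by executable name, each with a `resources` list).

```python
import json
import os


def candidate_paths(filename="ocrd-all-tool.json", cwd=None, environ=None,
                    system_dir="/etc"):
    """Return candidate locations, most specific first (assumed order)."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    home = environ.get("HOME", os.path.expanduser("~"))
    xdg_config = environ.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    return [
        os.path.join(cwd, filename),         # current working directory
        os.path.join(xdg_config, filename),  # XDG-based user location
        os.path.join(system_dir, filename),  # system location
    ]


def load_all_tools(**kwargs):
    """Load the first ocrd-all-tool.json found, or return None (opt-in)."""
    for path in candidate_paths(**kwargs):
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as handle:
                return json.load(handle)
    return None


def resources_for(all_tools, executable):
    """Return the resource specs of one executable from the loaded file."""
    return all_tools.get(executable, {}).get("resources", [])
```

When `load_all_tools()` returns None, the caller would fall back to the dynamic lookup; otherwise every query becomes a dictionary access instead of a subprocess call.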

@kba
Member Author

kba commented Jun 21, 2023

I've opened a separate issue for the ocrd-all-tool.json aspect in #1059

3 participants