-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bare bones add #23
base: master
Are you sure you want to change the base?
Bare bones add #23
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am willing to help make ocrd-import more versatile, but don't fully understand your use-case (esp. the P_
change).
function debug { ocrd log -n ocrd-import debug "$1"; } | ||
function critical { echo critical "$1"; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the OCR-D logging facility is too inefficient, I suggest to use a proper mechanism to override it with an additional switch. For example, by aliasing ocrd log -n ocrd-import
or echo
to a common log
backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was just a quick hack to reduce the overhead of calling ocrd log
.
For example, by aliasing
ocrd log -n ocrd-import
orecho
to a commonlog
backend.
How do you mean, aliasing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you mean, aliasing?
I meant
alias log="ocrd log -n ocrd-import"
...
((fastlog)) && alias log=echo
...
function debug { log debug "$1"; }
...
function critical { log critical "$1"; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 that is a nice solution and configurable too.
@@ -144,6 +144,7 @@ for file in $(find -L . -type f -not -name mets.xml -not -name "*.log" | sort); | |||
set -e | |||
trap rollback ERR | |||
page=p${zeros:0:$((4-${#num}))}$num | |||
echo "PAGE=$page" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better use the log fn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was just debugging by @stefanCCS, can go without replacement.
@@ -165,104 +166,107 @@ for file in $(find -L . -type f -not -name mets.xml -not -name "*.log" | sort); | |||
# also, avoid . in IDs, because downstream it will confuse filename suffix detection | |||
base="${base//[ :.]/_}" | |||
if ! [[ ${base:0:1} =~ [a-zA-Z] ]]; then | |||
base=f${base} | |||
#base=P_${base} | |||
a=0 # just do something to have a correct syntax for this 'if' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-XS-compliant page names are a problem regardless of numpageid
or not. I don't see why this branch should be abandoned. (But if you do, :
is the sh word for no-op.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the intention was here. @stefanCCS ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done | ||
#case "$mimetype" in | ||
# ${MIMETYPE_PAGE}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to have a version with next to no checks, the right approach would be to just add a switch to circumvent the no-clash check for resulting pages/files. Everything else is already fast (like the PAGE vs ALTO check for .xml) or can be deactivated (like the --no-convert
switch).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was very broad commenting out. The "offending" call that slows down import is
clashes=($(ocrd workspace find -i "$base" -k local_filename -k mimetype -k pageId))
These changes are not meant to be taken over as-is, they are just adaptions to make
ocrd-import
make_file_id
-comptaible IDsJust wanted to make a draft PR as a discussion basis.