Note: In METS, the labels are a flat sequence of gt:state elements, each with a @prop value from the above-mentioned schema file, with one such sequence per page.
<mets:dmdSec ID="DMDGT_0001">
  <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="GT">
    <mets:xmlData>
      <gt:gt>
        <gt:state prop="granularity/physical/document-related/word"/>
        <gt:state prop="granularity/physical/document-related/text-line"/>
        <gt:state prop="granularity/physical/document-related/region"/>
        <gt:state prop="data-attributes/document-related/visual/text/font/multi-font/typefaces"/>
        <gt:state prop="data-attributes/document-related/visual/text/font/multi-font/font-sizes"/>
        <gt:state prop="data-attributes/language/mixed"/>
        <gt:state prop="condition/production-related/document-faults/ink-from-facing"/>
        <gt:state prop="condition/wear/additions/informative/annotations"/>
        <gt:state prop="condition/production-related/document-characteristics/low-contrast"/>
        <gt:state prop="condition/acquisition/method-flaws/imaging/uneven-illumination"/>
      </gt:gt>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>
These are then referenced under each physical structMap's page via @DMDID.
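To make the METS-side lookup concrete, here is a minimal sketch that resolves each physical page's @DMDID to its gt:state labels using only the standard library. The function name `gt_labels_by_page`, the page ID `PHYS_0001`, and the binding of the `gt` prefix to `http://www.ocr-d.de/GT/` are assumptions for illustration; none of this is existing core API.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    # Assumption: the gt prefix is bound to this URI in the METS file.
    "gt": "http://www.ocr-d.de/GT/",
}

# Hypothetical sample: one labelled dmdSec referenced by one physical page.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:gt="http://www.ocr-d.de/GT/">
  <mets:dmdSec ID="DMDGT_0001">
    <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="GT">
      <mets:xmlData>
        <gt:gt>
          <gt:state prop="granularity/physical/document-related/word"/>
          <gt:state prop="data-attributes/language/mixed"/>
        </gt:gt>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="PHYS_0001" DMDID="DMDGT_0001"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def gt_labels_by_page(mets_root):
    """Map each physical page ID to the gt:state/@prop values it references."""
    # Index the labels of every dmdSec by its ID.
    labels_by_dmdid = {}
    for dmdsec in mets_root.iterfind("mets:dmdSec", NS):
        props = [state.get("prop") for state in dmdsec.iterfind(".//gt:state", NS)]
        labels_by_dmdid[dmdsec.get("ID")] = props

    # Resolve each page's DMDID (a space-separated list of IDs) to labels.
    result = {}
    pages = mets_root.iterfind(
        "mets:structMap[@TYPE='PHYSICAL']//mets:div[@TYPE='page']", NS
    )
    for page in pages:
        labels = []
        for dmdid in page.get("DMDID", "").split():
            labels.extend(labels_by_dmdid.get(dmdid, []))
        result[page.get("ID")] = labels
    return result


if __name__ == "__main__":
    print(gt_labels_by_page(ET.fromstring(SAMPLE_METS)))
```

Since @DMDID may carry several IDs, the sketch concatenates the labels of all referenced sections rather than assuming exactly one.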
IMO in core we first need some additional API to support that. Like (in analogy to pageId):
OcrdMets.get_gt_labelling(self, for_fileIds=None)  # returns dict of file ID to label list
OcrdMets.get_gt_labelling_for_file(self, ocrd_file)  # returns label list
OcrdMets.set_gt_labelling_for_file(self, labels, ocrd_file)  # takes label list
# but also:
OcrdMets.add_file(self, ... labels=None, ...)  # add full label list
OcrdMets.find_files(self, ... labels=None, ...)  # filter by label list (match any)
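The intended semantics of these signatures could be sketched with a small in-memory stand-in. The class `GtLabelStore` is hypothetical and is not part of OcrdMets; it keys labels by plain file ID strings instead of ocrd_file objects, and only illustrates the get/set behaviour and the "match any" filter.

```python
from typing import Dict, Iterable, List, Optional


class GtLabelStore:
    """Hypothetical stand-in for the label bookkeeping proposed for OcrdMets."""

    def __init__(self) -> None:
        self._labels: Dict[str, List[str]] = {}

    def set_gt_labelling_for_file(self, labels: Iterable[str], file_id: str) -> None:
        # Replace the full label list of one file.
        self._labels[file_id] = list(labels)

    def get_gt_labelling_for_file(self, file_id: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._labels.get(file_id, []))

    def get_gt_labelling(
        self, for_fileIds: Optional[Iterable[str]] = None
    ) -> Dict[str, List[str]]:
        # Without a filter, report every known file.
        ids = list(self._labels) if for_fileIds is None else list(for_fileIds)
        return {file_id: self.get_gt_labelling_for_file(file_id) for file_id in ids}

    def find_files(self, labels: Optional[Iterable[str]] = None) -> List[str]:
        # "Match any": a file qualifies if it carries at least one wanted label.
        if labels is None:
            return list(self._labels)
        wanted = set(labels)
        return [
            file_id
            for file_id, file_labels in self._labels.items()
            if wanted.intersection(file_labels)
        ]


if __name__ == "__main__":
    store = GtLabelStore()
    store.set_gt_labelling_for_file(["data-attributes/language/mixed"], "FILE_0001")
    store.set_gt_labelling_for_file(
        ["granularity/physical/document-related/word"], "FILE_0002"
    )
    print(store.find_files(labels=["data-attributes/language/mixed"]))
```

In a real implementation the store would be backed by the dmdSec elements in the METS rather than a dict; the file IDs above are made up.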
What's your opinion, @kba?
Perhaps – instead of parsing this from the METS, we could also see to it that OCR-D mirrors them in the parsed PAGE-XML, i.e. OcrdPage.
For example as:
<MetadataItem type="imageProperties" name="gt-labelling">
  <Labels externalModel="https://github.com/OCR-D/gt-labelling/blob/master/xsd_schema/OCR-D_GT_schema.xsd" externalId="http://www.ocr-d.de/GT/">
    <Label value="granularity/physical/document-related/word"/>
    <Label value="granularity/physical/document-related/text-line"/>
    <Label value="granularity/physical/document-related/region"/>
    <Label value="data-attributes/document-related/visual/text/font/multi-font/typefaces"/>
    <Label value="data-attributes/document-related/visual/text/font/multi-font/font-sizes"/>
    <Label value="data-attributes/language/mixed"/>
    <Label value="condition/production-related/document-faults/ink-from-facing"/>
    <Label value="condition/wear/additions/informative/annotations"/>
    <Label value="condition/production-related/document-characteristics/low-contrast"/>
    <Label value="condition/acquisition/method-flaws/imaging/uneven-illumination"/>
  </Labels>
</MetadataItem>
This would make it easier to access the labels from a processor or PAGE viewer.
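As a rough illustration of that access path, the following sketch reads such labels back from a PAGE-XML tree with the standard library alone. The function name `gt_labels_from_page` and the trimmed sample document are assumptions; the `{*}` wildcard (Python 3.8+) is used so the code does not depend on a particular PAGE namespace version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed PAGE-XML sample carrying the proposed MetadataItem.
SAMPLE_PAGE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Metadata>
    <MetadataItem type="imageProperties" name="gt-labelling">
      <Labels externalId="http://www.ocr-d.de/GT/">
        <Label value="granularity/physical/document-related/word"/>
        <Label value="data-attributes/language/mixed"/>
      </Labels>
    </MetadataItem>
  </Metadata>
</PcGts>"""


def gt_labels_from_page(page_root, name="gt-labelling"):
    """Collect Label/@value from every MetadataItem with the given @name."""
    labels = []
    # {*} matches the element in any (or no) namespace.
    for item in page_root.iterfind(".//{*}MetadataItem"):
        if item.get("name") != name:
            continue
        for label in item.iterfind("{*}Labels/{*}Label"):
            labels.append(label.get("value"))
    return labels


if __name__ == "__main__":
    print(gt_labels_from_page(ET.fromstring(SAMPLE_PAGE)))
```

A processor or viewer would then only need the PAGE file it already parses, without going back to the METS.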
Originally posted by @bertsky in hnesk/browse-ocrd#36 (comment)
@tboenig perhaps relevant for gt-guideline-examples etc.