-
Notifications
You must be signed in to change notification settings - Fork 960
Remove IDs from HTML code #4012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -134,9 +136,7 @@ def get_ontology_to_unstructured_type_mapping() -> dict[str, Element]: | |||
for tag in element_type().allowed_tags | |||
} | |||
CSS_CLASS_TO_ELEMENT_TYPE_MAP: Dict[str, Type[ontology.OntologyElement]] = { | |||
element_type().css_class_name: element_type | |||
for element_type in ALL_ONTOLOGY_ELEMENT_TYPES | |||
for tag in element_type().allowed_tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how was this line of code working before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was copy-paste form the mapping above (which is more specific), here we just did more loops, but the 'tag' is not used anywhere. After the change, the mapping is the same, just unnecessary loops are removed:
CSS_CLASS_TO_ELEMENT_TYPE_MAP_BEFORE: Dict[str, Type[ontology.OntologyElement]] = {
element_type().css_class_name: element_type
for element_type in ALL_ONTOLOGY_ELEMENT_TYPES
for tag in element_type().allowed_tags
}
CSS_CLASS_TO_ELEMENT_TYPE_MAP_AFTER: Dict[str, Type[ontology.OntologyElement]] = {
element_type().css_class_name: element_type for element_type in ALL_ONTOLOGY_ELEMENT_TYPES
}
print(CSS_CLASS_TO_ELEMENT_TYPE_MAP_BEFORE == CSS_CLASS_TO_ELEMENT_TYPE_MAP_AFTER)
# Output: True
This is because some CSS classes can have multiple HTML tags assigned eg.
class Heading(OntologyElement):
description: str = Field("Section headings (levels 1-6)", frozen=True)
elementType: ElementTypeEnum = Field(ElementTypeEnum.text, frozen=True)
allowed_tags: List[str] = Field(["h1", "h2", "h3", "h4", "h5", "h6"], frozen=True)
In this case, we were creating a 6 times entry: "Heading": OntologyElement.Heading
Removed
Print For some reason, it was assigned in the unit test (even though I got None = None locally), possibly due to some version mismatch. I think it needs another debugging session and is not a blocker for this ticket (this change didn't touch detection origin) |
if parent_id not in id_to_element_mapping: | ||
raise ValueError(f"Parent element {parent_id} not found in id_to_element_mapping") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a unit test for the new exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but I noticed that they make no sense. parent_id is always set before (if isn't it is generated) and ontology always returns some element so I just removed raising exception part
hold on while I bump our base image to fix the dockerfile cve |
In this pull request parent-child relationship for elements generated with v2 parser is based on actual element IDs instead of IDs baked somewhere in the HTML script. With some extra bug fixing it allowed for significantly simplifying json -> HTML script
In this pull request parent-child relationship for elements generated with v2 parser is based on actual element IDs instead of IDs baked somewhere in the HTML script.
With some extra bug fixing it allowed for significantly simplifying json -> HTML script