-
-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect assignment of deeply-nested tags to top-level tags #277
base: master
Are you sure you want to change the base?
Conversation
This test exercises the corner case where a deeply-nested SeriesInstanceUID attribute can be incorrectly saved back to the top-level SeriesInstanceUID attribute, overwriting its value.
In cases where tags are deeply nested within sequences, the output of deid functions were incorrectly being assigned to the top-level tags with matching names. For example, a nested SeriesInstanceUID within a ReferencedSeriesSequence was overwriting the top-level SeriesInstanceUID value. This was resolved by traversing the tree of tags and setting the nested element, rather than the top-level element.
@ReeceStevens for black we have it pinned to |
Thanks @vsoch! Updated and pushed. |
else: | ||
return int(tag_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simplify this a bit - if the first statement returns, the else isn't needed.
else: | |
return int(tag_string) | |
return int(tag_string) |
if tag_string.startswith("("): | ||
result = parentheses_hex_tag_format.match(tag_string) | ||
return int(f"0x{result.group(1)}{result.group(2)}", base=16) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here (per my comment below). If we return in the first statement, then the else here isn't needed, and you can unindent all of the following logic and make it easier for the developer (us!) to read.
# A string representation of the full tag path, including parent sequences | ||
full_tag_path = field | ||
if hasattr(field, "uid"): | ||
full_tag_path = field.uid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this little section doing?
if is_filemeta: | ||
self.dicom.file_meta.add(element) | ||
else: | ||
self.dicom.add(element) | ||
dataset_cursor = self.dicom | ||
*parent_tags, last_tag = full_tag_path.split("__") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here about how this data is unwrapped, and why the last_tag is special/different?
@@ -0,0 +1,104 @@ | |||
import contextlib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job to add a full test! 👏
Description
Related issues: I haven't opened an issue for this yet, nor did I see a related issue already open.
Hello pydicom team! Thanks for your excellent work on this library. I ran across an issue in production for which I wanted to push an upstream fix.
During header deidentification, deeply nested tag values (i.e.
ReferencedSeriesSequence > 0 > SeriesInstanceUID
) were being processed and saved back to the top-level tag (i.e, just the top-levelSeriesInstanceUID
). This was causing some very confusing and surprising behavior with UIDs not being converted to the expected values.I added a minimal reproducible test case (1f9cb7e) which fails pre-patch. Once you apply commit 8b646e5, the test passes.
Please let me know if you have any questions or concerns about the patch! Thanks!
Checklist
Open questions
None