Background
#399 tracked fixes to the FERC XBRL archivers, which have been causing automated archive runs to flag XBRL resources as changed even when nothing has changed. Some genuine fixes were made there that resolved issues with changing partitions/taxonomies, but every new XBRL archive is still being marked as changed regardless. When nothing has substantively changed, the new files are exactly the same size as the previous version, so it is still fairly easy to see that there are no real changes and discard the new drafts.
Current state
To identify the source of the issue, I compared archives with zipcmp and zipinfo, neither of which detected any changes to the underlying files or metadata. I also tried sorting files by filename before adding them to the zipfile, which made no difference. There is likely something I'm missing that would make the zipfiles byte-identical, but this issue also highlights the fragility of directly comparing zipfile hashes: that approach will always flag changes we don't necessarily care about, such as headers, compression level, and file order.
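As a general illustration of that fragility (not a diagnosis of what is happening in the archiver), here is a minimal sketch that builds two in-memory zips with identical members but a different member order and timestamp. The helper names `build_zip` and `content_hashes` and the sample file names are made up for this example.

```python
import hashlib
import io
import zipfile


def build_zip(members: dict[str, bytes], order: list[str], date_time: tuple) -> bytes:
    """Build an in-memory zip, writing members in the given order and timestamp."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in order:
            info = zipfile.ZipInfo(name, date_time=date_time)
            zf.writestr(info, members[name], compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def content_hashes(zip_bytes: bytes) -> dict[str, str]:
    """Hash the decompressed bytes of every member, keyed by member name."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            name: hashlib.sha256(zf.read(name)).hexdigest()
            for name in sorted(zf.namelist())
        }


if __name__ == "__main__":
    # Hypothetical stand-ins for XBRL filings.
    members = {"a.xbrl": b"<xbrl>a</xbrl>", "b.xbrl": b"<xbrl>b</xbrl>"}
    first = build_zip(members, ["a.xbrl", "b.xbrl"], (2024, 1, 1, 0, 0, 0))
    second = build_zip(members, ["b.xbrl", "a.xbrl"], (2024, 1, 2, 0, 0, 0))

    same_archive_hash = hashlib.sha256(first).digest() == hashlib.sha256(second).digest()
    same_contents = content_hashes(first) == content_hashes(second)
    print("archive hashes match:", same_archive_hash)
    print("archive sizes match:", len(first) == len(second))
    print("member contents match:", same_contents)
```

Comparing the per-member hashes ignores container details entirely, which is the property the archive-level hash lacks.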
Next steps
A better comparison would look inside the zipfiles and compare their contents, which would also give us more insight into what specifically has changed between versions. This would not be too difficult to implement, since we could reuse the existing file comparison tooling in the validate.py module. The catch is that we don't have the previous version of the zips available during comparison.
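A content-level comparison could look roughly like the sketch below. It does not use the validate.py tooling mentioned above (whose interface is not shown here); `ZipDiff` and `diff_zip_contents` are hypothetical names, and it assumes both the old and new archives are available locally as paths or file-like objects.

```python
import hashlib
import zipfile
from dataclasses import dataclass, field


@dataclass
class ZipDiff:
    """Member-level differences between two zip archives."""

    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)

    @property
    def has_changes(self) -> bool:
        return bool(self.added or self.removed or self.changed)


def _member_hashes(zf: zipfile.ZipFile) -> dict[str, str]:
    """Map each file member to the SHA-256 of its decompressed bytes."""
    return {
        info.filename: hashlib.sha256(zf.read(info)).hexdigest()
        for info in zf.infolist()
        if not info.is_dir()
    }


def diff_zip_contents(old, new) -> ZipDiff:
    """Compare two zips by member name and content, ignoring container details."""
    with zipfile.ZipFile(old) as old_zf, zipfile.ZipFile(new) as new_zf:
        old_hashes = _member_hashes(old_zf)
        new_hashes = _member_hashes(new_zf)
    shared = old_hashes.keys() & new_hashes.keys()
    return ZipDiff(
        added=sorted(new_hashes.keys() - old_hashes.keys()),
        removed=sorted(old_hashes.keys() - new_hashes.keys()),
        changed=sorted(n for n in shared if old_hashes[n] != new_hashes[n]),
    )
```

Because the result lists which members were added, removed, or changed, it also covers the "what specifically has changed" part, not just a yes/no answer.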
There are two possible implementations that come to mind to work around this problem:
1. Download the previous zipfiles during comparison. We would probably want to move the comparison to right after the new files are downloaded; otherwise we would have to download both the old and the new versions.
2. Attach metadata containing the hashes of all files within each zipfile, so that the comparison can use the metadata instead of the archives themselves.
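The metadata option could be sketched as follows, assuming the manifest is stored as a JSON-serializable mapping alongside each archived zip. Where that metadata would actually live is not decided here, and `build_manifest`, `manifest_digest`, and `contents_unchanged` are hypothetical names.

```python
import hashlib
import json
import zipfile


def build_manifest(zip_source) -> dict[str, str]:
    """Map each file member of a zip to the SHA-256 of its decompressed bytes."""
    with zipfile.ZipFile(zip_source) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def manifest_digest(manifest: dict[str, str]) -> str:
    """Collapse a manifest into one hash that does not depend on member order."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def contents_unchanged(stored_manifest: dict[str, str], new_zip_source) -> bool:
    """Compare a previously stored manifest against a freshly built archive."""
    return stored_manifest == build_manifest(new_zip_source)
```

With this approach only the new zip needs to be on disk: the previous version is represented by its stored manifest, and comparing the two manifests key by key still shows which members differ.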