why winnows for DMS are different ... v2 vs. sr3 #1017
Comments
If I go into nodupe, and explicitly check for the rename field and use it if present... |
another option is to update things in flow/__init__.py updateFieldsAccepted()
Is that good? It should download to the same place, but if someone is using accept clauses, we could keep the rest of relPath and replace just the name. With mirror off, it has the same effect as today. This would allow downstream consumers to use the same accept/reject clauses as they did in v2. Doing either of those things fixes the nodupe check, but the publish is oddly unaffected. Strange... thinking... |
Another option: looking at the value of the rename field in DMS messages, it is the directories in the relPath with / replaced by . so we could just set nodupe_basis to "path" and it should work fine... it does. Hmm... Are all DMS products built this way? |
got confirmation that all DMS messages are built this way, so using path should be fine. |
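A minimal sketch of why keying on the path works where keying on the name does not. The relPath values below are hypothetical (the real DMS message is not reproduced here), and `winnow()` is a toy stand-in, not Sarracenia's nodupe code:

```python
def key_by_name(rel_path: str) -> str:
    """Key on the last path element only."""
    return rel_path.split("/")[-1]


def key_by_path(rel_path: str) -> str:
    """Key on the whole relative path."""
    return rel_path


def key_by_rename(rel_path: str) -> str:
    """Rename-style key: the path elements with '/' replaced by '.'."""
    return rel_path.replace("/", ".")


def winnow(rel_paths, key_func):
    """Toy duplicate suppression: pass a message only if its key is new."""
    seen = set()
    passed = []
    for rel_path in rel_paths:
        key = key_func(rel_path)
        if key not in seen:
            seen.add(key)
            passed.append(rel_path)
    return passed


# Hypothetical relPaths: two distinct products that share the same
# final element, followed by a true duplicate of the first one.
messages = [
    "msc/observation/station_a/202404161800/data_60",
    "msc/observation/station_b/202404161800/data_60",
    "msc/observation/station_a/202404161800/data_60",
]

by_name = winnow(messages, key_by_name)      # only the first message passes
by_path = winnow(messages, key_by_path)      # both distinct products pass
by_rename = winnow(messages, key_by_rename)  # same result as by_path
```

With the name as the key, every message after the first looks like a duplicate, which matches the "publishes close to nil" symptom. The path and the dotted rename value select the same messages here, so the path can stand in for the rename field.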
another effect, discovered by @andreleblanc11 is that the topic from DMS is not what Sarracenia expects. example inbound message from DMS:
The topic is supposed to be a combination of "v02.post.<the directory tree>". The v2 outbound topic is v02.post.msc-dms-dev..., unchanged from what is received from DMS. The sr3 one is rebuilt to conform to what Sarracenia expects, matching the path, so msc-dms-dev is removed.
To get the same as v2 we could add post_topicPrefix v02.post.msc-dms-dev to the winnow config. |
Just changed the summary at the top:
|
Summarizing discussion:
There are also settings like post_baseURL that need to be respected. Those involve modifying the message in flight, and would again require two different cases, changing different fields depending on self.o.download. So the real ask is to pass the message through and re-build it just as it was; it isn't really to pass it through unmodified. We talked about using a more elaborate topicPrefix, but the objection was that the topic isn't fixed (v02.post.msc-dms-dev, v02.post.msc-dms-stage), so how would we allow for that? Another option would be to validate the topic on receipt. If it matches the Sarracenia norms, then do nothing; normal processing should be correct. If it does not, then build a topic header as an override... yeah, that might work... thinking... |
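The validate-on-receipt idea could look roughly like this. It is a sketch under assumptions: `expected_topic()` and `topic_override()` are hypothetical helper names, not existing Sarracenia functions, and the sample topic and relPath are made up for illustration:

```python
def expected_topic(topic_prefix, rel_path, sep="."):
    """Build the conventional topic: prefix plus the directory tree of relPath."""
    directories = [d for d in rel_path.split("/")[:-1] if d]
    return sep.join(list(topic_prefix) + directories)


def topic_override(received_topic, topic_prefix, rel_path, sep="."):
    """Return None when the received topic follows the convention,
    otherwise return the received topic so it can be kept as an override."""
    if received_topic == expected_topic(topic_prefix, rel_path, sep):
        return None
    return received_topic


prefix = ["v02", "post"]
rel_path = "observation/atmospheric/data_60"  # hypothetical

# Conventional topic: nothing to override, normal processing applies.
normal = topic_override("v02.post.observation.atmospheric", prefix, rel_path)

# DMS-style topic with an extra element: keep it as the override.
dms = topic_override(
    "v02.post.msc-dms-dev.observation.atmospheric", prefix, rel_path
)
```

Here `normal` is None and `dms` holds the received topic unchanged, so the extra element (msc-dms-dev, msc-dms-stage, ...) does not have to be configured in advance.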
OK, I coded it up so:
The problem with this is that when download=True, the topic still won't be overwritten. So I guess I need to delete the topic override when download=True to allow construction of correct topics for publish? One can still override the topic with after_work() or post()... hmm... does that make sense? Maybe post_topic construction is more complicated? |
So... now I'm thinking topicPreserve as an option, set to True by default for shovels and winnows, and False for things that download.... |
Now I'm having philosophical second thoughts: maybe the whole way "topic" is handled is just weird. We currently parse the inbound topic into a "subtopic" and keep the "topicPrefix" separately. If the topic provided doesn't match the topicPrefix (not sure how that would happen), then the chop of subtopic is wrong. The subtopic is a list because the topic separator varies by protocol, e.g. if reading from amqp and publishing to mqtt the separator is . on input and / on output, so turning it into a list makes some sense... This is turning into its own issue. |
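To illustrate the subtopic-as-list point, a small sketch. `split_topic()` and `join_topic()` are hypothetical names, not the actual Sarracenia implementation; the prefix-mismatch case mentioned above is made explicit as an error instead of a silent wrong chop:

```python
def split_topic(topic, topic_prefix, sep):
    """Split a topic into elements and chop off the prefix.

    Raises ValueError if the topic does not start with the prefix,
    since the resulting subtopic would otherwise be wrong.
    """
    parts = topic.split(sep)
    n = len(topic_prefix)
    if parts[:n] != list(topic_prefix):
        raise ValueError(f"topic {topic!r} does not start with {topic_prefix!r}")
    return parts[n:]


def join_topic(topic_prefix, subtopic, sep):
    """Rebuild a topic string using the separator of the outbound protocol."""
    return sep.join(list(topic_prefix) + list(subtopic))


prefix = ["v02", "post"]

# Inbound over amqp: '.' is the separator.
subtopic = split_topic("v02.post.observation.atmospheric", prefix, ".")

# Outbound over mqtt: '/' is the separator.
mqtt_topic = join_topic(prefix, subtopic, "/")
```

Keeping the subtopic as a list means the separator is only chosen at the edges, when parsing or publishing.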
Summary
If you run sr3 convert winnow/xx on a v2 winnow (or shovel) and then try to run it, it will publish very few, close to nil, messages. Some additional adjustment is needed to get things working properly. Messages from DMS are quite application-specific, and v2 has some quirks that make it "interesting".
Symptoms and Troubleshooting
winnow message inbound from DMS looks like this:
So the relPath is on the first line and it ends with the "filename" == "data_60". We are using the filename as the key in the duplicate suppression, because... I don't actually know why. I would think we could use the checksums provided by DMS, but that's a question. Anyway, the winnows are currently configured to use the filename.
In v2, the filename value is derived from the value of the rename field: 'msc.observation.atmospheric.surface_weather.ca-1.1-ascii.product_generic_swob-xml-2.0.202404161800.1032731.web.orig.data_60', so THAT does provide uniqueishness. The message is passed without transformation (so that it still works when handed downstream).
In sr3, this substitution doesn't happen because we are deriving the key from the functional relPath, rather than a filename extracted from it. The relPath can't be modified by a shovel or winnow because the downstream consumer will get the wrong value.
so... what to do...