-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Deny extra tags on write APIs #25596
Conversation
This commit does three important major changes: 1. We will deny writes to the v1, v2, and v3 write apis that add new tags in subsequent writes after the first write 2. We make every table have a series key by default now 3. We enfore sorting order by the series key which is the order the keys came in With these changes we have consistentcy across the various write apis and can make optimizations and future features with the assumption we have a series key. Closes #25585
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice refactor. I have a few (minor) comments that need to be addressed re: the dbg!
macros and a code comment before I ✅, then another about the TableDefinition::series_key
method that might deserve a follow-on PR.
Otherwise (CC @pauldix), after these changes, do we need the v3
write API at this point? It provides no additional functionality over /api/v3/write_lp
, other than having a different tag syntax in the line protocol.
We might be better served by removing it and holding it out so that we don't lock ourselves into anything by including it in the initial release. Thereby freeing ourselves to implement it as we see fit later down the road.
influxdb3_write/src/persister.rs
Outdated
dbg!( | ||
self.object_store | ||
.put(snapshot_file_path.as_ref(), json.into()) | ||
.await | ||
)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this dbg!
left on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this was a mistake
// TODO: can we have the primary key stored on the table definition (we already have | ||
// the series key, so that doesn't seem like too much of a stretch). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this comment can go now 🧹
&line.series.series_key, | ||
) { | ||
(Some(s), Some(l)) => { | ||
match dbg!((table_def.series_key(), &line.series.series_key)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another dbg!
here, not sure if that was left on purpose...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope that was also a mistake 😭
pub fn series_key(&self) -> Vec<String> { | ||
self.series_key | ||
.iter() | ||
.map(|k| self.column_id_to_name_unchecked(k).to_string()) | ||
.collect() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this gets called in the write path for each line, might be worth returning a Vec<&str>
or Vec<Arc<str>>
to avoid the string copies. Or even a slice, if possible.
Furthermore, having to do the lookup by ID for the name every time could also be avoided by holding the Arc<str>
names around in the TableDefinition
then just iterating over those directly.
Depending on how far you take it, this could lead to a substantial change if you had to change the TableDefinition
struct, so might be better for a follow-on if that's the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just from a cursory try I do think it'll be a bit more substantial, especially if we cache those keys in the TableDefinition
. I've opened up an issue for future work #25606
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, returning the series key should be as cheap as possible. Given that every table now has one, it should just be a reference that gets returned (or an Arc'd thing?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking Good!
Looks good, I agree with @hiltontj that we don't need the different v3 write API at this point. We'll definitely have a new v3 model, but we will likely start with something completely fresh so not starting down that path right now is better. |
I opened up the issue here |
This commit does three important major changes:
With these changes we have consistentcy across the various write apis and can make optimizations and future features with the assumption we have a series key.
Closes #25585