New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Include Object Paths in S3 sync summary #17117
base: main
Are you sure you want to change the base?
Conversation
@@ -181,10 +181,21 @@ func (s *Spec) Validate() error { | |||
|
|||
func (s *Spec) ReplacePathVariables(table string, fileIdentifier string, t time.Time) string { | |||
name := strings.ReplaceAll(s.Path, varTable, table) | |||
|
|||
if table == "cloudquery_sync_summary" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be under the option to enable the summary?
objKey := c.spec.ReplacePathVariables(table.Name, uuid.NewString(), time.Now().UTC()) | ||
c.objectKeys[table.Name] = append(c.objectKeys[table.Name], objKey) | ||
|
||
if table.Name == "cloudquery_sync_summary" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems too tightly coupled with the CLI that uses the same name for the table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is designed to be an extension of the CLI summary table, all this functionality is doing is adding a column to that table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few questions:
- do we already have sync summary message? is it new
- will objectkeys will run out of memory and it is not bound by anything?
depends on 1 and 2 we might want to implement sync summary only on the CLI in that case and then only add file that sais this sync finished and have a path of {{sync_id}} in the config instead of sending summary to every destination (unless we already do that?)
It will be added in #17112
I have added bounding that will limit it to only log the first 10,000 keys
Adding {{sync_id}} only partially fixes the problem... S3 paths are used for at least 2 distinct things:
With the above solution assuming less than 10,000 objects in a single sync we get best of both worlds |
This PR adds in a column that will include the full keys of all objects synced in the sync
objects_synced
contains amap[string][]string
where the map key is the table name and the array of strings are the object keysThis is an extension of #17112