Decoupling Metric Store from Worker Coordinator#988
Conversation
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
✨ Finishing touches🧪 Generate unit tests (beta)
Important Action Needed: IP Allowlist UpdateIf your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:
Failure to add the new IP will result in interrupted reviews. Comment |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 401104e.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
PR Reviewer Guide 🔍(Review updated until commit e923af6)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to e923af6 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 3bc634a
Suggestions up to commit 401104e
Suggestions up to commit e65fa2c
Suggestions up to commit 61ef6dc
Suggestions up to commit 2367daa
|
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 92c7eaf.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit 92c7eaf |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit b6de30c.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit b6de30c |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit 88207f8 |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 46613ae.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit 46613ae |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit e255d1a |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 1da9aaf.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
Persistent review updated to latest commit 1da9aaf |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit 440f088 |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit b74b39f |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit 2bdab0e |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit 2199e7a |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit 2367daa |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit 61ef6dc |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit e65fa2c |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit 401104e |
Signed-off-by: Anthony Leong <aj.leong623@gmail.com>
|
Persistent review updated to latest commit 3bc634a |
|
I have finished adding some unit tests. Please let me know if anyone has any questions, comments, or suggestions. |
|
Persistent review updated to latest commit e923af6 |
| self.send(self.master, UpdateSamples(self.worker_id, samples, self.profile_sampler.samples)) | ||
| self.send(self.sample_post_processor_actor, ProcessSamples(samples, self.profile_sampler.samples)) |
There was a problem hiding this comment.
how come we send samples to master and the new actor?
There was a problem hiding this comment.
Although the metrics collector is now in a separate actor, the progress message still needs the samples.
. Should the progress bar also be handled in the new actor?| self.update_progress_message() | ||
|
|
||
| def index_name(self, test_run_timestamp): | ||
| ts = time.from_is8601(test_run_timestamp) |
There was a problem hiding this comment.
LLM caught a typo here :) should be from_iso8601() no?
There was a problem hiding this comment.
It is also like that in the metrics file
opensearch-benchmark/osbenchmark/time.py
Line 47 in 9004daa
| self.profile_metrics_post_processor = ProfileMetricsSamplePostprocessor(self.metrics_store, | ||
| msg.workload.meta_data, | ||
| msg.test_procedure.meta_data) | ||
| os_clients = self.create_os_clients() |
There was a problem hiding this comment.
why do we create more OS clients here?
There was a problem hiding this comment.
For the new actor, telemetry has been moved over due to its dependence on the metric store. OS clients were a requirement for telemetry.
| self.workload.meta_data, | ||
| self.test_procedure.meta_data) | ||
| self.profile_metrics_post_processor(profile_samples) | ||
| class SamplePostProcessorActor(actor.BenchmarkActor): |
There was a problem hiding this comment.
let's add an ActorExitRequest method to this actor for a clean shutdown on benchmark completion
There was a problem hiding this comment.
Yes. Thank you for catching that. An edge case that has been bugging me is what happens when the coordinator shuts down before the SamplePostProcessorActor closes the metric store. I think that the clean up steps should be managed through looking out for that request. Additionally, maybe I should have named the new actor MetricStoreActor as well for readability and to tie it to the scope of its responsibilities.
|
left a few comments/questions but this is shaping up pretty well! nice work |
Description
There is a new actor called
SamplePostProcessorActor. However theSamplePostProcessorActoracts as an actor for processing samples directly from theWorkeractors as well as any task involving the metrics store. 'SamplePostProcessorActorinitialization:The new actor is initialized in the coordinator. This happens in the
prepare_benchmarkmethod. TheSamplePostProcessorActoris initialized after receiving theStartSamplePostProcessorActormessage which has the configurations for creating the metrics store, the metrics sample and profile sample post processor objects, and the telemetry collectors.StartTelemetryandStopTelemetrymessages are now used to start and stop the telemetry collection threads through the coordinatorWhich methods were changed:
send_samples: In the worker class, instead of just sending samples to the coordinator, aProcessSamplesmessage is sent to the newSamplePostProcessorActorwhich is then sent to theSamplePostprocessorinstance inside of theSamplePostProcessorActor.to_externalizable: In the metric store, theto_externalizablemethod is used to send the results of running the workload to the coordinator. We will now need to call the method through theSamplePostProcessorActorand theGetExternalizableMetricsStoremessage.close: When the coordinator is closed, the messageCloseMetricsStorewill now be used to signal to the new actor to close the metrics store.reset_relative_time: TheResetRelativeTimeRequestmessage will now be used instead to reset the relative time of the metric store in the new actor.One of the concerns if with synchronization. A lot of methods involving the metrics store that were synchronous are now handled through an asynchronous message to the
SamplePostProcessorActorwhich holds the metric store.Issues Resolved
[List any issues this PR will resolve]
Testing
[Describe how this change was tested]
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.