Benchmarking and Profiling
This wiki page describes the benchmarking and profiling experiments that we plan to perform to show the tight coupling between WP4 (benchmarking and profiling) and WP7 (pilot).
Main people involved:
- Manuel
- Stefan
- Tasos
- Eleni
Goal: publish to this call.
- MQTT Load Testing with mqtt-malaria
The idea is to focus on the CNFs that are relevant for performance and skip CNFs that are not interesting for benchmarking, e.g., the EAE is more or less a user interface and not relevant for benchmarking experiments. Also, the DT is considered to live outside the service in a real setup. Still, the DT might serve as a stimuli probe.
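As a starting point for the MQTT load tests, a stimuli probe only needs to publish messages as fast as possible and report the achieved rate. A minimal sketch, assuming the paho-mqtt 1.x client API and a broker reachable at localhost:1883 (the topic name and payload size are made up for illustration); in the actual experiments mqtt-malaria can take over this role:

```python
# Stimuli probe sketch: publish a fixed number of messages and report the rate.
# Assumptions: paho-mqtt 1.x API, broker at localhost:1883, hypothetical topic.
import time
import paho.mqtt.client as mqtt

BROKER_HOST = "localhost"   # address of the broker under test (assumption)
BROKER_PORT = 1883
TOPIC = "pilot/cc/test"     # hypothetical topic
N_MESSAGES = 10000          # experiment parameter
PAYLOAD = b"x" * 64         # experiment parameter: payload size

client = mqtt.Client()
client.connect(BROKER_HOST, BROKER_PORT)
client.loop_start()

start = time.time()
for _ in range(N_MESSAGES):
    client.publish(TOPIC, PAYLOAD, qos=0)
elapsed = time.time() - start

client.loop_stop()
client.disconnect()
print(f"published {N_MESSAGES} messages in {elapsed:.2f}s "
      f"({N_MESSAGES / elapsed:.0f} msg/s)")
```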
Name | Involved CDUs | Effort | Stimuli Probe | Measurem. Probe |
---|---|---|---|---|
cnf_ids_01 | suricata_ids | low | Traffic Traces | - |
cnf_cc_01 | broker | low | MQTT | MQTT |
cnf_cc_02 | broker, processor | medium | MQTT | MQTT |
cnf_cc_03 | broker, prom. exporter, prometheus | medium | MQTT | (?) |
cnf_cc_04 | broker, processor, prom. exporter, prometheus | medium | MQTT | MQTT / (?) |
cnf_mdc_01 | mdc | high | SMB | MQTT |
e2e_pilot_01 | mdc, rtr, broker, processor, prom. exporter, prometheus | high | SMB | MQTT / (?) |
cnf_cc_01: Test the MQTT broker as the central component of the pilot in isolation. Measure how many messages per second we can pump through it.
cnf_cc_02: Test the CC processor (together with the broker from which the processor gets its data). Measure how many messages per second it can translate and forward.
cnf_cc_03: Test the CC's local storage backend (implemented through Prometheus) that uses the broker as data source. It is not fully clear what to measure here, since Prometheus fetches the data at fixed intervals. Maybe we can play with those intervals as one of the configuration parameters?
cnf_cc_04: Test the full CC. Combines cnf_cc_02 and cnf_cc_03. Multiple measurements needed.
cnf_mdc_01: Test how fast the MDC can collect Euromap63 data. Building the right traffic generator for this is a bit challenging (the DT might be reused).
e2e_pilot_01: End-to-end pilot with all performance-relevant VNFs.
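On the measurement side, a simple probe can subscribe behind the broker (or the processor) and count how many messages per second actually arrive. A minimal sketch, again assuming the paho-mqtt 1.x API and a hypothetical topic filter:

```python
# Measurement probe sketch: count messages per second arriving behind the broker.
# Assumptions: paho-mqtt 1.x API, broker at localhost:1883, hypothetical topic filter.
import time
import paho.mqtt.client as mqtt

counter = {"n": 0}

def on_message(client, userdata, msg):
    counter["n"] += 1  # count every delivered message

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("pilot/cc/#")
client.loop_start()

while True:  # report the measured throughput once per second
    time.sleep(1)
    print(f"measured_throughput: {counter['n']} msg/s")
    counter["n"] = 0
```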
- Identify VNF resource consumption trends based on workload characteristics
- Identify VNF horizontal and vertical scalability needs
- Identify high correlations between metrics in the VNFs
- Forecast the activity of some of the VNFs (e.g., the digital twin?) (see the decomposition sketch below)
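For the forecasting goal, a time-series decomposition of the collected metrics is a natural first step (it also corresponds to the decomposition rows in the table below). A minimal sketch with statsmodels, assuming the cnf_mdc_01 metrics were exported to a hypothetical CSV file with one sample per minute:

```python
# Time-series decomposition sketch for the MDC metrics.
# Assumptions: hypothetical CSV export of cnf_mdc_01 with a 'timestamp' column
# and one sample per minute, so period=60 corresponds to an hourly pattern.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

df = pd.read_csv("cnf_mdc_01.csv", parse_dates=["timestamp"], index_col="timestamp")

decomposition = seasonal_decompose(df["cpu_usage"], model="additive", period=60)
fig = decomposition.plot()  # trend, seasonal and residual views
fig.savefig("mdc_cpu_decomposition.png")
```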
analysis name | experiment name | involved CDUs | metrics | analysis type | output |
---|---|---|---|---|---|
resource efficiency - broker | cnf_cc_01 | broker | memory_usage, cpu_usage, packets_served | linear regression | regression model, scatterplot |
elasticity efficiency - broker | cnf_cc_01 | broker | memory_usage, cpu_usage, packets_served, scaling_request_timestamp, scaling_completion_timestamp | visualisation | graph with scaling requests and actions |
correlation analysis - broker | cnf_cc_02 | broker, processor | set of resource usage and vnf specific metrics | correlation analysis | correlogram, Rsq, statistical significance |
resource efficiency - mdc | cnf_mdc_01 | mdc | memory_usage, cpu_usage, outgoing_traffic | linear regression | regression model, scatterplot |
time series decomposition - mdc | cnf_mdc_01 | mdc | memory_usage, cpu_usage, incoming_packets | time series decomposition | graph with trend, cycle and seasonality views |
forecasting - mdc | cnf_mdc_01 | mdc | memory_usage, cpu_usage, incoming_packets | forecasting | graph with forecasted values |
time series decomposition - IDS | tbd | ids | memory_usage, cpu_usage, packets_dropped, outgoing_packets | time series decomposition | graph with trend, cycle and seasonality views |
distributed tracing - industry pilot ns | e2e_pilot_01 | broker, processor, prometheus, mdc, eae | distributed tracing library metrics | tracing analysis | bottleneck identification, tracing diagram |
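The "resource efficiency - broker" row could, for example, be implemented as follows. A minimal sketch, assuming the cnf_cc_01 metrics have already been exported to a hypothetical CSV file with one row per timestamp:

```python
# "resource efficiency - broker" sketch: linear regression of cpu_usage
# against packets_served plus a scatterplot.
# Assumption: hypothetical CSV export of cnf_cc_01 with these column names.
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

df = pd.read_csv("cnf_cc_01.csv")  # columns: timestamp, cpu_usage, packets_served, ...

reg = stats.linregress(df["packets_served"], df["cpu_usage"])
print(f"cpu_usage ~ {reg.slope:.4f} * packets_served + {reg.intercept:.4f}")
print(f"R^2 = {reg.rvalue ** 2:.3f}, p-value = {reg.pvalue:.3g}")

plt.scatter(df["packets_served"], df["cpu_usage"], s=5)
plt.plot(df["packets_served"],
         reg.slope * df["packets_served"] + reg.intercept, color="red")
plt.xlabel("packets_served")
plt.ylabel("cpu_usage")
plt.savefig("resource_efficiency_broker.png")
```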
COMMON AMONG ALL EXPERIMENTS:
* Per container parameters:
* cpu_cores
* cpu_bandwidth
* max_mem
* ...
* Per container metrics:
* cpu_usage_total_usage
* mem_usage ...
* ....
NOT COMMON: Depends on experiment definition, e.g., which kind of probes are used
* Per experiment for the complete system under test parameters:
* runtime
* ...
* Per experiment for the complete system under test metrics:
* measured_throughput
* ...
NOT COMMON (specific for a given VNF implementation)
* VNF-specific metrics (e.g. a Suricata IDS)
* pkts_matched
* pkts_dropped
* rules_matched
* ...
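The common per-container metrics above can be read straight from the Docker stats API. A minimal sketch using the Docker SDK for Python, assuming the CDUs run as plain Docker containers (the container name is only an example):

```python
# Sketch of collecting the common per-container metrics via the Docker SDK.
# Assumptions: the CDUs run as plain Docker containers; container name "broker"
# is only an example.
import docker

client = docker.from_env()
container = client.containers.get("broker")

stats = container.stats(stream=False)  # one snapshot of the Docker stats JSON
cpu_total = stats["cpu_stats"]["cpu_usage"]["total_usage"]
mem_usage = stats["memory_stats"]["usage"]
print("cpu_usage_total_usage:", cpu_total)
print("mem_usage:", mem_usage)
# the same JSON also contains the "read"/"preread" timestamps mentioned below
```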
Legend:
parameter = fixed configuration value (which can be a different value for every experiment)
metric = something that is measured
Taking into consideration the Prometheus best practices for naming:
https://prometheus.io/docs/practices/naming/
container name | monitoring parameter | dimensions |
---|---|---|
cname | monitoring parameter | ns_name & experiment_name |
e.g. mn_mp_output_vdu01_cpu_stats__online_cpus_int{ns_name:"ns-1vnf-ids-suricata",experiment_name:"suricata_performance"}
Manuel: Yes, looks reasonable. This is something I can do. Agreed. Note: One experiment, e.g., 'suricata_performance', will have multiple executions, each of them with multiple repetitions and different configuration parameters. That can be hundreds or thousands. So the experiment_id is usually something like 'suricata_performance_0098', meaning configuration/repetition number 98 of experiment suricata_performance.
Eleni: I just renamed the dimensions to ns_name and experiment_name. I do not think that the iteration number (0098) should be depicted in the dimension part, because this would result in a very extensive fragmentation of the time series data (each different dimension represents a different time series dataset).
Note1: If we use Prometheus, putting the ns_id and experiment_id as dimensions makes it easy to query all time series data for a specific network service and/or experiment. Otherwise they can be part of the metric name (mn_mp_output_vdu01_cpu_stats__online_cpus_int_ns-1vnf-ids-suricata_suricata_performance) or skipped (mn_mp_output_vdu01_cpu_stats__online_cpus_int).
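A minimal sketch of how a metric with these dimensions could be exposed for scraping, using the official prometheus_client library (the HTTP port and the sample value are assumptions):

```python
# Sketch of exposing a metric with ns_name / experiment_name as dimensions,
# using the official prometheus_client library.
# Assumptions: HTTP port 8000 and the sample value are made up.
import time
from prometheus_client import Gauge, start_http_server

cpu_gauge = Gauge(
    "mn_mp_output_vdu01_cpu_stats__online_cpus_int",
    "Online CPUs reported for vdu01",
    ["ns_name", "experiment_name"],
)

start_http_server(8000)  # expose /metrics for Prometheus to scrape
cpu_gauge.labels(
    ns_name="ns-1vnf-ids-suricata",
    experiment_name="suricata_performance",
).set(4)  # example value

while True:          # keep the exporter alive so Prometheus can scrape it
    time.sleep(60)
```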
Note2: Some data in CSV format can be found here:
- existing csv format
- Question: What are the preread and read fields?
Manuel: I don't know :-D I need to check the Docker docs; I just collect everything Docker gives me. Maybe timestamps of the container images.
- Tip: Columns that contain arrays should be split
Manuel: Yes, for sure. I just didn't have the time to implement this so far. In the version I am preparing for the collaboration this will be the case.
- Tip: Timestamp values should be unique (not repeated within the column values)
- Tip: id column can be removed since timestamp can be used as primary key id
Manuel: Yes, this will be the case in the format we produce for you. The existing formats are for another toolchain.
Eleni: Great, thanks for all the tips. Feel free to keep them on the wiki page or delete them.
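A minimal sketch of how the tips above could be applied with pandas before handing the CSV files over (the input file name, the "id" column and the array-valued column name are assumptions):

```python
# Sketch of applying the CSV tips above with pandas.
# Assumptions: input file name, the "id" column and the array-valued column
# "cpu_percpu_usage" are only examples.
import ast
import pandas as pd

df = pd.read_csv("existing_metrics.csv")

# Tip: drop the separate id column, the timestamp serves as primary key
df = df.drop(columns=["id"], errors="ignore")

# Tip: timestamp values should be unique; keep the last sample per timestamp
df = df.drop_duplicates(subset="timestamp", keep="last")

# Tip: split columns that contain arrays into one column per element
if "cpu_percpu_usage" in df.columns:
    arrays = df["cpu_percpu_usage"].apply(ast.literal_eval)
    expanded = pd.DataFrame(arrays.tolist(), index=df.index)
    expanded.columns = [f"cpu_percpu_usage_{i}" for i in expanded.columns]
    df = df.drop(columns=["cpu_percpu_usage"]).join(expanded)

df.to_csv("cleaned_metrics.csv", index=False)
```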
If metrics come from a specific network service and a single experiment, the tabular format will be like this:
timestamp | m1 | m2 | m3 |
---|---|---|---|
t1 | value11 | value12 | value13 |
t2 | value21 | value22 | value23 |
... | ... | ... | ... |
tn | valuen1 | valuen2 | valuen3 |
If metrics come from a specific network service and more than one experiment, the tabular format will be like this:
(Note4: In case we have to run a profiling analysis on data that comes from different experiments, we can only use metrics that are common to all experiments.)
Experiment 1:
timestamp | m1 | m2 | m3 |
---|---|---|---|
t1 | value11 | value12 | value13 |
... | ... | ... | ... |
tn | valuen1 | valuen2 | valuen3 |
Experiment 2:
timestamp | m1 | m3 | m4 |
---|---|---|---|
tz | valuez1 | valuez3 | valuez4 |
... | ... | ... | ... |
tk | valuek1 | valuek3 | valuek4 |
Result Dataset to be analyzed:
(Note5: m1' & m3' do not include the dimension info, so that they can be matched across experiments.)
timestamp | m1' | m3' |
---|---|---|
t1 | value11 | value13 |
... | ... | ... |
tk | valuek1 | valuek3 |
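A minimal sketch of how such a result dataset could be built with pandas, keeping only the metrics common to all experiments as described in Note4/Note5 (file names are assumptions):

```python
# Sketch of building the result dataset: keep only the metrics common to all
# experiments (Note4/Note5) and stack the rows. File names are assumptions.
import pandas as pd

frames = [pd.read_csv(f) for f in ["experiment_1.csv", "experiment_2.csv"]]

# metric columns present in every experiment, plus the timestamp column
common = set.intersection(*(set(f.columns) for f in frames))
common = ["timestamp"] + sorted(c for c in common if c != "timestamp")

result = pd.concat([f[common] for f in frames], ignore_index=True)
result = result.sort_values("timestamp")
result.to_csv("result_dataset.csv", index=False)
```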
The profiler could support both ways of interaction. Analyzing CSV files gains in simplicity; fetching data from Prometheus supports a more sophisticated way of fetching and combining metric values.
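For the Prometheus-based interaction, the profiler can use the standard HTTP API. A minimal sketch, assuming a reachable Prometheus instance and reusing the metric/label names from the example above (the time range is arbitrary):

```python
# Sketch of fetching metric values from Prometheus via its HTTP API.
# Assumptions: Prometheus at localhost:9090, arbitrary time range; the metric
# and label values follow the naming example above.
import requests

PROM_URL = "http://localhost:9090"

resp = requests.get(
    f"{PROM_URL}/api/v1/query_range",
    params={
        "query": 'mn_mp_output_vdu01_cpu_stats__online_cpus_int'
                 '{ns_name="ns-1vnf-ids-suricata",experiment_name="suricata_performance"}',
        "start": "2019-01-01T00:00:00Z",
        "end": "2019-01-01T01:00:00Z",
        "step": "15s",
    },
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"], len(series["values"]), "samples")
```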
Manuel: After some reading I decided to use Prometheus to collect the data. The only thing we need to solve is how I can share that data with you, because there will be no "single Prometheus instance" for everything that we all have access to. Maybe we just copy/share the files Prometheus writes to disk. You can then run your own Prometheus instance which uses this data.
Eleni: We are fine with this option. We can also host a Prometheus instance at our premises with public access so you can push the data directly. Whatever you prefer :-)
For more details see APIs: