Benchmarking and Profiling

Benchmarking and Profiling of the Industrial Pilot

This wiki page describes the benchmarking and profiling experiments that we plan to perform to show the tight coupling between WP4 (benchmarking and profiling) and WP7 (pilot).

Main people involved:

  • Manuel
  • Stefan
  • Tasos
  • Eleni

The goal is to publish the results to this call.

Benchmarking Experiments

Tools

Overview

The idea is to focus on the CNFs that are relevant for performance and to skip CNFs that are not interesting for benchmarking, e.g., the EAE is more or less a user interface and not relevant for benchmarking experiments. The DT is also considered to live outside the service in a real setup. Still, the DT might serve as a stimuli probe.

| Name | Involved CDUs | Effort | Stimuli Probe | Measurement Probe |
| --- | --- | --- | --- | --- |
| cnf_ids_01 | suricata_ids | low | Traffic Traces | - |
| cnf_cc_01 | broker | low | MQTT | MQTT |
| cnf_cc_02 | broker, processor | medium | MQTT | MQTT |
| cnf_cc_03 | broker, prom. exporter, prometheus | medium | MQTT | (?) |
| cnf_cc_04 | broker, processor, prom. exporter, prometheus | medium | MQTT | MQTT / (?) |
| cnf_mdc_01 | mdc | high | SMB | MQTT |
| e2e_pilot_01 | mdc, rtr, broker, processor, prom. exporter, prometheus | high | SMB | MQTT / (?) |

Experiment: cnf_cc_01

Description: Test the MQTT broker as the central component of the pilot in isolation. Measure how many messages we can pump through it per second.
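
A minimal sketch of such a stimuli probe, assuming a Python environment with paho-mqtt 1.x and a broker reachable at a placeholder address broker:1883:

```python
# Minimal MQTT stimuli probe sketch: publish messages as fast as possible
# for a fixed duration and report the achieved publish rate.
# Host, port, topic and payload size are placeholders.
import time
import paho.mqtt.client as mqtt

BROKER_HOST = "broker"      # placeholder: address of the broker CDU
BROKER_PORT = 1883
TOPIC = "pilot/benchmark"   # placeholder topic
DURATION_S = 60

client = mqtt.Client()
client.connect(BROKER_HOST, BROKER_PORT)
client.loop_start()         # run the network loop in a background thread

sent = 0
t_end = time.time() + DURATION_S
while time.time() < t_end:
    client.publish(TOPIC, payload=b"x" * 128, qos=0)
    sent += 1

client.loop_stop()
client.disconnect()
print(f"published {sent} messages in {DURATION_S}s "
      f"({sent / DURATION_S:.1f} msg/s offered load)")
```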

Experiment: cnf_cc_02

Description: Test the CC processor (together with the broker from which the processor gets its data). Measure how many messages it can translate and forward per second.
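
A matching measurement probe could simply subscribe to the processor's output topic and count forwarded messages per second; a sketch under the same assumptions (paho-mqtt 1.x, placeholder broker address and topic):

```python
# Minimal MQTT measurement probe sketch: subscribe to the processor's
# output topic and report the observed message rate once per second.
# Broker address and topic are placeholders.
import time
import paho.mqtt.client as mqtt

received = 0

def on_message(client, userdata, msg):
    global received
    received += 1

client = mqtt.Client()
client.on_message = on_message
client.connect("broker", 1883)          # placeholder broker address
client.subscribe("pilot/processed/#")   # placeholder output topic
client.loop_start()

try:
    while True:
        time.sleep(1)
        print(f"{received} msg/s")
        received = 0
except KeyboardInterrupt:
    client.loop_stop()
    client.disconnect()
```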

Experiment: cnf_cc_03

Description: Test the CC's local storage backend (implemented through Prometheus) that uses the broker as data source. It is not fully clear what to measure here, since Prometheus fetches the data at fixed intervals. Maybe we can vary those intervals as one of the configuration parameters?
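
One option is to treat the scrape interval as a configuration parameter and check, per run, how many samples actually end up in Prometheus. A sketch that pulls a metric back out via the Prometheus HTTP API (server URL, metric name and time range are assumptions):

```python
# Sketch: fetch a metric from Prometheus over its HTTP API and count the
# samples stored during an experiment run. URL, metric name and time range
# are placeholders; assumes the 'requests' package is installed.
import time
import requests

PROM_URL = "http://prometheus:9090"       # placeholder Prometheus address
METRIC = "mqtt_messages_received_total"   # placeholder metric name
end = time.time()
start = end - 600                         # last 10 minutes

resp = requests.get(
    f"{PROM_URL}/api/v1/query_range",
    params={"query": METRIC, "start": start, "end": end, "step": "5s"},
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"], len(series["values"]), "samples")
```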

Experiment: cnf_cc_04

Description: Test the full CC. Combines cnf_cc_02 and cnf_cc_03. Multiple measurements needed.

Experiment: cnf_mdc_01

Description: Test how fast the MDC can collect Euromap63 data. Building the right traffic generator for this is a bit challenging (the DT might be reused).
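
As a very rough starting point for such a generator, a sketch that just drops request-like files into an already mounted SMB share at a configurable rate; the directory, file names and contents are made up for illustration and would have to be replaced by proper Euromap 63 session/job files (or by reusing the DT):

```python
# Sketch of a file-based stimuli generator for the MDC: write dummy request
# files into a locally mounted SMB share at a fixed rate.
# SHARE_DIR, file names and contents are placeholders, not real Euromap 63.
import time
from pathlib import Path

SHARE_DIR = Path("/mnt/euromap_share")   # placeholder: mounted SMB share
RATE_PER_S = 2                           # requests per second
DURATION_S = 60

SHARE_DIR.mkdir(parents=True, exist_ok=True)
for i in range(RATE_PER_S * DURATION_S):
    req_file = SHARE_DIR / f"SESS{i:04d}.REQ"     # placeholder file name
    req_file.write_text("00000001 GETINFO;\n")    # placeholder content
    time.sleep(1.0 / RATE_PER_S)
print("done")
```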

Experiment: e2e_pilot_01

Description: End-to-end pilot with all performance-relevant VNFs.

Profiling Analysis

Resource Efficiency Analysis

  • Identify VNF resource consumption trends based on workload characteristics (see the sketch below)
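
A minimal sketch of such a trend analysis, assuming the per-container metrics are already available as a CSV with columns like packets_served and cpu_usage (file and column names are placeholders):

```python
# Sketch: fit a linear regression of CPU usage against served packets to
# expose the resource consumption trend. File and column names are placeholders.
import numpy as np
import pandas as pd

df = pd.read_csv("cnf_cc_01_broker.csv")    # placeholder file name
x = df["packets_served"].to_numpy()
y = df["cpu_usage"].to_numpy()

slope, intercept = np.polyfit(x, y, deg=1)  # simple linear model
print(f"cpu_usage ~= {slope:.6f} * packets_served + {intercept:.3f}")
```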

Elasticity Efficiency Analysis and Elasticity Policies Formulation

  • Identify VNF horizontal and vertical scalability needs

Correlation Analysis

  • Identify strong correlations between metrics within the VNFs (see the sketch below)
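
A sketch of the correlation analysis, assuming the resource usage and VNF-specific metrics are already combined in one CSV (file name is a placeholder):

```python
# Sketch: pairwise Pearson correlations between all numeric metrics of a
# VNF; large absolute values hint at strongly coupled metrics.
import pandas as pd

df = pd.read_csv("cnf_cc_02_metrics.csv")     # placeholder file name
corr = df.corr(numeric_only=True)             # correlation matrix

# report the strongest off-diagonal correlations
pairs = corr.abs().unstack().sort_values(ascending=False)
pairs = pairs[pairs < 1.0]                    # drop self-correlations
print(pairs.head(10))
```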

Time Series Decomposition and Forecasting

  • Forecast the activity of some of the VNFs (e.g., the digital twin?); see the sketch after the table below

| analysis name | experiment name | involved CDUs | metrics | analysis type | output |
| --- | --- | --- | --- | --- | --- |
| resource efficiency - broker | cnf_cc_01 | broker | memory_usage, cpu_usage, packets_served | linear regression | regression model, scatterplot |
| elasticity efficiency - broker | cnf_cc_01 | broker | memory_usage, cpu_usage, packets_served, scaling_request_timestamp, scaling_completion_timestamp | visualisation | graph with scaling requests and actions |
| correlation analysis - broker | cnf_cc_02 | broker, processor | set of resource usage and VNF-specific metrics | correlation analysis | correlogram, Rsq, statistical significance |
| resource efficiency - mdc | cnf_mdc_01 | mdc | memory_usage, cpu_usage, outgoing_traffic | linear regression | regression model, scatterplot |
| time series decomposition - mdc | cnf_mdc_01 | mdc | memory_usage, cpu_usage, incoming_packets | time series decomposition | graph with trend, cycle and seasonality views |
| forecasting - mdc | cnf_mdc_01 | mdc | memory_usage, cpu_usage, incoming_packets | forecasting | graph with forecasted values |
| time series decomposition - IDS | tbd | ids | memory_usage, cpu_usage, packets_dropped, outgoing_packets | time series decomposition | graph with trend, cycle and seasonality views |
| distributed tracing - industry pilot ns | e2e_pilot_01 | broker, processor, prometheus, mdc, eae | distributed tracing library metrics | tracing analysis | bottleneck identification, tracing diagram |
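
For the time series decomposition and forecasting rows above, a minimal sketch using pandas and statsmodels; the CSV file, the metric column, the sampling frequency and the seasonal period are all assumptions:

```python
# Sketch: decompose a resource metric into trend / seasonal / residual parts.
# File name, column names, resampling frequency and period are placeholders.
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

df = pd.read_csv("cnf_mdc_01_metrics.csv", parse_dates=["timestamp"])
series = df.set_index("timestamp")["cpu_usage"].asfreq("5s").interpolate()

result = seasonal_decompose(series, model="additive", period=720)  # placeholder period (1 h at 5 s samples)
result.plot()      # trend, seasonal and residual views
plt.show()

# a real forecast (e.g. ARIMA or exponential smoothing) would start from
# these components; here we only print the last observed trend value
print("last trend value:", result.trend.dropna().iloc[-1])
```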

Data Format

Metrics Types

COMMON AMONG ALL EXPERIMENTS:

* Per container parameters:
    * cpu_cores
    * cpu_bandwidth
    * max_mem
    * ...
* Per container metrics:
    * cpu_usage_total_usage
    * mem_usage ...
    * ....

NOT COMMON: Depends on experiment definition, e.g., which kind of probes are used

* Per experiment for the complete system under test parameters:
    * runtime
    * ...
* Per experiment for the complete system under test metrics:
    * measured_throughput
    * ...

NOT COMMON (specific for a given VNF implementation)

* VNF-specific metrics (e.g. a Suricata IDS)
    * pkts_matched
    * pkts_dropped
    * rules_matched
    * ...

Legend:
parameter = fixed configuration value (which can be a different value for every experiment)
metric = something that is measured

Metrics Naming

Taking into consideration the Prometheus best practices for naming:
https://prometheus.io/docs/practices/naming/

| container name | monitoring parameter | dimensions |
| --- | --- | --- |
| cname | monitoring parameter | ns_name & experiment_name |

e.g. mn_mp_output_vdu01_cpu_stats__online_cpus_int{ns_name="ns-1vnf-ids-suricata", experiment_name="suricata_performance"}
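
A sketch of how such a metric could be exposed with the prometheus_client Python library; the metric and label values are taken from the example above, while the port and the sample value are made up:

```python
# Sketch: expose a metric with ns_name / experiment_name dimensions using
# the official Python client. Port and sample value are placeholders.
from prometheus_client import Gauge, start_http_server

cpu_online = Gauge(
    "mn_mp_output_vdu01_cpu_stats__online_cpus_int",
    "Number of online CPUs reported for vdu01",
    ["ns_name", "experiment_name"],
)

start_http_server(8000)   # placeholder port for the /metrics endpoint
cpu_online.labels(
    ns_name="ns-1vnf-ids-suricata",
    experiment_name="suricata_performance",
).set(4)                  # placeholder value
```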

Manuel: Yes, looks reasonable. This is something I can do. Agreed. Note: One experiment, e.g., 'suricata_performance', will have multiple executions, each of them with multiple repetitions and each with different configuration parameters. That can be hundreds or thousands. So the experiment_id is usually something like 'suricata_performance_0098', meaning configuration/repetition number 98 of experiment 'suricata_performance'.

Eleni: I just renamed the dimensions to ns_name and experiment_name. I do not think that the iteration number (0098) should be included in the dimension part, because this would result in a very extensive fragmentation of the time series data (each different dimension value represents a different time series dataset).

Note 1: If we use Prometheus, putting the ns_id and experiment_id as dimensions makes it easy to query all time series data for a specific network service and/or experiment. Otherwise, they can be part of the metric name (mn_mp_output_vdu01_cpu_stats__online_cpus_int_ns-1vnf-ids-suricata_suricata_performance) or skipped (mn_mp_output_vdu01_cpu_stats__online_cpus_int).

Note 2: Some example data in CSV format:

  • existing csv format
    • Question: What is the preread and read?
      • Manuel: I don't know :-D Need to check the Docker docs; I just collect everything Docker gives me. Maybe timestamps of the container images.
    • Tip: Columns that contain arrays should be split
      • Manuel: Yes, for sure. Just didn't have the time to implement this so far. In the version I am preparing for the collaboration this will be the case.
    • Tip: Timestamp values should be unique (not repeated within the column values)
    • Tip: the id column can be removed, since the timestamp can be used as the primary key
      • Manuel: Yes, this will be the case in the format we produce for you. The existing formats are for another toolchain.

Eleni: Great, thanks for all the tips. Feel free to keep them on the wiki page or delete them.
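
A small pandas sketch of the cleanup steps suggested by the tips above; the file name and the array-valued column are assumptions:

```python
# Sketch: drop the redundant id column, de-duplicate timestamps and split an
# array-valued column into separate columns. Names are placeholders.
import pandas as pd

df = pd.read_csv("container_stats.csv")                  # placeholder file
df = df.drop(columns=["id"], errors="ignore")            # timestamp acts as key
df = df.drop_duplicates(subset="timestamp")              # unique timestamps

# split a column holding a list like "[10, 20, 30]" into per_cpu_0, per_cpu_1, ...
per_cpu = df["cpu_stats__percpu_usage"].str.strip("[]").str.split(",", expand=True)
per_cpu.columns = [f"per_cpu_{i}" for i in per_cpu.columns]
df = df.drop(columns=["cpu_stats__percpu_usage"]).join(per_cpu).set_index("timestamp")
print(df.head())
```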

Metrics Structure

If metrics come from a specific network service and a single experiment, the tabular format will be like this:

| timestamp | m1 | m2 | m3 |
| --- | --- | --- | --- |
| t1 | value11 | value12 | value13 |
| t2 | value21 | value22 | value23 |
| ... | ... | ... | ... |
| tn | valuen1 | valuen2 | valuen3 |

If metrics come from a specific network service and more than one experiment, the tabular format will be like this:
(Note 4: In case we have to run a profiling analysis on data that comes from different experiments, we can only use metrics that are common to all experiments.)

Experiment 1:

| timestamp | m1 | m2 | m3 |
| --- | --- | --- | --- |
| t1 | value11 | value12 | value13 |
| ... | ... | ... | ... |
| tn | valuen1 | valuen2 | valuen3 |

Experiment 2:

| timestamp | m1 | m3 | m4 |
| --- | --- | --- | --- |
| tz | valuez1 | valuez3 | valuez4 |
| ... | ... | ... | ... |
| tk | valuek1 | valuek3 | valuek4 |

Result Dataset to be analyzed:
(Note 5: m1' & m3' do not include the dimension info, so that they can be matched across experiments)

| timestamp | m1' | m3' |
| --- | --- | --- |
| t1 | value11 | value13 |
| ... | ... | ... |
| tk | valuek1 | valuek3 |
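
A sketch of building that result dataset from two experiment CSVs by keeping only the metric columns both experiments share (file names and column layout are assumptions):

```python
# Sketch: combine two experiment datasets into one for profiling analysis,
# keeping only the metrics common to both (cf. Notes 4 and 5).
# File names are placeholders.
import pandas as pd

exp1 = pd.read_csv("experiment_1.csv").set_index("timestamp")
exp2 = pd.read_csv("experiment_2.csv").set_index("timestamp")

common = exp1.columns.intersection(exp2.columns)    # e.g. m1 and m3
combined = pd.concat([exp1[common], exp2[common]])  # stacked along time
print(combined.head())
```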

Integration

The Profiler could support both ways of interaction. Analyzing CSV files is simpler, while fetching data from Prometheus supports a more sophisticated way of fetching and combining metric values.

Manuel: After some reading I decided to use Prometheus to collect the data. The only thing we need to solve is how I can share those data with you, because there will be no "single Prometheus instance" for everything that we all have access to. Maybe we just copy/share the files Prometheus writes to disk. You can then run your own Prometheus instance which uses this data.

Eleni: We are fine with this option. We can also host a Prometheus instance at our premises with public access so that you can push the data directly. Whatever you prefer :-)

For more details, see the APIs: