Benchmarking and Profiling

Benchmarking and Profiling of the Industrial Pilot

This wiki page describes the benchmarking and profiling experiments that we plan to perform to show the tight coupling between WP4 (benchmarking and profiling) and WP7 (pilot).

Main people involved:

  • Manuel
  • Stefan
  • Tasos
  • Eleni

The goal is to publish the results to this call.

Benchmarking Experiments

Tools

Overview

The idea is to focus on the CNFs that are relevant for performance and to skip CNFs that are not interesting for benchmarking, e.g., the EAE is more or less a user interface and not relevant for benchmarking experiments. The DT is also considered to live outside the service in a real setup. Still, the DT might serve as a stimuli probe.

| Name | Involved CDUs | Effort | Stimuli Probe | Measurement Probe |
| --- | --- | --- | --- | --- |
| cnf_ids_01 | suricata_ids | low | Traffic Traces | - |
| cnf_cc_01 | broker | low | MQTT | MQTT |
| cnf_cc_02 | broker, processor | medium | MQTT | MQTT |
| cnf_cc_03 | broker, prom. exporter, prometheus | medium | MQTT | (?) |
| cnf_cc_04 | broker, processor, prom. exporter, prometheus | medium | MQTT | MQTT / (?) |
| cnf_mdc_01 | mdc | high | SMB | MQTT |
| e2e_pilot_01 | mdc, rtr, broker, processor, prom. exporter, prometheus | high | SMB | MQTT / (?) |

Experiment: cnf_cc_01

Description: Test the MQTT broker as the central component of the pilot in isolation. Measure how many messages we can pump through it per second.
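
A minimal sketch of such a stimuli probe, assuming a Python environment with paho-mqtt 1.x and a broker reachable at a placeholder address broker:1883:

```python
# Minimal MQTT stimuli probe sketch: publish messages as fast as possible
# for a fixed duration and report the achieved publish rate.
# Host, port, topic and payload size are placeholders.
import time
import paho.mqtt.client as mqtt

BROKER_HOST = "broker"      # placeholder: address of the broker CDU
BROKER_PORT = 1883
TOPIC = "pilot/benchmark"   # placeholder topic
DURATION_S = 60

client = mqtt.Client()
client.connect(BROKER_HOST, BROKER_PORT)
client.loop_start()         # run the network loop in a background thread

sent = 0
t_end = time.time() + DURATION_S
while time.time() < t_end:
    client.publish(TOPIC, payload=b"x" * 128, qos=0)
    sent += 1

client.loop_stop()
client.disconnect()
print(f"published {sent} messages in {DURATION_S}s "
      f"({sent / DURATION_S:.1f} msg/s offered load)")
```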

Experiment: cnf_cc_02

Description: Test the CC processor (together with the broker from which the processor gets its data). Measure how many messages it can translate and forward per second.
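
A matching measurement probe could simply subscribe to the processor's output topic and count forwarded messages per second; a sketch under the same assumptions (paho-mqtt 1.x, placeholder broker address and topic):

```python
# Minimal MQTT measurement probe sketch: subscribe to the processor's
# output topic and report the observed message rate once per second.
# Broker address and topic are placeholders.
import time
import paho.mqtt.client as mqtt

received = 0

def on_message(client, userdata, msg):
    global received
    received += 1

client = mqtt.Client()
client.on_message = on_message
client.connect("broker", 1883)          # placeholder broker address
client.subscribe("pilot/processed/#")   # placeholder output topic
client.loop_start()

try:
    while True:
        time.sleep(1)
        print(f"{received} msg/s")
        received = 0
except KeyboardInterrupt:
    client.loop_stop()
    client.disconnect()
```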

Experiment: cnf_cc_03

Description: Test the CC's local storage backend (implemented through Prometheus) that uses the broker as data source. It is not fully clear what to measure here, since Prometheus fetches the data at fixed intervals. Maybe we can vary those intervals as one of the configuration parameters?
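
One option is to treat the scrape interval as a configuration parameter and check, per run, how many samples actually end up in Prometheus. A sketch that pulls a metric back out via the Prometheus HTTP API (server URL, metric name and time range are assumptions):

```python
# Sketch: fetch a metric from Prometheus over its HTTP API and count the
# samples stored during an experiment run. URL, metric name and time range
# are placeholders; assumes the 'requests' package is installed.
import time
import requests

PROM_URL = "http://prometheus:9090"       # placeholder Prometheus address
METRIC = "mqtt_messages_received_total"   # placeholder metric name
end = time.time()
start = end - 600                         # last 10 minutes

resp = requests.get(
    f"{PROM_URL}/api/v1/query_range",
    params={"query": METRIC, "start": start, "end": end, "step": "5s"},
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"], len(series["values"]), "samples")
```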

Experiment: cnf_cc_04

Description: Test the full CC. Combines cnf_cc_02 and cnf_cc_03. Multiple measurements needed.

Experiment: cnf_mdc_01

Description: Test how fast the MDC can collect Euromap63 data. Building the right traffic generator for this is a bit challenging (the DT might be reused).
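
As a very rough starting point for such a generator, a sketch that just drops request-like files into an already mounted SMB share at a configurable rate; the directory, file names and contents are made up for illustration and would have to be replaced by proper Euromap 63 session/job files (or by reusing the DT):

```python
# Sketch of a file-based stimuli generator for the MDC: write dummy request
# files into a locally mounted SMB share at a fixed rate.
# SHARE_DIR, file names and contents are placeholders, not real Euromap 63.
import time
from pathlib import Path

SHARE_DIR = Path("/mnt/euromap_share")   # placeholder: mounted SMB share
RATE_PER_S = 2                           # requests per second
DURATION_S = 60

SHARE_DIR.mkdir(parents=True, exist_ok=True)
for i in range(RATE_PER_S * DURATION_S):
    req_file = SHARE_DIR / f"SESS{i:04d}.REQ"     # placeholder file name
    req_file.write_text("00000001 GETINFO;\n")    # placeholder content
    time.sleep(1.0 / RATE_PER_S)
print("done")
```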

Experiment: e2e_pilot_01

Description: End-to-end pilot with all performance-relevant VNFs.

Profiling Analysis

Resource Efficiency Analysis

  • Identify VNF resource consumption trends based on workload characteristics (see the sketch below)
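
A minimal sketch of such a trend analysis, assuming the per-container metrics are already available as a CSV with columns like packets_served and cpu_usage (file and column names are placeholders):

```python
# Sketch: fit a linear regression of CPU usage against served packets to
# expose the resource consumption trend. File and column names are placeholders.
import numpy as np
import pandas as pd

df = pd.read_csv("cnf_cc_01_broker.csv")    # placeholder file name
x = df["packets_served"].to_numpy()
y = df["cpu_usage"].to_numpy()

slope, intercept = np.polyfit(x, y, deg=1)  # simple linear model
print(f"cpu_usage ~= {slope:.6f} * packets_served + {intercept:.3f}")
```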

Elasticity Efficiency Analysis and Elasticity Policies Formulation

  • Identify VNF horizontal and vertical scalability needs

Correlation Analysis

  • Identify strong correlations between metrics within the VNFs (see the sketch below)
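
A sketch of the correlation analysis, assuming the resource usage and VNF-specific metrics are already combined in one CSV (file name is a placeholder):

```python
# Sketch: pairwise Pearson correlations between all numeric metrics of a
# VNF; large absolute values hint at strongly coupled metrics.
import pandas as pd

df = pd.read_csv("cnf_cc_02_metrics.csv")     # placeholder file name
corr = df.corr(numeric_only=True)             # correlation matrix

# report the strongest off-diagonal correlations
pairs = corr.abs().unstack().sort_values(ascending=False)
pairs = pairs[pairs < 1.0]                    # drop self-correlations
print(pairs.head(10))
```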

Time Series Decomposition and Forecasting

  • Forecast the activity of some of the VNFs (e.g., the digital twin?); see the sketch after the table below

| analysis name | experiment name | involved CDUs | metrics | analysis type | output |
| --- | --- | --- | --- | --- | --- |
| resource efficiency - broker | cnf_cc_01 | broker | memory_usage, cpu_usage, packets_served | linear regression | regression model, scatterplot |
| elasticity efficiency - broker | cnf_cc_01 | broker | memory_usage, cpu_usage, packets_served, scaling_request_timestamp, scaling_completion_timestamp | visualisation | graph with scaling requests and actions |
| correlation analysis - broker | cnf_cc_02 | broker, processor | set of resource usage and VNF-specific metrics | correlation analysis | correlogram, Rsq, statistical significance |
| resource efficiency - mdc | cnf_mdc_01 | mdc | memory_usage, cpu_usage, outgoing_traffic | linear regression | regression model, scatterplot |
| time series decomposition - mdc | cnf_mdc_01 | mdc | memory_usage, cpu_usage, incoming_packets | time series decomposition | graph with trend, cycle and seasonality views |
| forecasting - mdc | cnf_mdc_01 | mdc | memory_usage, cpu_usage, incoming_packets | forecasting | graph with forecasted values |
| time series decomposition - IDS | tbd | ids | memory_usage, cpu_usage, packets_dropped, outgoing_packets | time series decomposition | graph with trend, cycle and seasonality views |
| distributed tracing - industry pilot ns | e2e_pilot_01 | broker, processor, prometheus, mdc, eae | distributed tracing library metrics | tracing analysis | bottleneck identification, tracing diagram |
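
For the time series decomposition and forecasting rows above, a minimal sketch using pandas and statsmodels; the CSV file, the metric column, the sampling frequency and the seasonal period are all assumptions:

```python
# Sketch: decompose a resource metric into trend / seasonal / residual parts.
# File name, column names, resampling frequency and period are placeholders.
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

df = pd.read_csv("cnf_mdc_01_metrics.csv", parse_dates=["timestamp"])
series = df.set_index("timestamp")["cpu_usage"].asfreq("5s").interpolate()

result = seasonal_decompose(series, model="additive", period=720)  # placeholder period (1 h at 5 s samples)
result.plot()      # trend, seasonal and residual views
plt.show()

# a real forecast (e.g. ARIMA or exponential smoothing) would start from
# these components; here we only print the last observed trend value
print("last trend value:", result.trend.dropna().iloc[-1])
```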

Data Format

Metrics Types

COMMON AMONG ALL EXPERIMENTS:

* Per container parameters:
    * cpu_cores
    * cpu_bandwidth
    * max_mem
    * ...
* Per container metrics:
    * cpu_usage_total_usage
    * mem_usage ...
    * ....

NOT COMMON: Depends on experiment definition, e.g., which kind of probes are used

* Per experiment for the complete system under test parameters:
    * runtime
    * ...
* Per experiment for the complete system under test metrics:
    * measured_throughput
    * ...

NOT COMMON (specific for a given VNF implementation)

* VNF-specific metrics (e.g. a Suricata IDS)
    * pkts_matched
    * pkts_dropped
    * rules_matched
    * ...

Legend:
parameter = fixed configuration value (which can be a different value for every experiment)
metric = something that is measured

Metrics Naming

Taking into consideration the Prometheus best practices for naming:
https://prometheus.io/docs/practices/naming/

| container name | monitoring parameter | dimensions |
| --- | --- | --- |
| cname | monitoring parameter | ns_name & experiment_name |

e.g. mn_mp_output_vdu01_cpu_stats__online_cpus_int{ns_name="ns-1vnf-ids-suricata", experiment_name="suricata_performance"}
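
A sketch of how such a metric could be exposed with the prometheus_client Python library; the metric and label values are taken from the example above, while the port and the sample value are made up:

```python
# Sketch: expose a metric with ns_name / experiment_name dimensions using
# the official Python client. Port and sample value are placeholders.
from prometheus_client import Gauge, start_http_server

cpu_online = Gauge(
    "mn_mp_output_vdu01_cpu_stats__online_cpus_int",
    "Number of online CPUs reported for vdu01",
    ["ns_name", "experiment_name"],
)

start_http_server(8000)   # placeholder port for the /metrics endpoint
cpu_online.labels(
    ns_name="ns-1vnf-ids-suricata",
    experiment_name="suricata_performance",
).set(4)                  # placeholder value
```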

Manuel: Yes, looks reasonable. This is something I can do. Agreed. Note: One experiment, e.g., 'suricata_performance', will have multiple executions, each of them with multiple repetitions and each with different configuration parameters. That can be hundreds or thousands. So the experiment_id is usually something like 'suricata_performance_0098', meaning configuration/repetition number 98 of experiment 'suricata_performance'.

Eleni: I just renamed the dimensions to ns_name and experiment_name. I do not think that the iteration number (0098) should be included in the dimension part, because this would result in a very extensive fragmentation of the time series data (each different dimension value represents a different time series dataset).

Note 1: If we use Prometheus, putting the ns_id and experiment_id as dimensions makes it easy to query all time series data for a specific network service and/or experiment. Otherwise, they can be part of the metric name (mn_mp_output_vdu01_cpu_stats__online_cpus_int_ns-1vnf-ids-suricata_suricata_performance) or skipped (mn_mp_output_vdu01_cpu_stats__online_cpus_int).

Note 2: Some example data in CSV format:

  • existing csv format
    • Question: What is the preread and read?
      • Manuel: I don't know :-D Need to check the Docker docs; I just collect everything Docker gives me. Maybe timestamps of the container images.
    • Tip: Columns that contain arrays should be split
      • Manuel: Yes, for sure. Just didn't have the time to implement this so far. In the version I am preparing for the collaboration this will be the case.
    • Tip: Timestamp values should be unique (not repeated within the column values)
    • Tip: the id column can be removed, since the timestamp can be used as the primary key
      • Manuel: Yes, this will be the case in the format we produce for you. The existing formats are for another toolchain.

Eleni: Great, thanks for all the tips. Feel free to keep them on the wiki page or delete them.
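
A small pandas sketch of the cleanup steps suggested by the tips above; the file name and the array-valued column are assumptions:

```python
# Sketch: drop the redundant id column, de-duplicate timestamps and split an
# array-valued column into separate columns. Names are placeholders.
import pandas as pd

df = pd.read_csv("container_stats.csv")                  # placeholder file
df = df.drop(columns=["id"], errors="ignore")            # timestamp acts as key
df = df.drop_duplicates(subset="timestamp")              # unique timestamps

# split a column holding a list like "[10, 20, 30]" into per_cpu_0, per_cpu_1, ...
per_cpu = df["cpu_stats__percpu_usage"].str.strip("[]").str.split(",", expand=True)
per_cpu.columns = [f"per_cpu_{i}" for i in per_cpu.columns]
df = df.drop(columns=["cpu_stats__percpu_usage"]).join(per_cpu).set_index("timestamp")
print(df.head())
```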

Metrics Structure

If metrics come from a specific network service and a single experiment, the tabular format will be like this:

| timestamp | m1 | m2 | m3 |
| --- | --- | --- | --- |
| t1 | value11 | value12 | value13 |
| t2 | value21 | value22 | value23 |
| ... | ... | ... | ... |
| tn | valuen1 | valuen2 | valuen3 |

If metrics come from a specific network service and more than one experiment, the tabular format will be like this:
(Note 4: In case we have to run a profiling analysis on data that comes from different experiments, we can only use metrics that are common to all experiments.)

Experiment 1:

| timestamp | m1 | m2 | m3 |
| --- | --- | --- | --- |
| t1 | value11 | value12 | value13 |
| ... | ... | ... | ... |
| tn | valuen1 | valuen2 | valuen3 |

Experiment 2:

| timestamp | m1 | m3 | m4 |
| --- | --- | --- | --- |
| tz | valuez1 | valuez3 | valuez4 |
| ... | ... | ... | ... |
| tk | valuek1 | valuek3 | valuek4 |

Result Dataset to be analyzed:
(Note 5: m1' & m3' do not include the dimension info, so that they can be matched across experiments)

| timestamp | m1' | m3' |
| --- | --- | --- |
| t1 | value11 | value13 |
| ... | ... | ... |
| tk | valuek1 | valuek3 |
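
A sketch of building that result dataset from two experiment CSVs by keeping only the metric columns both experiments share (file names and column layout are assumptions):

```python
# Sketch: combine two experiment datasets into one for profiling analysis,
# keeping only the metrics common to both (cf. Notes 4 and 5).
# File names are placeholders.
import pandas as pd

exp1 = pd.read_csv("experiment_1.csv").set_index("timestamp")
exp2 = pd.read_csv("experiment_2.csv").set_index("timestamp")

common = exp1.columns.intersection(exp2.columns)    # e.g. m1 and m3
combined = pd.concat([exp1[common], exp2[common]])  # stacked along time
print(combined.head())
```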

Integration

The Profiler could support both ways of interaction. Analyzing CSV files is simpler, while fetching data from Prometheus supports a more sophisticated way of fetching and combining metric values.

Manuel: After some reading I decided to use Prometheus to collect the data. The only thing we need to solve is how I can share those data with you, because there will be no "single Prometheus instance" for everything that we all have access to. Maybe we just copy/share the files Prometheus writes to disk. You can then run your own Prometheus instance which uses this data.

Eleni: We are fine with this option. We can also host a Prometheus instance at our premises with public access so that you can push the data directly. Whatever you prefer :-)

For more details, see the APIs: