Commit 4a7da52

Merge pull request #8 from edgerun/update-domain-model
Update domain model
2 parents 1dbe97d + e32a28b commit 4a7da52

39 files changed: +2197 / -574 lines

README.md

Lines changed: 12 additions & 0 deletions
@@ -30,6 +30,18 @@ You can run the examples we provide in https://github.com/edgerun/faas-sim/tree/

Where example refers to the specific example package.
Check out the examples [README](https://github.com/edgerun/faas-sim/tree/master/examples/README.md) for more information.

Run notebooks
-------------

Notebooks are located in `notebooks`.
You need to install `faas-sim` in editable mode to run the notebooks.
Inside the notebooks, import modules from `sim`.

To install the project (assuming you already created and activated a virtual environment via `make venv`):

    pip install -e .
    jupyter notebook

Documentation
-------------

doc/analysis/index.rst

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@

.. _analysis:

========
Analysis
========

Analysis of simulation results is done by extracting pandas DataFrames after the simulation completes (``sim.env.metrics.extract_dataframe(<name>)``).
The environment of the simulation contains a ``Metrics`` object that is used throughout the simulation to log events.
Those events describe different aspects of a FaaS platform (``FaasSystem``), such as the scheduling process, data flow, or invocations.

Default logs
============

The default implementation of a FaasSystem (``DefaultFaasSystem``) logs events of the following processes, which can be extracted as DataFrames with the associated names:

* Allocation (``'allocation'``)
* Invocations (``'invocations'``)
* Scaling (``'scale'``)
* Scheduling (``'schedule'``)
* Function Replica Deployment (``'replica_deployment'``)
* Function Deployments (``'function_deployments'``)
* Function Deployment (``'function_deployment'``)
* Function Deployment lifecycle (``'function_deployment_lifecycle'``)
* Functions (``'functions'``)
* Flow (``'flow'``)
* Network (``'network'``)
* Node utilization (``'node_utilization'``)
* Function utilization (``'function_utilization'``)
* Function Execution Times (``'fets'``)

.. hint::

   We provide a basic example in ``examples/analysis/main.py``; details for each DataFrame can be found in the documentation of the corresponding aspect.

Logging
=======

During the simulation, various aspects of the system are logged.
Logging happens mainly in the core implementation, but some aspects are left to users.
Details about those aspects follow later.

``Metrics`` defines a general log function as well as several out-of-the-box log functions that target specific events in the lifecycle of a FaaS platform.

The ``Metrics`` constructor takes a ``RuntimeLogger`` object as initialisation parameter.
The *logger* stores all records and can be configured by providing a ``Clock`` object, which determines the timestamp of each log event.

.. hint::

   Check out ``sim.logging`` for different implementations!
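The ``Metrics``/``RuntimeLogger``/``Clock`` separation described above can be sketched in plain Python. The class names mirror the ones in the text, but the method signatures and the ``FixedClock`` helper are illustrative, not the actual faas-sim API:

```python
import time
from typing import Any, Dict, List, Tuple


class Clock:
    """Determines the timestamp attached to each log record."""

    def now(self) -> float:
        return time.time()


class FixedClock(Clock):
    """A deterministic clock, useful for tests (hypothetical helper)."""

    def __init__(self, t: float):
        self.t = t

    def now(self) -> float:
        return self.t


class RuntimeLogger:
    """Stores all records, timestamping each one with the configured Clock."""

    def __init__(self, clock: Clock = None):
        self.clock = clock or Clock()
        self.records: List[Tuple[float, str, Dict[str, Any]]] = []

    def log(self, metric: str, **fields):
        self.records.append((self.clock.now(), metric, fields))


logger = RuntimeLogger(FixedClock(42.0))
logger.log('invocations', fn='resnet-inference', t_exec=0.25)
```

Swapping the clock is what lets the same logger work with wall-clock time in tests and with simulated time inside the simulation.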

doc/concepts/index.rst

Lines changed: 49 additions & 8 deletions
@@ -82,13 +82,47 @@ Think of it like the main API gateway of OpenFaaS or the kube-apiserver of Kuber

        def remove(self, fn: FunctionDeployment): ...

        def suspend(self, fn_name: str): ...

        def discover(self, fn_name: str) -> List[FunctionReplica]: ...

        def scale_down(self, fn_name: str, remove: int): ...

        def scale_up(self, fn_name: str, replicas: int): ...

        # additional lookup methods:
        def poll_available_replica(self, fn: str, interval=0.5): ...

        def get_replicas(self, fn_name: str, state=None) -> List[FunctionReplica]: ...

        def get_function_index(self) -> Dict[str, FunctionContainer]: ...

        def get_deployments(self) -> List[FunctionDeployment]: ...

Conceptually the phases are:

* **deploy**: makes the function invokable and deploys the minimum number of ``FunctionReplica`` instances on the cluster. The minimum number of running instances is configured via ``ScalingConfiguration``.

* **invoke**: the ``LoadBalancer`` selects a replica and simulates the function invocation by calling the ``invoke`` method of the associated ``FunctionSimulator``.

* **remove**: removes the function from the platform and shuts down all running replicas.

* **discover**: returns all running ``FunctionReplica`` instances that belong to the function.

* **scale_down**: removes the specified number of running ``FunctionReplica`` instances while respecting the configured minimum. The current implementation picks the most recently deployed replicas first.

* **scale_up**: deploys the specified number of ``FunctionReplica`` instances while respecting the maximum specified in the ``ScalingConfiguration``.

* **suspend**: executes a teardown for all running replicas of a function (used by ``faas_idler``).

* **poll_available_replica**: repeatedly waits and checks for running replicas of the function.

* **get_replicas**: returns all replicas of a function that are in the given state, or all replicas if ``state == None``.

* **get_function_index**: returns all deployed ``FunctionContainer`` instances.

* **get_deployments**: returns all deployed ``FunctionDeployment`` instances.
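The scale-down policy described above (newest replicas removed first, never dropping below the minimum) can be sketched as a plain function. This is an illustrative stand-in, not the actual ``DefaultFaasSystem`` implementation:

```python
from typing import List


def scale_down(replicas: List[str], remove: int, scale_min: int = 1) -> List[str]:
    """Remove up to `remove` replicas, newest first, keeping at least scale_min.

    `replicas` is assumed to be ordered by deployment time (oldest first),
    so the most recently deployed replicas sit at the end of the list.
    """
    removable = max(0, len(replicas) - scale_min)
    n = min(remove, removable)
    return replicas[:len(replicas) - n]


# asking to remove 3 of 4 replicas with a minimum of 2 only removes 2
print(scale_down(['r1', 'r2', 'r3', 'r4'], remove=3, scale_min=2))  # ['r1', 'r2']
```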
.. _Function Simulators:

Function simulators
===================

@@ -101,19 +135,19 @@ The FunctionSimulator methods are invoked by the simulator to simulate the d

.. code-block:: python

    class FunctionSimulator(abc.ABC):

        def deploy(self, env: Environment, replica: FunctionReplica):
            yield env.timeout(0)

        def startup(self, env: Environment, replica: FunctionReplica):
            yield env.timeout(0)

        def setup(self, env: Environment, replica: FunctionReplica):
            yield env.timeout(0)

        def invoke(self, env: Environment, replica: FunctionReplica, request: FunctionRequest):
            yield env.timeout(0)

        def teardown(self, env: Environment, replica: FunctionReplica):
            yield env.timeout(0)
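The methods above are generator-based SimPy processes: each hook yields events (such as timeouts) and the simulation engine advances simulated time between yields. A plain-Python sketch of how an engine drives the lifecycle hooks in order (the ``drive`` helper stands in for SimPy's event loop and is purely illustrative):

```python
class LoggingSimulator:
    """Records which lifecycle hooks ran; mirrors the FunctionSimulator shape."""

    def __init__(self):
        self.calls = []

    def deploy(self, replica):
        self.calls.append(('deploy', replica))
        yield 0  # stands in for `yield env.timeout(0)`

    def startup(self, replica):
        self.calls.append(('startup', replica))
        yield 0

    def setup(self, replica):
        self.calls.append(('setup', replica))
        yield 0

    def invoke(self, replica, request):
        self.calls.append(('invoke', request))
        yield 0


def drive(gen):
    # a simulation engine would advance simulated time per yielded event;
    # here we simply exhaust the generator
    for _ in gen:
        pass


sim = LoggingSimulator()
for hook in (sim.deploy('replica-1'), sim.startup('replica-1'), sim.setup('replica-1')):
    drive(hook)
drive(sim.invoke('replica-1', 'request-1'))
```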
@@ -133,6 +167,7 @@ Conceptually the phases are:

Each time the simulator creates a new function replica (because of deployment or scaling actions), the SimulatorFactory is called to create or return a FunctionSimulator for that replica.
The SimulatorFactory can be overwritten to return the same FunctionSimulator every time, create a new instance for each function replica, or implement any other behavior.

Get more details on function simulators in :ref:`Function Simulator Details` and our examples.

Simulation
==========
@@ -166,7 +201,7 @@ Usage example:

.. code-block:: python

    from sim.requestgen import expovariate_arrival_profile, constant_rps_profile

    env = ...
    gen = expovariate_arrival_profile(constant_rps_profile(20))

@@ -176,7 +211,7 @@ Usage example:

        # send next request

The following figure shows several examples and the request patterns they produce:
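Assuming the semantics sketched above (a constant requests-per-second profile fed into an exponential interarrival sampler), the two generators can be approximated in self-contained Python. The real implementations live in ``sim.requestgen``; the bodies below are illustrative:

```python
import random


def constant_rps_profile(rps: float):
    """Yields a constant requests-per-second target."""
    while True:
        yield rps


def expovariate_arrival_profile(rps_profile, rnd=None):
    """Yields exponentially distributed interarrival times for the current rps.

    An exponential interarrival distribution with rate `rps` models a
    Poisson arrival process with `rps` requests per second on average.
    """
    rnd = rnd or random.Random(0)
    for rps in rps_profile:
        yield rnd.expovariate(rps)


gen = expovariate_arrival_profile(constant_rps_profile(20))
ias = [next(gen) for _ in range(10000)]
print(sum(ias) / len(ias))  # close to 1/20 = 0.05 seconds between requests
```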

@@ -196,3 +231,9 @@ The second row shows how a constant interarrival distribution can be used to mod

and how a constant workload profile can be used to model a static workload pattern with randomized interarrivals.
The last row shows Gaussian random walks (GRW), where each value represents a random sample from a Normal distribution that is then used as the value for :math:`\mu` in the next random sample.
The request profile can be parameterized with a :math:`\sigma` value that affects the fluctuation over time.

.. hint::

   You can find code examples to generate patterns in our Jupyter notebook (``workload_patterns.ipynb``) and a
   simulation example under ``examples/request_gen``.

doc/contents.rst

Lines changed: 3 additions & 0 deletions
@@ -13,6 +13,9 @@ Documentation for faas-sim

   index
   concepts/index
   system/index
   analysis/index
   function_sims/index
   examples/index

Indices and tables
(Two binary files changed: 67 KB and 41.7 KB; previews not shown.)

doc/function_sims/index.rst

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@

.. _Function Simulator Details:

====================
Function Simulators
====================

This section showcases a selection of pre-defined function simulators and gives details on how to implement one yourself.

.. attention::

   Make sure you've familiarized yourself with :ref:`Resources` and :ref:`Function Simulators`.

As our work is heavily influenced by the design and architecture of `OpenFaaS`_, we provide two implementations of `FunctionSimulator` that model the behavior of the *forking* and *HTTP* modes (see `Watchdog modes`_).

The implementations are located in ``sim/faas/watchdogs.py`` and can be imported with:

.. code-block:: python

    from sim.faas import ForkingWatchdog, HTTPWatchdog

The abstract class that represents the general watchdog concept looks like this:

.. code-block:: python

    class Watchdog(FunctionSimulator):

        def claim_resources(self, env: Environment, replica: FunctionReplica, request: FunctionRequest): ...

        def release_resources(self, env: Environment, replica: FunctionReplica, request: FunctionRequest): ...

        def execute(self, env: Environment, replica: FunctionReplica, request: FunctionRequest): ...

The ``HTTPWatchdog`` uses a queuing mechanism to simulate workers and claims resources only once the request has received a token (i.e., a worker is available).
The ``ForkingWatchdog`` claims resources and executes each request immediately, without further delay.

.. attention::

   When using the ``ForkingWatchdog``, make sure to manually limit the number of concurrent requests, as each fork consumes RAM.

The following figure shows the log events that happen during execution with the ``HTTPWatchdog`` and also depicts the interaction between the different system components.

.. figure:: ../figures/functionsim-invoke-times.png
   :align: center

.. _OpenFaaS: https://docs.openfaas.com/
.. _Watchdog modes: https://github.com/openfaas/of-watchdog#modes
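The token-queue behavior of the ``HTTPWatchdog`` (requests wait until one of a fixed pool of worker tokens is free) can be illustrated without SimPy. The class below is a hypothetical sketch of the queuing discipline, not the actual watchdog code:

```python
from collections import deque


class HttpWatchdogSketch:
    """Queues requests until one of `workers` tokens is free (illustrative)."""

    def __init__(self, workers: int):
        self.free_tokens = workers
        self.waiting = deque()
        self.executing = []

    def invoke(self, request):
        self.waiting.append(request)
        self._dispatch()

    def _dispatch(self):
        # a request only starts executing (and would only claim resources)
        # once it has obtained a worker token
        while self.free_tokens > 0 and self.waiting:
            self.free_tokens -= 1
            self.executing.append(self.waiting.popleft())

    def done(self, request):
        self.executing.remove(request)
        self.free_tokens += 1
        self._dispatch()


wd = HttpWatchdogSketch(workers=2)
for r in ('r1', 'r2', 'r3'):
    wd.invoke(r)
# only two requests run concurrently; 'r3' waits for a token
```

A forking watchdog, by contrast, would move every request to ``executing`` immediately, which is why the attention box above recommends limiting concurrency manually.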

doc/system/index.rst

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@

.. _system:

========
System
========

In the following we describe the inner workings of our *FaasSystem* implementation.
The API of the ``FaasSystem`` is designed around real-life requirements and represents typical operations that can be found in a typical API gateway (such as in `OpenFaaS`_).
We provide a default implementation of ``FaasSystem``, called ``DefaultFaasSystem``, in ``sim.faas.system``.
The following explains the inner workings of our implementation, which components are used, and how you can configure the system.

We recall the methods a ``FaasSystem`` has to implement:

.. code-block:: python

    class FaasSystem(abc.ABC):

        def deploy(self, fn: FunctionDeployment): ...

        def invoke(self, request: FunctionRequest): ...

        def remove(self, fn: FunctionDeployment): ...

        def discover(self, fn_name: str) -> List[FunctionReplica]: ...

        def scale_down(self, fn_name: str, remove: int): ...

        def scale_up(self, fn_name: str, replicas: int): ...

        def suspend(self, fn_name: str): ...

        # ... and several other lookup methods

To implement these functions, our system contains the following state:

.. attention::

   This section provides insights into the current implementation of ``FaasSystem``.
   Be aware that this is subject to change; using the lookup methods is much safer with respect to future updates.

* ``env: Environment``: used to access globally configured components (i.e., ``Metrics``, ``SimulatorFactory``, ``ClusterContext``)
* ``function_containers: Dict[str, FunctionContainer]``: stores all available function containers from the deployed functions
* ``replicas: Dict[str, List[FunctionReplica]]``: collects all FunctionReplicas under the name of the corresponding FunctionDeployment
* ``scheduler_queue: simpy.Store``: contains function replicas that need to be scheduled. ``scale_up`` puts replicas into the queue and ``run_schedule_worker`` polls from it.
* ``load_balancer: LoadBalancer``: called upon ``invoke`` to select the replica that handles the invocation (currently round-robin)
* ``functions_deployments: Dict[str, FunctionDeployment]``: stores the deployed functions and is modified by ``deploy`` and ``remove``
* ``replica_count: Dict[str, int]``: counts the number of active replicas per ``FunctionDeployment``
* ``functions_definitions: Counter``: counts the number of replicas per ``FunctionContainer``

.. _OpenFaaS: https://docs.openfaas.com/
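The round-robin replica selection mentioned for the ``load_balancer`` can be sketched in a few lines. Class and method names here are illustrative, not the faas-sim ``LoadBalancer`` API:

```python
from collections import defaultdict
from typing import Dict, List


class RoundRobinLoadBalancer:
    """Cycles through the running replicas of each function (illustrative)."""

    def __init__(self, replicas: Dict[str, List[str]]):
        self.replicas = replicas
        self.counters = defaultdict(int)  # per-function invocation counter

    def next_replica(self, fn_name: str) -> str:
        running = self.replicas[fn_name]
        replica = running[self.counters[fn_name] % len(running)]
        self.counters[fn_name] += 1
        return replica


lb = RoundRobinLoadBalancer({'resnet': ['a', 'b', 'c']})
print([lb.next_replica('resnet') for _ in range(4)])  # ['a', 'b', 'c', 'a']
```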
.. _Resources:

Resources
=========

Simulation of resources has to be implemented by users, due to the flexibility required when implementing a ``FunctionSimulator``. For example, the execution of a function can be delayed through queuing.
Therefore, resources are not used immediately, and it is the ``FunctionSimulator``'s responsibility to consume them at the right time.

*faas-sim* offers a standardized, dictionary-based interface to manage resources.
This allows *faas-sim* to implement common components (such as resource monitoring for nodes and functions, as well as an implementation of `Kubernetes' HPA`_).
Concurrent resource claims are added up.

The following code shows an example of consuming resources:

.. code-block:: python

    class CpuConsumingSim(FunctionSimulator):

        def __init__(self, queue: simpy.Resource):
            self.queue = queue

        def invoke(self, env: Environment, replica: FunctionReplica, request: FunctionRequest):
            token = self.queue.request()
            yield token

            # the definition of resources is up to users;
            # here we assume a call occupies 20% CPU for its duration
            env.resource_state.put_resource(replica, 'cpu', 0.2)

            yield env.timeout(1)

            # release resources and the worker token
            env.resource_state.remove_resource(replica, 'cpu', 0.2)
            self.queue.release(token)

The ``Environment`` object contains a resource monitor that continuously collects the momentary resource utilization and puts it into the ``MetricsServer``, which can be used to query the average usage of a certain resource.

.. _Kubernetes' HPA: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
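The "claims are added up" semantics of the dictionary-based resource interface can be sketched as follows. This is an illustrative model of the behavior described above, not the faas-sim ``resource_state`` implementation:

```python
from collections import defaultdict
from typing import Dict


class ResourceState:
    """Per-replica resource accounting; concurrent claims are summed up."""

    def __init__(self):
        self.usage: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def put_resource(self, replica: str, resource: str, value: float):
        self.usage[replica][resource] += value

    def remove_resource(self, replica: str, resource: str, value: float):
        self.usage[replica][resource] -= value

    def get_resource_utilization(self, replica: str) -> Dict[str, float]:
        return dict(self.usage[replica])


state = ResourceState()
state.put_resource('replica-1', 'cpu', 0.2)
state.put_resource('replica-1', 'cpu', 0.3)  # two concurrent invocations add up
print(state.get_resource_utilization('replica-1'))  # cpu usage is now ~0.5
```

A monitor can then periodically read these sums to produce the node and function utilization DataFrames.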

examples/analysis/__init__.py

Whitespace-only changes.

examples/analysis/main.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@

import logging

import examples.basic.main as basic
from examples.custom_function_sim.main import CustomSimulatorFactory
from sim.faassim import Simulation

logger = logging.getLogger(__name__)


def main():
    logging.basicConfig(level=logging.INFO)

    # prepare the simulation with the topology and benchmark from the basic example
    sim = Simulation(basic.example_topology(), basic.ExampleBenchmark())

    # override the SimulatorFactory
    sim.create_simulator_factory = CustomSimulatorFactory

    # run the simulation
    sim.run()

    # extract all default dataframes logged by the system
    dfs = {
        'allocation_df': sim.env.metrics.extract_dataframe('allocation'),
        'invocations_df': sim.env.metrics.extract_dataframe('invocations'),
        'scale_df': sim.env.metrics.extract_dataframe('scale'),
        'schedule_df': sim.env.metrics.extract_dataframe('schedule'),
        'replica_deployment_df': sim.env.metrics.extract_dataframe('replica_deployment'),
        'function_deployments_df': sim.env.metrics.extract_dataframe('function_deployments'),
        'function_deployment_df': sim.env.metrics.extract_dataframe('function_deployment'),
        'function_deployment_lifecycle_df': sim.env.metrics.extract_dataframe('function_deployment_lifecycle'),
        'functions_df': sim.env.metrics.extract_dataframe('functions'),
        'flow_df': sim.env.metrics.extract_dataframe('flow'),
        'network_df': sim.env.metrics.extract_dataframe('network'),
        'node_utilization_df': sim.env.metrics.extract_dataframe('node_utilization'),
        'function_utilization_df': sim.env.metrics.extract_dataframe('function_utilization'),
        'fets_df': sim.env.metrics.extract_dataframe('fets'),
    }

    logger.info('Mean exec time %.2f', dfs['invocations_df']['t_exec'].mean())


if __name__ == '__main__':
    main()
