Commit 0204b6a

Fio drop kernel cache (cloud-bulldozer#378)
* add kernel cache drop automation to ripsaw
* make kernel cache drop port configurable
* Use K8S python API to wait for DaemonSet to be in running state
* copy python script to wait for daemonset
* common code to start kernel cache dropper pod
* only invoke role when user wants kernel cache dropping
* automates starting up kernel cache dropper DaemonSet
* let benchmark-operator query nodes for labels
* pycodestyle likes this version
* move wait_for_daemonset.py to role where it is used
* add documentation for kernel cache dropper
* moved to roles/kernel_cache_drop
* fix markdown, add implementation notes and labeling
* try cache dropper in valid_fiod_hostpath.yaml
* added to CRD
* test must label cache dropper nodes
* authorize benchmark operator to query nodes
* do not set label if it is already set
* cachedrop must use benchmark-operator ServiceAccount
* Append UUID and test-name to kernel-cache-drop daemonset
* Moving update operator image to after for loop
* Update test.sh
* display security policy
* with cache dropping, tests will take longer
* try to match apiGroup name in error message
* CI fixes
1 parent 58a0a7d commit 0204b6a

File tree

17 files changed

+299
-4
lines changed

README.md

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -103,6 +103,9 @@ spec:
103103
## Capturing Prometheus Data
104104
[Capturing Prometheus Data](docs/prometheus.md)
105105

106+
## Cache dropping
107+
[Cache dropping](docs/cache_dropping.md)
108+
106109
## Community
107110
Key Members(slack_usernames): ravi, mohit, dry923, rsevilla or rook
108111
* [**#sig-scalability on Kubernetes Slack**](https://kubernetes.slack.com)

build/Dockerfile

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ RUN ansible-galaxy collection install -r ${HOME}/requirements.yml \
88
COPY image_resources/centos8-appstream.repo /etc/yum.repos.d/centos8-appstream.repo
99
RUN dnf install -y --nodocs redis openssl --enablerepo=centos8-appstream && dnf clean all
1010

11+
COPY resources/kernel-cache-drop-daemonset.yaml /opt/kernel_cache_dropper/
1112
COPY group_vars/ ${HOME}/group_vars/
1213
COPY roles/ ${HOME}/roles/
1314
COPY templates/ ${HOME}/templates/

deploy/20_role.yaml

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ rules:
99
- ""
1010
resources:
1111
- pods
12+
- daemonsets
1213
- services
1314
- endpoints
1415
- persistentvolumeclaims

deploy/40_cluster_role_kubevirt.yaml

Lines changed: 15 additions & 0 deletions
@@ -10,8 +10,23 @@ rules:
1010
- rolebindings
1111
- clusterroles
1212
- clusterrolebindings
13+
- daemonsets
1314
verbs:
1415
- '*'
16+
- apiGroups:
17+
- apps
18+
resources:
19+
- daemonsets
20+
verbs:
21+
- get
22+
- list
23+
- apiGroups:
24+
- rbac.authorization.k8s.io
25+
resources:
26+
- nodes
27+
verbs:
28+
- get
29+
- list
1530
- apiGroups:
1631
- subresources.kubevirt.io
1732
resources:

docs/cache_dropping.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
This page describes how to use the cache-dropping features of benchmark-operator.
2+
This feature is optional: if you do not specify it in the CR, no cache dropping happens.
3+
4+
# why drop cache
5+
6+
Cache-dropping prevents the previous state of the system from affecting performance results, which helps you
7+
to achieve repeatable, accurate results with low variance.
8+
9+
Caching is an important part of any system's performance, and it is of course desirable in some cases
10+
to explicitly make use of caching. However, this cache-dropping feature does not prevent testing of
11+
caching performance - if you run a long enough test for the cache to "warm up", you can do this even
12+
with cache-dropping enabled, since cache-dropping only happens before each sample, not in the middle of
13+
a sample.
14+
15+
If you want to ensure that caching will not happen during your test, you can create a data set that
16+
is much bigger than the amount of memory available for caching, and use a uniform random access pattern.
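
As a rough illustration of why the data set must be much larger than memory: under a uniform random access pattern, and assuming the cache is already full of pages from the data set, the chance that a given read is served from cache is about the cache size divided by the data set size. The function name and the sizes below are illustrative assumptions, not part of benchmark-operator.

```python
def expected_cache_hit_ratio(cache_bytes: int, dataset_bytes: int) -> float:
    """Approximate steady-state hit ratio for uniform random reads.

    Assumes every block of the data set is equally likely to be read and
    that the cache holds at most cache_bytes of the data set.
    """
    if dataset_bytes <= 0:
        raise ValueError("dataset_bytes must be positive")
    return min(1.0, cache_bytes / dataset_bytes)


GiB = 1024 ** 3
# Hypothetical node with 16 GiB of memory available for caching.
small = expected_cache_hit_ratio(16 * GiB, 32 * GiB)    # data set is 2x memory
large = expected_cache_hit_ratio(16 * GiB, 512 * GiB)   # data set is 32x memory
print(f"2x memory: {small:.3f}  32x memory: {large:.3f}")
```

With a data set only twice the size of memory, about half of all reads can still hit the cache. At 32 times memory the share falls to roughly 3 percent.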
17+
18+
# how to drop cache
19+
20+
There are different types of caching that occur in the system:
21+
22+
- kernel buffer caching
23+
- (Ceph OCS) OSD caching (not yet fully supported)
24+
25+
You can control which type of cache dropping
26+
is done using these CR fields in the workload args section:
27+
28+
```
29+
drop_cache_kernel: true
30+
```
31+
32+
For this to work, you must **label** the nodes on which you want the kernel cache dropped, for example:
33+
34+
```
35+
# kubectl label node minikube kernel-cache-dropper=yes
36+
```
37+
If you do not do this, ripsaw will time out waiting for the cache dropper pods to deploy.
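
The timeout described above can be pictured with a small polling loop. This is a sketch only, not the repository's wait_for_daemonset.py: `fetch_status` is a hypothetical callable standing in for a Kubernetes API query, and the rule that at least one pod must be scheduled is an assumption chosen to match the documented behavior. The field names `desiredNumberScheduled` and `numberReady` are those of the Kubernetes DaemonSet status.

```python
import time
from typing import Callable, Mapping


def daemonset_is_ready(status: Mapping[str, int]) -> bool:
    """True when at least one pod is wanted and every wanted pod is ready."""
    desired = status.get("desiredNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    return desired > 0 and ready == desired


def wait_for_daemonset(fetch_status: Callable[[], Mapping[str, int]],
                       timeout_s: float = 60.0,
                       poll_s: float = 2.0) -> bool:
    """Poll fetch_status until the DaemonSet is ready or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if daemonset_is_ready(fetch_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# No node carries the label: nothing is scheduled, so the wait gives up.
unlabeled = {"desiredNumberScheduled": 0, "numberReady": 0}
print(wait_for_daemonset(lambda: unlabeled, timeout_s=0.2, poll_s=0.05))

# Two labeled nodes with both pods ready: the wait returns at once.
labeled = {"desiredNumberScheduled": 2, "numberReady": 2}
print(wait_for_daemonset(lambda: labeled))
```

When no node is labeled, the DaemonSet schedules zero pods, so a check that insists on at least one ready pod can only end by timing out.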
38+
39+
40+
# implementation notes
41+
42+
Kernel cache dropping is done by a DaemonSet that runs on nodes with the above label. See roles/kernel_cache_drop
43+
for details on how this is done. Each pod started by this daemonset is running a CherryPy web service that
44+
responds to a GET request by dropping the kernel cache using the equivalent of these shell commands:
45+
46+
```
47+
sync
48+
echo 3 > /proc/sys/vm/drop_caches
49+
```
50+
51+
The sync is required because the kernel cannot drop dirty pages from the cache; they must be written back to storage first.
52+
A logfile named /tmp/dropcache.log is visible on every cache dropper pod so you can see what it is doing.
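
A minimal Python sketch of the same two steps follows. It is not the code of the cache dropper service: the function name is hypothetical and the CherryPy handler is omitted. The target path is a parameter because the DaemonSet in this commit mounts the host's /proc/sys/vm at /proc_sys_vm, so inside the pod the file would presumably be /proc_sys_vm/drop_caches.

```python
import os
import tempfile


def drop_kernel_cache(drop_caches_path: str = "/proc/sys/vm/drop_caches") -> None:
    """Flush dirty pages, then ask the kernel to drop its clean caches.

    Writing "3" drops the page cache plus dentries and inodes.
    Against the real file this needs root or a privileged container on Linux.
    """
    os.sync()  # write dirty pages back so that they become droppable
    with open(drop_caches_path, "w") as f:
        f.write("3\n")


# Dry run against a scratch file so the sketch can be tried without root.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "drop_caches")
    drop_kernel_cache(target)
    with open(target) as f:
        print(f.read().strip())
```

Pointing the function at the real drop_caches file performs the actual drop; the scratch file above only shows what gets written.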
53+

group_vars/all.yml

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
11
---
22
operator_namespace: '{{ meta.namespace }}'
33
clustername: 'myk8scluster'
4+
kernel_cache_drop_svc_port: 9222

playbook.yml

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,10 @@
44
gather_facts: no
55
tasks:
66

7+
- include_role:
8+
name: "kernel_cache_drop"
9+
when: workload.args.drop_cache_kernel is defined
10+
711
- name: Get update from Cerberus if connected
812
block:
913
- include_role:

resources/crds/ripsaw_v1alpha1_ripsaw_crd.yaml

Lines changed: 2 additions & 0 deletions
@@ -101,6 +101,8 @@ spec:
101101
type: string
102102
hostpath:
103103
type: string
104+
drop_cache_kernel:
105+
type: boolean
104106
status:
105107
type: object
106108
properties:
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
apiVersion: rbac.authorization.k8s.io/v1
2+
kind: ClusterRole
3+
metadata:
4+
name: benchmark-operator
5+
rules:
6+
- apiGroups:
7+
- ''
8+
resources:
9+
- nodes
10+
- pods
11+
verbs:
12+
- get
13+
- list
14+
- apiGroups:
15+
- apps
16+
resources:
17+
- daemonsets
18+
verbs:
19+
- get
20+
- list
21+
---
22+
apiVersion: rbac.authorization.k8s.io/v1
23+
kind: ClusterRoleBinding
24+
metadata:
25+
name: benchmark-operator
26+
roleRef:
27+
apiGroup: rbac.authorization.k8s.io
28+
kind: ClusterRole
29+
name: benchmark-operator
30+
subjects:
31+
- kind: ServiceAccount
32+
name: benchmark-operator
33+
namespace: my-ripsaw
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
apiVersion: apps/v1
2+
kind: DaemonSet
3+
metadata:
4+
name: kernel-cache-dropper
5+
namespace: my-ripsaw
6+
labels:
7+
app: kernel-cache-dropper
8+
spec:
9+
selector:
10+
matchLabels:
11+
name: kernel-cache-dropper
12+
template:
13+
metadata:
14+
labels:
15+
name: kernel-cache-dropper
16+
spec:
17+
serviceAccountName: benchmark-operator
18+
#tolerations:
19+
#- key: node-role.kubernetes.io/master
20+
# effect: NoSchedule
21+
containers:
22+
- name: kernel-cache-dropper
23+
image: quay.io/cloud-bulldozer/kernel_cache_dropper:latest
24+
imagePullPolicy: Always
25+
ports:
26+
- containerPort: {{ kernel_cache_drop_svc_port }}
27+
env:
28+
- name: KCACHE_DROP_PORT_NUM
29+
value: "{{ kernel_cache_drop_svc_port }}"
30+
command: ["/usr/bin/python3"]
31+
args: ["/opt/kernel_cache_drop/kernel-cache-drop-websvc.py"]
32+
imagePullPolicy: Always
33+
securityContext:
34+
privileged: true
35+
# we don't need all the same volumes as the toolbox pod
36+
volumeMounts:
37+
- name: proc-sys-vm
38+
mountPath: /proc_sys_vm
39+
volumes:
40+
- name: proc-sys-vm
41+
hostPath:
42+
path: /proc/sys/vm
