
Expand benchmark tooling to include memory usage #19464

Open · ivanvc opened this issue Feb 22, 2025 · 7 comments

Comments

ivanvc (Member) commented Feb 22, 2025

What would you like to be added?

The current benchmarking tooling (tools/rw-heatmaps, tools/benchmark) only measures read/write operations (QPS). The v3.6 minor release includes improvements that reduce the memory footprint, but there is currently no tooling to benchmark and report memory results.

Why is this needed?

To provide comparisons of the memory footprint across different etcd versions.

ivanvc (Member, Author) commented Feb 22, 2025

@serathius, @ahrtr mentioned that you could share some ideas on how we can achieve this. Would you be able to do so? Thanks :)

serathius (Member) commented:

Methods I use:

fuweid (Member) commented Feb 24, 2025

If it's running on Linux, we can create a cgroup for that process and use it to track memory usage.
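For illustration, a minimal sketch of the cgroup idea on a cgroup v2 host (the cgroup name, paths, and teardown are assumptions, not part of any existing tooling; memory.peak requires a reasonably recent kernel):

# create a dedicated cgroup and move the etcd process into it
CGROUP=/sys/fs/cgroup/etcd-bench
sudo mkdir -p "${CGROUP}"

etcd --log-level error &>/dev/null &
ETCD_PID=$!
echo "${ETCD_PID}" | sudo tee "${CGROUP}/cgroup.procs" >/dev/null

# ... run the benchmark workload here ...

# memory.current is the instantaneous usage in bytes;
# memory.peak (kernel >= 5.19) holds the high-water mark
cat "${CGROUP}/memory.peak"

kill "${ETCD_PID}"; wait "${ETCD_PID}" 2>/dev/null
sudo rmdir "${CGROUP}"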

ivanvc (Member, Author) commented Feb 27, 2025

Thanks for your suggestions. I reviewed them, and since we want to have this soon and want to keep it simple, I think the easiest approach would be to use time and capture the maximum resident memory while running the script from tools/rw-heatmaps. Then, I could generate a new CSV with the memory consumption, which we could feed to rw-heatmaps (it will need some modifications).

The script would just need modifications here:

function run_etcd_server() {
  if [ ! -x ${ETCD_BIN} ]; then
    echo "no etcd binary found at: ${ETCD_BIN}"
    exit 1
  fi
  # delete existing data directories
  [ -d "db" ] && rm -rf db
  [ -d "default.etcd" ] && rm -rf default.etcd/
  echo "start etcd server in the background"
  ${ETCD_BIN} --quota-backend-bytes=${BACKEND_SIZE} \
    --log-level 'error' \
    --listen-client-urls http://0.0.0.0:${CLIENT_PORT} \
    --advertise-client-urls http://127.0.0.1:${CLIENT_PORT} \
    &>/dev/null &
  return $!
}

And here:

function kill_etcd_server() {
  # kill etcd server
  ETCD_PID=$1
  if [ -z "$(ps aux | grep etcd | awk "{print \$2}")" ]; then
    echo "failed to find the etcd instance to kill: ${ETCD_PID}"
    return
  fi
  echo "kill etcd server instance"
  kill -9 ${ETCD_PID}
  wait ${ETCD_PID} 2>/dev/null
  sleep 5
}

kill_etcd_server should send SIGINT rather than SIGKILL to keep it simple.

Should we capture the CPU usage as well?

From the time(1) format specifiers: %P — Percentage of the CPU that this job got, computed as (%U + %S) / %E.
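For illustration, a rough sketch of how the time wrapper could be applied in run_etcd_server (the output file name and format string are assumptions, not a final patch):

/usr/bin/time -f '%M %P' -o "etcd-usage.txt" \
  ${ETCD_BIN} --quota-backend-bytes=${BACKEND_SIZE} \
    --log-level 'error' \
    --listen-client-urls http://0.0.0.0:${CLIENT_PORT} \
    --advertise-client-urls http://127.0.0.1:${CLIENT_PORT} \
    &>/dev/null &
# %M is the maximum resident set size in KiB, %P the CPU percentage;
# the metrics file is only written once the wrapped etcd process exits.
# Caveat: $! now refers to the time wrapper rather than etcd itself, so
# kill_etcd_server would need to be adjusted accordingly.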

ahrtr (Member) commented Feb 27, 2025

> the easiest would be to use time, and get the maximum memory while we run the script from tools/rw-heatmaps

Either time, pidstat, or Prometheus metrics works for me. I think it would be good to monitor/track memory usage over a period of time, so that we can generate line charts against different versions for easier comparison. That may need some effort, so it depends on your capacity. Just using time to track the overall memory usage is OK for now. Thanks.

> Should we capture the CPU usage as well?

Yes, it's nice to have. Tracking/monitoring over a period of time is better, but it's up to you for now, depending on your capacity.
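As a sketch of the pidstat route (assumes sysstat is installed and ETCD_PID holds the server's PID; the interval and output file are illustrative):

# sample memory (-r) and CPU (-u) for the etcd PID once per second,
# one line per sample (-h), until the sampler is stopped
pidstat -r -u -h -p "${ETCD_PID}" 1 > pidstat-samples.log &
PIDSTAT_PID=$!

# ... run the benchmark workload here ...

kill "${PIDSTAT_PID}"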

ivanvc (Member, Author) commented Feb 28, 2025

I was thinking of adding the etcd server's memory usage to the rw-heatmaps script so we get it along with the R/W performance benchmark. However, if we take samples across the many passes we are doing, it will be hard to find a suitable visualization for the data.

I think we could either:

  1. Add it to the rw-heatmaps benchmark script and record the maximum RAM usage and CPU percentage. We can reuse the benchmarking script and some of the line-chart changes from [RFC] Add rw heatmaps line chart #19030. In this case, time is a good candidate.
  2. Do it as a separate job. Define a new benchmark script that gathers the data (probably something similar to the rw-heatmaps script) but generates fewer data points, and build new visualizations for the memory usage over time. In this case, pidstat would be a good candidate.

The latter option could also use Prometheus inside a kind cluster, or cgget with cgroups. However, these two approaches involve a bigger local setup. A point in favor of Prometheus is that the visualization part would be easier, as we could export the charts from a Grafana dashboard.
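A lighter-weight variant of the Prometheus idea would be to scrape etcd's own /metrics endpoint for its resident memory; a sketch, reusing CLIENT_PORT from the script above (the sampling loop and output file are illustrative):

# sample the Go process resident memory exposed on etcd's /metrics endpoint
while kill -0 "${ETCD_PID}" 2>/dev/null; do
  curl -s "http://127.0.0.1:${CLIENT_PORT}/metrics" \
    | grep '^process_resident_memory_bytes' >> metrics-samples.log
  sleep 1
done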

I will go with option 1 (data collection), which is the quickest, and will run some new benchmarks soon. Even with the suggestion from etcd-io/website#959 (comment), it will take two days to generate the benchmarks for the versions. But as I wrote above, I think it makes sense to explore the second option if we can use Grafana and don't need to change rw-heatmaps. In that case, what would be a good set of parameters for the benchmarks?
