
Expand benchmark tooling to include memory usage #19464

Open · ivanvc opened this issue Feb 22, 2025 · 7 comments

Comments

ivanvc (Member) commented Feb 22, 2025

What would you like to be added?

The current benchmarking tooling (tools/rw-heatmaps, tools/benchmark) only measures read/write operations (QPS). The v3.6 minor release includes improvements that reduce the memory footprint, but there is currently no tooling to benchmark and report memory results.

Why is this needed?

To provide comparisons of the memory footprint across different etcd versions.

ivanvc (Member, Author) commented Feb 22, 2025

@serathius, @ahrtr mentioned that you could share some ideas on how we can achieve this. Would you be able to do so? Thanks :)

serathius (Member) commented:

Methods I use:

fuweid (Member) commented Feb 24, 2025

If it's running on Linux, we can create a cgroup for that process and use it to track memory usage.
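For illustration, a minimal sketch of the cgroup idea on a cgroup v2 host (the cgroup name, paths, and teardown are assumptions, not part of any existing tooling; memory.peak requires a reasonably recent kernel):

# create a dedicated cgroup and move the etcd process into it
CGROUP=/sys/fs/cgroup/etcd-bench
sudo mkdir -p "${CGROUP}"

etcd --log-level error &>/dev/null &
ETCD_PID=$!
echo "${ETCD_PID}" | sudo tee "${CGROUP}/cgroup.procs" >/dev/null

# ... run the benchmark workload here ...

# memory.current is the instantaneous usage in bytes;
# memory.peak (kernel >= 5.19) holds the high-water mark
cat "${CGROUP}/memory.peak"

kill "${ETCD_PID}"; wait "${ETCD_PID}" 2>/dev/null
sudo rmdir "${CGROUP}"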

ivanvc (Member, Author) commented Feb 27, 2025

Thanks for your suggestions. I reviewed them, and since we want to have this soon and want to keep it simple, I think the easiest approach would be to use time and capture the maximum resident memory while running the script from tools/rw-heatmaps. Then, I could generate a new CSV with the memory consumption, which we could feed to rw-heatmaps (it will need some modifications).

The script would just need modifications here:

function run_etcd_server() {
  if [ ! -x ${ETCD_BIN} ]; then
    echo "no etcd binary found at: ${ETCD_BIN}"
    exit 1
  fi
  # delete existing data directories
  [ -d "db" ] && rm -rf db
  [ -d "default.etcd" ] && rm -rf default.etcd/
  echo "start etcd server in the background"
  ${ETCD_BIN} --quota-backend-bytes=${BACKEND_SIZE} \
    --log-level 'error' \
    --listen-client-urls http://0.0.0.0:${CLIENT_PORT} \
    --advertise-client-urls http://127.0.0.1:${CLIENT_PORT} \
    &>/dev/null &
  return $!
}

And here:

function kill_etcd_server() {
  # kill etcd server
  ETCD_PID=$1
  if [ -z "$(ps aux | grep etcd | awk "{print \$2}")" ]; then
    echo "failed to find the etcd instance to kill: ${ETCD_PID}"
    return
  fi
  echo "kill etcd server instance"
  kill -9 ${ETCD_PID}
  wait ${ETCD_PID} 2>/dev/null
  sleep 5
}

kill_etcd_server should send SIGINT rather than SIGKILL to keep it simple.

Should we capture the CPU usage as well?

From the time(1) format specifiers: %P — Percentage of the CPU that this job got, computed as (%U + %S) / %E.
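For illustration, a rough sketch of how the time wrapper could be applied in run_etcd_server (the output file name and format string are assumptions, not a final patch):

/usr/bin/time -f '%M %P' -o "etcd-usage.txt" \
  ${ETCD_BIN} --quota-backend-bytes=${BACKEND_SIZE} \
    --log-level 'error' \
    --listen-client-urls http://0.0.0.0:${CLIENT_PORT} \
    --advertise-client-urls http://127.0.0.1:${CLIENT_PORT} \
    &>/dev/null &
# %M is the maximum resident set size in KiB, %P the CPU percentage;
# the metrics file is only written once the wrapped etcd process exits.
# Caveat: $! now refers to the time wrapper rather than etcd itself, so
# kill_etcd_server would need to be adjusted accordingly.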

ahrtr (Member) commented Feb 27, 2025

> the easiest would be to use time, and get the maximum memory while we run the script from tools/rw-heatmaps

Either time, pidstat, or Prometheus metrics works for me. I think it would be good to monitor/track memory usage over a period of time, so that we can generate line charts against different versions for easier comparison. That may need some effort, so it depends on your capacity. Just using time to track the overall memory usage is OK for now. Thanks.

> Should we capture the CPU usage as well?

Yes, it's nice to have. Tracking/monitoring over a period of time is better, but it's up to you for now, depending on your capacity.
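As a sketch of the pidstat route (assumes sysstat is installed and ETCD_PID holds the server's PID; the interval and output file are illustrative):

# sample memory (-r) and CPU (-u) for the etcd PID once per second,
# one line per sample (-h), until the sampler is stopped
pidstat -r -u -h -p "${ETCD_PID}" 1 > pidstat-samples.log &
PIDSTAT_PID=$!

# ... run the benchmark workload here ...

kill "${PIDSTAT_PID}"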

ivanvc (Member, Author) commented Feb 28, 2025

I was thinking of adding the etcd server's memory usage to the rw-heatmaps script so we get it along with the R/W performance benchmark. However, if we take samples across the many passes we are doing, it will be hard to find a suitable visualization for the data.

I think we could either:

  1. Add it to the rw-heatmaps benchmark script and record the maximum RAM usage and CPU percentage. We can reuse the benchmarking script and some of the line-chart changes from [RFC] Add rw heatmaps line chart #19030. In this case, time is a good candidate.
  2. Do it as a separate job. Define a new benchmark script that gathers the data (probably something similar to the rw-heatmaps script) but generates fewer data points, and build new visualizations for the memory usage over time. In this case, pidstat would be a good candidate.

The latter option could also use Prometheus inside a kind cluster, or cgget with cgroups. However, these two approaches involve a bigger local setup. A point in favor of Prometheus is that the visualization part would be easier, as we could export the charts from a Grafana dashboard.
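A lighter-weight variant of the Prometheus idea would be to scrape etcd's own /metrics endpoint for its resident memory; a sketch, reusing CLIENT_PORT from the script above (the sampling loop and output file are illustrative):

# sample the Go process resident memory exposed on etcd's /metrics endpoint
while kill -0 "${ETCD_PID}" 2>/dev/null; do
  curl -s "http://127.0.0.1:${CLIENT_PORT}/metrics" \
    | grep '^process_resident_memory_bytes' >> metrics-samples.log
  sleep 1
done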

I will go with option 1 (data collection), which is the quickest, and will run some new benchmarks soon. Even with the suggestion from etcd-io/website#959 (comment), it will take two days to generate the benchmarks for the versions. But as I wrote above, I think it makes sense to explore the second option if we can use Grafana and don't need to change rw-heatmaps. In that case, what would be a good set of parameters for the benchmarks?
