Regarding SIMD performance #114
Replies: 5 comments
-
Hi Paul,

The best way to measure the speed/throughput is to look at the

To identify any improvements or regressions in solution quality, each configuration has been benchmarked many times (>25x) and averaged.

Hope this helps you with solving the Christmas Tree Packing Challenge on Kaggle ;)

Best,
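To illustrate the averaging described above, here is a minimal Rust sketch that summarises repeated runs of one configuration as a mean and a sample standard deviation. It is not sparrow's benchmark code; the function name `mean_and_std` and the sample values are hypothetical and only there to show the arithmetic.

```rust
/// Mean and sample standard deviation of repeated benchmark results.
/// Returns None when there are fewer than two samples, because the
/// sample standard deviation is undefined in that case.
fn mean_and_std(samples: &[f64]) -> Option<(f64, f64)> {
    let n = samples.len();
    if n < 2 {
        return None;
    }
    let mean = samples.iter().sum::<f64>() / n as f64;
    // Bessel's correction (n - 1) gives the unbiased sample variance.
    let variance = samples
        .iter()
        .map(|x| (x - mean).powi(2))
        .sum::<f64>()
        / (n as f64 - 1.0);
    Some((mean, variance.sqrt()))
}

fn main() {
    // Hypothetical result values, NOT measurements from this thread.
    let runs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
    if let Some((mean, std)) = mean_and_std(&runs) {
        println!("mean = {mean:.3}, sample std = {std:.3}");
    }
}
```

Comparing two configurations by mean alone can mislead when the spread is large, so reporting the standard deviation next to the mean makes it easier to judge whether a difference is more than run-to-run noise.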
-
Hi Jeroen,

Thank you for the quick response. Yes, I'm aware of the non-deterministic performance, probably amplified by not using the same seed value for each run. I've run the tests a few times now, and there is a SIMD improvement on average. I was trying to find an imperfect performance metric. The

Thanks,
-
I have the measurements of

There is only a 30% speedup with SIMD on this system.
-
Hi Paul,

This looks about what I would expect. The only thing the

Also, the compiler does auto-vectorization on its own.

Best,

Just a tip: from my experience, using >4 workers yields worse solutions on average.
-
Here are my results with 41 trees, 1-hour run with

Surprisingly, 18 workers produced both the best and the worst result, and 12 workers did better than 9, but this is based on only a single run for each configuration.
-
I've run a few tests on an AMD Ryzen 7 3800X 8-core CPU and observed decreased performance with the SIMD build. Here are the results for the same input file trees.json, demanding 41 copies of a simple non-convex polygon. All runs were with 7 workers and early termination disabled.

[Results table; columns: spyrrow | spyrrow | --release --features=only_final_svg | --release --features=simd,only_final_svg]

For some reason, the SIMD version produced the worst result. I successfully installed the nightly build and configured my session with:
Is there a way to double-check that SIMD instructions were actually used? My CPU supports up to AVX2 instructions.
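As a partial first check for the question above, here is a minimal Rust sketch that reports whether the CPU advertises AVX2 at runtime and whether the binary was compiled with AVX2 enabled as a target feature. It uses only the standard library; the function names `cpu_has_avx2` and `compiled_with_avx2` are hypothetical. It does not prove that any particular SIMD code path in sparrow is executed, and whether the `simd` feature depends on these target features is an assumption here, not something stated in this thread.

```rust
// Runtime check: does the CPU this program runs on advertise AVX2?
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_avx2() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

// On non-x86 targets AVX2 does not exist, so report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_avx2() -> bool {
    false
}

// Compile-time check: was this binary built with AVX2 enabled,
// for example through RUSTFLAGS="-C target-cpu=native" on an AVX2 machine?
fn compiled_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

fn main() {
    println!("CPU supports AVX2 at runtime:      {}", cpu_has_avx2());
    println!("binary compiled with AVX2 enabled: {}", compiled_with_avx2());
}
```

If the second line prints `false`, the compiler was not allowed to assume AVX2 for this build, which is one possible reason for a SIMD build showing no gain. Building this snippet with the same flags and toolchain as the actual binary keeps the comparison meaningful.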