Context
We’ve been experimenting with CGO-based signature generation and verification for both `RelayRequest` and `RelayResponse` flows, and observed a surprising result -- real-world performance under load is worse with CGO enabled, despite isolated benchmarks showing the opposite.

This thread is to analyze, share insights, and ideally improve how we get the most out of the raw throughput that CGO theoretically provides.
📊 Observed Results
RelayRequest Signature Verification

- `ethereum_secp256k1` load test results:
- `ethereum_secp256k1` build tag: (Grafana snapshot 👇)

RelayResponse Signature Generation

- `libsecp256k1_sdk` load test results:
- `libsecp256k1_sdk` build tag: (Grafana snapshot 👇)

🧠 Hypotheses (Why CGO Degrades Under Load)
Despite CGO being faster in micro-benchmarks (both on host and in containers), real-world load tests degrade significantly when CGO is on.
Possible explanations:
- `big.Int` conversions and interface indirection (see the buffer-reuse sketch below).
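On the `big.Int` point, one mitigation to test is reusing caller-owned buffers instead of allocating on every conversion. A minimal sketch, assuming signatures are (r, s) pairs of 32-byte secp256k1 scalars; `fillSignature` is a hypothetical helper, not an existing API:

```go
package sigconv

import "math/big"

// fillSignature serializes r and s into a caller-owned 64-byte buffer.
// big.Int.FillBytes left-pads with zeros and reuses buf, whereas
// Bytes() allocates a fresh slice on every call.
func fillSignature(r, s *big.Int, buf *[64]byte) {
	r.FillBytes(buf[:32])
	s.FillBytes(buf[32:])
}
```

Pooling such buffers (e.g., via `sync.Pool`) would further cut the allocation churn the GC sees under load.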
⚙️ Integration Challenges

Mixed usage (`libsecp256k1_sdk` + `ethereum_secp256k1`)

Tried using both libraries for unified CGO-backed signing/verification. Compilation failed with:

    /tmp/go-build/_cgo_export.c:27: multiple definition of `secp256k1GoPanicIllegal';
    /tmp/go-link-3228168235/000006.o:/tmp/go-build/_cgo_export.c:27: first defined here

Using `libsecp256k1_sdk` everywhere is not possible either -- missing `go-dleq`-required methods.
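One way to sidestep the duplicate `secp256k1GoPanicIllegal` symbol, if each binary can commit to a single backend, is to make sure the two C wrappers are never linked together, selecting the implementation with build tags. A minimal sketch, assuming the `ethereum_secp256k1` tag wraps go-ethereum's `crypto/secp256k1` package; the `signer` package and the SDK-side body are placeholders:

```go
// signer_ethereum.go
//go:build ethereum_secp256k1

package signer

// go-ethereum's CGO-backed wrapper; its C symbols are linked only
// into binaries built with -tags ethereum_secp256k1.
import "github.com/ethereum/go-ethereum/crypto/secp256k1"

func Verify(pubkey, msg, sig []byte) bool {
	return secp256k1.VerifySignature(pubkey, msg, sig)
}
```

```go
// signer_sdk.go
//go:build !ethereum_secp256k1

package signer

// Mirror implementation backed by libsecp256k1_sdk (placeholder body),
// so the two C wrappers never coexist in one binary.
func Verify(pubkey, msg, sig []byte) bool {
	// ... call into libsecp256k1_sdk here ...
	return false
}
```

This doesn't help a binary that genuinely needs both backends at once, but it keeps each build free of symbol collisions.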
💬 Discussion Points

- Are there known patterns or tuning strategies for reducing CGO boundary overhead in high-throughput crypto workloads?
- Could using `unsafe` pointer reuse or thread pinning (`runtime.LockOSThread`) help mitigate context switching?
- Would a CGO worker pool (batching boundary calls) or native pre-allocation strategy improve throughput? (A minimal pool sketch follows this list.)
- Anyone successfully using both `libsecp256k1_sdk` and `ethereum_secp256k1` in the same Go binary without symbol collisions?
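On the pool idea, here is the shape we have in mind: a fixed set of goroutines pinned to OS threads with `runtime.LockOSThread`, draining a channel so all CGO calls are confined to a few long-lived threads. Whether pinning actually reduces the scheduling cost is exactly what we'd want to measure; `verifyCGO` below is a placeholder for whichever CGO-backed verify function is under test:

```go
package cgopool

import "runtime"

// job carries one verification request into the pool.
type job struct {
	pubkey, msg, sig []byte
	result           chan bool
}

// verifyCGO is a placeholder for the CGO-backed verification call.
func verifyCGO(pubkey, msg, sig []byte) bool { return false }

// Pool confines CGO calls to a fixed set of pinned OS threads.
type Pool struct{ jobs chan job }

func NewPool(workers, queueSize int) *Pool {
	p := &Pool{jobs: make(chan job, queueSize)}
	for i := 0; i < workers; i++ {
		go func() {
			// Pin this goroutine so repeated CGO calls stay on
			// one OS thread instead of migrating.
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			for j := range p.jobs {
				j.result <- verifyCGO(j.pubkey, j.msg, j.sig)
			}
		}()
	}
	return p
}

// Verify enqueues a request and blocks until its result arrives.
func (p *Pool) Verify(pubkey, msg, sig []byte) bool {
	res := make(chan bool, 1)
	p.jobs <- job{pubkey, msg, sig, res}
	return <-res
}
```

Batching several signatures per channel send would amortize both the channel hop and the boundary crossing, at the cost of some added latency.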
🎯 Goal

- Identify root cause (boundary, GC, thread, etc.)
- Achieve advertised CGO performance in real-world scenarios

👉 Call for input:
If you’ve run into similar CGO performance inversions or have ideas to profile the boundary cost more precisely (e.g., using `perf`, `pprof`, or C-side instrumentation), please share below.
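As one concrete starting point for isolating the boundary cost, a no-op micro-benchmark separates the Go→C crossing from the crypto work itself. A sketch under a hypothetical `cgoboundary` package; note that cgo is not supported in `_test.go` files, so the C shim lives in the package proper:

```go
// cgoboundary.go
package cgoboundary

/*
static void noop() {}
*/
import "C"

// CNoop crosses the Go/C boundary and does nothing else, so the delta
// between BenchmarkCNoop and BenchmarkGoNoop approximates the pure
// boundary cost.
func CNoop() { C.noop() }

//go:noinline
func GoNoop() {}
```

```go
// cgoboundary_test.go
package cgoboundary

import "testing"

func BenchmarkGoNoop(b *testing.B) {
	for i := 0; i < b.N; i++ {
		GoNoop()
	}
}

func BenchmarkCNoop(b *testing.B) {
	for i := 0; i < b.N; i++ {
		CNoop()
	}
}
```

Running `go test -bench=. -cpuprofile=cpu.out` feeds `pprof` directly, and `perf stat` on the load-test binary can surface thread migrations and context switches that pprof does not show.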