Context
We’ve been experimenting with CGO-based signature generation and verification for both `RelayRequest` and `RelayResponse` flows, and observed a surprising result -- real-world performance under load is worse with CGO enabled, despite isolated benchmarks showing the opposite.

This thread is to analyze, share insights, and ideally improve how we get the most out of the raw throughput that CGO theoretically provides.
📊 Observed Results
RelayRequest Signature Verification

- `ethereum_secp256k1` load test results:
- `ethereum_secp256k1` build tag: (Grafana snapshot 👇)

RelayResponse Signature Generation

- `libsecp256k1_sdk` load test results:
- `libsecp256k1_sdk` build tag: (Grafana snapshot 👇)

🧠 Hypotheses (Why CGO Degrades Under Load)
Despite CGO being faster in micro-benchmarks (both on host and in containers), real-world load tests degrade significantly when CGO is on.
Possible explanations:
- `big.Int` conversions and interface indirection (see the buffer-reuse sketch below).
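On the `big.Int` point, one mitigation to test is reusing caller-owned buffers instead of allocating on every conversion. A minimal sketch, assuming signatures are (r, s) pairs of 32-byte secp256k1 scalars; `fillSignature` is a hypothetical helper, not an existing API:

```go
package sigconv

import "math/big"

// fillSignature serializes r and s into a caller-owned 64-byte buffer.
// big.Int.FillBytes left-pads with zeros and reuses buf, whereas
// Bytes() allocates a fresh slice on every call.
func fillSignature(r, s *big.Int, buf *[64]byte) {
	r.FillBytes(buf[:32])
	s.FillBytes(buf[32:])
}
```

Pooling such buffers (e.g., via `sync.Pool`) would further cut the allocation churn the GC sees under load.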
⚙️ Integration Challenges

Mixed usage (`libsecp256k1_sdk` + `ethereum_secp256k1`)

Tried using both libraries for unified CGO-backed signing/verification. Compilation failed with:

    /tmp/go-build/_cgo_export.c:27: multiple definition of `secp256k1GoPanicIllegal';
    /tmp/go-link-3228168235/000006.o:/tmp/go-build/_cgo_export.c:27: first defined here

Using `libsecp256k1_sdk` everywhere is not possible either -- missing `go-dleq`-required methods.
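One way to sidestep the duplicate `secp256k1GoPanicIllegal` symbol, if each binary can commit to a single backend, is to make sure the two C wrappers are never linked together, selecting the implementation with build tags. A minimal sketch, assuming the `ethereum_secp256k1` tag wraps go-ethereum's `crypto/secp256k1` package; the `signer` package and the SDK-side body are placeholders:

```go
// signer_ethereum.go
//go:build ethereum_secp256k1

package signer

// go-ethereum's CGO-backed wrapper; its C symbols are linked only
// into binaries built with -tags ethereum_secp256k1.
import "github.com/ethereum/go-ethereum/crypto/secp256k1"

func Verify(pubkey, msg, sig []byte) bool {
	return secp256k1.VerifySignature(pubkey, msg, sig)
}
```

```go
// signer_sdk.go
//go:build !ethereum_secp256k1

package signer

// Mirror implementation backed by libsecp256k1_sdk (placeholder body),
// so the two C wrappers never coexist in one binary.
func Verify(pubkey, msg, sig []byte) bool {
	// ... call into libsecp256k1_sdk here ...
	return false
}
```

This doesn't help a binary that genuinely needs both backends at once, but it keeps each build free of symbol collisions.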
💬 Discussion Points

- Are there known patterns or tuning strategies for reducing CGO boundary overhead in high-throughput crypto workloads?
- Could using `unsafe` pointer reuse or thread pinning (`runtime.LockOSThread`) help mitigate context switching?
- Would a CGO worker pool (batching boundary calls) or native pre-allocation strategy improve throughput? (A minimal pool sketch follows this list.)
- Anyone successfully using both `libsecp256k1_sdk` and `ethereum_secp256k1` in the same Go binary without symbol collisions?
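On the pool idea, here is the shape we have in mind: a fixed set of goroutines pinned to OS threads with `runtime.LockOSThread`, draining a channel so all CGO calls are confined to a few long-lived threads. Whether pinning actually reduces the scheduling cost is exactly what we'd want to measure; `verifyCGO` below is a placeholder for whichever CGO-backed verify function is under test:

```go
package cgopool

import "runtime"

// job carries one verification request into the pool.
type job struct {
	pubkey, msg, sig []byte
	result           chan bool
}

// verifyCGO is a placeholder for the CGO-backed verification call.
func verifyCGO(pubkey, msg, sig []byte) bool { return false }

// Pool confines CGO calls to a fixed set of pinned OS threads.
type Pool struct{ jobs chan job }

func NewPool(workers, queueSize int) *Pool {
	p := &Pool{jobs: make(chan job, queueSize)}
	for i := 0; i < workers; i++ {
		go func() {
			// Pin this goroutine so repeated CGO calls stay on
			// one OS thread instead of migrating.
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			for j := range p.jobs {
				j.result <- verifyCGO(j.pubkey, j.msg, j.sig)
			}
		}()
	}
	return p
}

// Verify enqueues a request and blocks until its result arrives.
func (p *Pool) Verify(pubkey, msg, sig []byte) bool {
	res := make(chan bool, 1)
	p.jobs <- job{pubkey, msg, sig, res}
	return <-res
}
```

Batching several signatures per channel send would amortize both the channel hop and the boundary crossing, at the cost of some added latency.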
🎯 Goal

- Identify root cause (boundary, GC, thread, etc.)
- Achieve advertised CGO performance in real-world scenarios

👉 Call for input:
If you’ve run into similar CGO performance inversions or have ideas to profile the boundary cost more precisely (e.g., using `perf`, `pprof`, or C-side instrumentation), please share below.
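As one concrete starting point for isolating the boundary cost, a no-op micro-benchmark separates the Go→C crossing from the crypto work itself. A sketch under a hypothetical `cgoboundary` package; note that cgo is not supported in `_test.go` files, so the C shim lives in the package proper:

```go
// cgoboundary.go
package cgoboundary

/*
static void noop() {}
*/
import "C"

// CNoop crosses the Go/C boundary and does nothing else, so the delta
// between BenchmarkCNoop and BenchmarkGoNoop approximates the pure
// boundary cost.
func CNoop() { C.noop() }

//go:noinline
func GoNoop() {}
```

```go
// cgoboundary_test.go
package cgoboundary

import "testing"

func BenchmarkGoNoop(b *testing.B) {
	for i := 0; i < b.N; i++ {
		GoNoop()
	}
}

func BenchmarkCNoop(b *testing.B) {
	for i := 0; i < b.N; i++ {
		CNoop()
	}
}
```

Running `go test -bench=. -cpuprofile=cpu.out` feeds `pprof` directly, and `perf stat` on the load-test binary can surface thread migrations and context switches that pprof does not show.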