core/vm: optimize push2 opcode #31267
Conversation
Back-of-the-envelope calculation: this will save us between 0.2 and 0.5 ms per block, which is quite significant for such a small change.
core/vm/instructions.go (outdated)

```
		integer = new(uint256.Int)
	)
	if *pc+2 < codeLen {
		scope.Stack.push(integer.SetUint64(uint64(binary.BigEndian.Uint16(scope.Contract.Code[*pc+1 : *pc+3]))))
```
I'm pretty certain there is a way for uint256 to parse the bytes from scope.Contract.Code which does not go via binary.BigEndian. (If there isn't, I feel like there should be)
Ah, well there's `func (z *Int) SetBytes2(in []byte) *Int`, but performance-wise it's about the same as what you do here. It would maybe look a little bit cleaner:

```
scope.Stack.push(integer.SetBytes2(scope.Contract.Code[*pc+1:]))
```
Applied, yep, that's a cleaner way.
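For illustration, a minimal standalone sketch (assuming only the github.com/holiman/uint256 module used by geth; the PUSH2 bytes are made up) comparing the two ways of decoding the 2-byte immediate discussed above:

```
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/holiman/uint256"
)

func main() {
	code := []byte{0x61, 0x12, 0x34} // PUSH2 0x1234
	pc := uint64(0)

	// Original approach: decode the immediate with encoding/binary, then
	// load the resulting uint16 into a uint256.
	a := new(uint256.Int).SetUint64(uint64(binary.BigEndian.Uint16(code[pc+1 : pc+3])))

	// Suggested approach: let uint256 read the two bytes directly.
	b := new(uint256.Int).SetBytes2(code[pc+1 : pc+3])

	fmt.Println(a.Eq(b), a.Hex()) // true 0x1234
}
```

Both forms produce the same stack value; the SetBytes2 variant simply avoids spelling out the intermediate uint16 conversion.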
holiman
left a comment
LGTM
Did another benchmark, a full block of PUSH2 POP instructions will go from
s1na
left a comment
LGTM!
It's pretty unfortunate that we have to resort to workarounds like this because the Go compiler is too restrictive to allow disabling the inliner for certain functions.
fjl
left a comment
Please clean up the tests
During my benchmarks on Holesky, around 10% of all CPU time was spent in PUSH2:
```
ROUTINE ======================== github.com/ethereum/go-ethereum/core/vm.newFrontierInstructionSet.makePush.func1 in github.com/ethereum/go-ethereum/core/vm/instructions.go
16.38s 20.35s (flat, cum) 10.31% of Total
740ms 740ms 976: return func(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
. . 977: var (
40ms 40ms 978: codeLen = len(scope.Contract.Code)
970ms 970ms 979: start = min(codeLen, int(*pc+1))
200ms 200ms 980: end = min(codeLen, start+pushByteSize)
. . 981: )
670ms 2.39s 982: a := new(uint256.Int).SetBytes(scope.Contract.Code[start:end])
. . 983:
. . 984: // Missing bytes: pushByteSize - len(pushData)
410ms 410ms 985: if missing := pushByteSize - (end - start); missing > 0 {
. . 986: a.Lsh(a, uint(8*missing))
. . 987: }
12.69s 14.94s 988: scope.Stack.push2(*a)
10ms 10ms 989: *pc += size
650ms 650ms 990: return nil, nil
. . 991: }
. . 992:}
```
Which is quite crazy. We already have a handwritten implementation for PUSH1;
this PR adds one for PUSH2.
PUSH2 is the second most used opcode, as shown here:
https://gist.github.com/shemnon/fb9b292a103abb02d98d64df6fbd35c8, since
Solidity uses it quite heavily. It's used roughly 20 times as often as
PUSH20 and PUSH32.
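For reference, a dedicated PUSH2 handler in the style of the generic closure above could look roughly like the sketch below. It is written as if it sat in core/vm next to the types shown in the profile; the handling of truncated code is my assumption (mirroring the right zero-padding that makePush does via Lsh), so the merged implementation may differ in detail.

```
// Sketch of a handwritten PUSH2: push the two immediate bytes following the
// opcode; code that ends mid-immediate is zero-padded on the right, like the
// generic makePush.
func opPush2(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
	var (
		codeLen = uint64(len(scope.Contract.Code))
		integer = new(uint256.Int)
	)
	if *pc+2 < codeLen {
		// Both immediate bytes are available.
		scope.Stack.push(integer.SetBytes2(scope.Contract.Code[*pc+1 : *pc+3]))
	} else if *pc+1 < codeLen {
		// Only the high byte is available; the missing low byte counts as zero.
		scope.Stack.push(integer.SetUint64(uint64(scope.Contract.Code[*pc+1]) << 8))
	} else {
		// No immediate bytes left at all: push zero.
		scope.Stack.push(integer.Clear())
	}
	*pc += 2
	return nil, nil
}
```

The benchmark below compares this shape of handler ("push") against the generic makePush closure.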
# Benchmarks
```
BenchmarkPush/makePush-14 94196547 12.27 ns/op 0 B/op 0 allocs/op
BenchmarkPush/push-14 429976924 2.829 ns/op 0 B/op 0 allocs/op
```
This PR also adds a fuzzer to sanity-check the handwritten opcode.

---------

Co-authored-by: jwasinger <[email protected]>
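The fuzzer mentioned above could take the shape of a differential check between the generic and handwritten handlers, for example as sketched below (a hypothetical test inside core/vm, reusing the names from the profile and benchmark sections; the fuzzer actually added by the PR may be structured differently):

```
// FuzzOpPush2 cross-checks the handwritten PUSH2 sketch against the generic
// makePush implementation on arbitrary code and program counters.
func FuzzOpPush2(f *testing.F) {
	f.Add([]byte{0x61, 0x12, 0x34}, uint64(0)) // PUSH2 0x1234 at pc 0
	f.Fuzz(func(t *testing.T, code []byte, pc uint64) {
		if pc >= uint64(len(code)) {
			return // keep pc pointing inside the code
		}
		contract := &Contract{Code: code}
		generic := makePush(2, 2)

		// Run the generic implementation.
		pcA := pc
		scopeA := &ScopeContext{Stack: newstack(), Contract: contract}
		generic(&pcA, nil, scopeA)

		// Run the handwritten implementation.
		pcB := pc
		scopeB := &ScopeContext{Stack: newstack(), Contract: contract}
		opPush2(&pcB, nil, scopeB)

		a, b := scopeA.Stack.pop(), scopeB.Stack.pop()
		if !a.Eq(&b) || pcA != pcB {
			t.Fatalf("mismatch: generic=%s pc=%d, handwritten=%s pc=%d", a.Hex(), pcA, b.Hex(), pcB)
		}
	})
}
```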
A follow-up change in Erigon references this PR:

- Optimize `opPush2`. Reference: [Geth #31267](ethereum/go-ethereum#31267)
- Make types consistent in `makeDup`

```
goos: linux
goarch: amd64
pkg: github.com/erigontech/erigon/execution/vm
cpu: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
                │ new_bench.txt │          old_bench.txt          │
                │    sec/op     │    sec/op      vs base          │
Push/makePush-8    13.23n ± 6%    18.72n ± 12%  +41.53% (p=0.000 n=10)
Push/push-8        8.531n ± 4%   11.890n ± 44%  +39.37% (p=0.000 n=10)
geomean            10.62n         14.92n        +40.45%

                │ new_bench.txt │         old_bench.txt         │
                │     B/op      │     B/op      vs base         │
Push/makePush-8     0.000 ± 0%     0.000 ± 0%  ~ (p=1.000 n=10) ¹
Push/push-8         0.000 ± 0%     0.000 ± 0%  ~ (p=1.000 n=10) ¹
geomean                        ²               +0.00%           ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                │ new_bench.txt │        old_bench.txt          │
                │   allocs/op   │   allocs/op   vs base         │
Push/makePush-8     0.000 ± 0%     0.000 ± 0%  ~ (p=1.000 n=10) ¹
Push/push-8         0.000 ± 0%     0.000 ± 0%  ~ (p=1.000 n=10) ¹
geomean                        ²               +0.00%           ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```

Benchmark test:

```
func BenchmarkPush(b *testing.B) {
	var (
		code        = common.FromHex("0011223344556677889900aabbccddeeff0102030405060708090a0b0c0d0e0ff1e1d1c1b1a19181716151413121")
		push2       = makePush(2, 2)
		callContext = &CallContext{Contract: Contract{
			Code: code,
		}}
		pc = new(uint64)
	)
	b.Run("makePush", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			push2(*pc, nil, callContext)
			callContext.Stack.pop()
		}
	})
	b.Run("push", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			opPush2(*pc, nil, callContext)
			callContext.Stack.pop()
		}
	})
}
```