
optimize TLV encoding-decoding #123

Merged: 43 commits from enc/bench-3 into main on Feb 4, 2025
Conversation

@pulsejet (Collaborator) commented Feb 3, 2025

The idea is similar to #84, but with a better approach:

  1. Using a reader interface causes a heap allocation every time a reader is created or delegated. The WireView class stays on the stack and behaves similarly to the readers (it is also very well tested, I hope). The downside is that the wire itself is an allocation when using a buffer directly; it may be possible to work around this with unsafe, but that doesn't seem worth it.
  2. For optional fields, we previously made a new allocation for every field, e.g. every *uint64. This moves the optional to stack memory with a generic enc.Optional[V].

Added benchmarks for TLV encoding and decoding to this package. Current results are below (these will likely improve further after bringing in more optimizations from #84):

Before (current main branch):

cpu: AMD EPYC 7702P 64-Core Processor               
BenchmarkDataEncodeSmall-128                     1000000              1267 ns/op             648 B/op         14 allocs/op
BenchmarkDataEncodeMedium-128                    1000000              1368 ns/op             744 B/op         14 allocs/op
BenchmarkDataEncodeMediumLongName-128            1000000              1755 ns/op            1032 B/op         14 allocs/op
BenchmarkDataEncodeLarge-128                     1000000              1367 ns/op             744 B/op         14 allocs/op
BenchmarkDataDecodeSmall-128                     1000000              2241 ns/op            1288 B/op         21 allocs/op
BenchmarkDataDecodeMedium-128                    1000000              2502 ns/op            1800 B/op         22 allocs/op
BenchmarkDataDecodeMediumLongName-128            1000000              3624 ns/op            2952 B/op         23 allocs/op
BenchmarkDataDecodeLarge-128                     1000000              2529 ns/op            1800 B/op         22 allocs/op
PASS

After this PR:

cpu: AMD EPYC 7702P 64-Core Processor               
BenchmarkDataEncodeSmall-128                     1000000              1128 ns/op             648 B/op          9 allocs/op
BenchmarkDataEncodeMedium-128                    1000000              1196 ns/op             744 B/op          9 allocs/op
BenchmarkDataEncodeMediumLongName-128            1000000              1521 ns/op            1032 B/op          9 allocs/op
BenchmarkDataEncodeLarge-128                     1000000              1172 ns/op             744 B/op          9 allocs/op
BenchmarkDataDecodeSmall-128                     1000000              1565 ns/op             936 B/op         11 allocs/op
BenchmarkDataDecodeMedium-128                    1000000              1878 ns/op            1448 B/op         12 allocs/op
BenchmarkDataDecodeMediumLongName-128            1000000              3018 ns/op            2600 B/op         13 allocs/op
BenchmarkDataDecodeLarge-128                     1000000              1929 ns/op            1448 B/op         12 allocs/op
PASS

EDIT 1: More memory optimization for the name hash.

Before -
BenchmarkNameHash-128                    1000000               579.3 ns/op           448 B/op          1 allocs/op
BenchmarkNameHashPrefix-128              1000000              1578 ns/op             672 B/op         21 allocs/op

After -
BenchmarkNameHash-128                    1000000               358.5 ns/op             0 B/op          0 allocs/op
BenchmarkNameHashPrefix-128              1000000               972.2 ns/op           176 B/op          1 allocs/op

EDIT 2: Since the forwarder does not need to decode all fields in the packet (e.g. signature and content), it can skip some parts of the decoding. Benchmarks with the new FwPacket, which decodes only what is needed:

The "before" numbers are the first set of benchmarks in this description.

BenchmarkDataDecodeSmall-128                     1000000               646.3 ns/op           376 B/op          4 allocs/op
BenchmarkDataDecodeMedium-128                    1000000              1208 ns/op             888 B/op          5 allocs/op
BenchmarkDataDecodeMediumLongName-128            1000000              2237 ns/op            2040 B/op          6 allocs/op
BenchmarkDataDecodeLarge-128                     1000000              1230 ns/op             888 B/op          5 allocs/op

@pulsejet pulsejet added the std go-ndn issues label Feb 3, 2025
@pulsejet pulsejet changed the title from "[WIP] optimize TLV decoding reader" to "[WIP] optimize TLV encoding-decoding" Feb 3, 2025
@pulsejet pulsejet merged commit 8ac1029 into main Feb 4, 2025
10 checks passed
@pulsejet pulsejet deleted the enc/bench-3 branch February 4, 2025 20:13
@pulsejet pulsejet changed the title from "[WIP] optimize TLV encoding-decoding" to "optimize TLV encoding-decoding" Feb 4, 2025
@pulsejet (Collaborator, Author) commented Feb 4, 2025

CI benchmark graphs are live at https://named-data.github.io/ndnd/dev/bench/index.html

@zjkmxy (Member) commented Feb 4, 2025

Great progress. Thanks!
