
optimize TLV encoding-decoding #123

Merged: 43 commits from enc/bench-3 into main on Feb 4, 2025
Conversation

@pulsejet (Collaborator) commented Feb 3, 2025

The idea is similar to #84, but with a better approach:

  1. Using a reader interface causes a heap allocation every time a reader is created or delegated. The WireView class stays on the stack and behaves similarly to the readers (it is also very well tested, I hope). The downside is that the wire itself is an allocation when using a buffer directly; it may be possible to work around this with unsafe, but that doesn't seem worth it.
  2. For optional fields, we previously made a new allocation for every field, e.g. every *uint64. This moves the optional to stack memory with a generic enc.Optional[V].

Added benchmarks for TLV encoding and decoding to this package. Current results are below (these will likely improve further after bringing in more optimizations from #84):

Before (current main branch):

cpu: AMD EPYC 7702P 64-Core Processor               
BenchmarkDataEncodeSmall-128                     1000000              1267 ns/op             648 B/op         14 allocs/op
BenchmarkDataEncodeMedium-128                    1000000              1368 ns/op             744 B/op         14 allocs/op
BenchmarkDataEncodeMediumLongName-128            1000000              1755 ns/op            1032 B/op         14 allocs/op
BenchmarkDataEncodeLarge-128                     1000000              1367 ns/op             744 B/op         14 allocs/op
BenchmarkDataDecodeSmall-128                     1000000              2241 ns/op            1288 B/op         21 allocs/op
BenchmarkDataDecodeMedium-128                    1000000              2502 ns/op            1800 B/op         22 allocs/op
BenchmarkDataDecodeMediumLongName-128            1000000              3624 ns/op            2952 B/op         23 allocs/op
BenchmarkDataDecodeLarge-128                     1000000              2529 ns/op            1800 B/op         22 allocs/op
PASS

After this PR:

cpu: AMD EPYC 7702P 64-Core Processor               
BenchmarkDataEncodeSmall-128                     1000000              1128 ns/op             648 B/op          9 allocs/op
BenchmarkDataEncodeMedium-128                    1000000              1196 ns/op             744 B/op          9 allocs/op
BenchmarkDataEncodeMediumLongName-128            1000000              1521 ns/op            1032 B/op          9 allocs/op
BenchmarkDataEncodeLarge-128                     1000000              1172 ns/op             744 B/op          9 allocs/op
BenchmarkDataDecodeSmall-128                     1000000              1565 ns/op             936 B/op         11 allocs/op
BenchmarkDataDecodeMedium-128                    1000000              1878 ns/op            1448 B/op         12 allocs/op
BenchmarkDataDecodeMediumLongName-128            1000000              3018 ns/op            2600 B/op         13 allocs/op
BenchmarkDataDecodeLarge-128                     1000000              1929 ns/op            1448 B/op         12 allocs/op
PASS

EDIT 1: More memory optimization for the name hash.

Before -
BenchmarkNameHash-128                    1000000               579.3 ns/op           448 B/op          1 allocs/op
BenchmarkNameHashPrefix-128              1000000              1578 ns/op             672 B/op         21 allocs/op

After -
BenchmarkNameHash-128                    1000000               358.5 ns/op             0 B/op          0 allocs/op
BenchmarkNameHashPrefix-128              1000000               972.2 ns/op           176 B/op          1 allocs/op

EDIT 2: Since the forwarder does not need to decode all fields in the packet (e.g. signature and content), it can skip some parts of the decoding. Benchmarks with the new FwPacket, which decodes only what is needed:

The "before" numbers are the first set of benchmarks in this description.

BenchmarkDataDecodeSmall-128                     1000000               646.3 ns/op           376 B/op          4 allocs/op
BenchmarkDataDecodeMedium-128                    1000000              1208 ns/op             888 B/op          5 allocs/op
BenchmarkDataDecodeMediumLongName-128            1000000              2237 ns/op            2040 B/op          6 allocs/op
BenchmarkDataDecodeLarge-128                     1000000              1230 ns/op             888 B/op          5 allocs/op

@pulsejet pulsejet added the std go-ndn issues label Feb 3, 2025
@pulsejet pulsejet changed the title from "[WIP] optimize TLV decoding reader" to "[WIP] optimize TLV encoding-decoding" Feb 3, 2025
@pulsejet pulsejet merged commit 8ac1029 into main Feb 4, 2025
10 checks passed
@pulsejet pulsejet deleted the enc/bench-3 branch February 4, 2025 20:13
@pulsejet pulsejet changed the title from "[WIP] optimize TLV encoding-decoding" to "optimize TLV encoding-decoding" Feb 4, 2025
@pulsejet (Collaborator, Author) commented Feb 4, 2025

CI benchmark graphs are live at https://named-data.github.io/ndnd/dev/bench/index.html

@zjkmxy (Member) commented Feb 4, 2025

Great progress. Thanks!
