Add Nettrace compression and multi-process support #1258

mjsabby · 2020-09-05T23:33:22Z

Adds a C# implementation of the LZ-based compression and decompression used in the BPerf file format (the file format we intend to replace).
Adds a flag for the compression type.
Adds the next 4 bytes to the header; they hold the decompressed size.

mjsabby · 2020-09-05T23:34:09Z

brianrob · 2020-09-08T18:48:05Z

@mjsabby, are there corresponding runtime changes for this?

mjsabby · 2020-09-08T18:53:28Z

@brianrob The runtime does not yet emit this. @noahfalk wanted the file format to be settled on first. We will have our tool generate it, and then hopefully I can port it to the runtime as an option as well.

noahfalk

Hey @mjsabby this mostly looked good, thanks!
I put some comments inline, and I think @brianrob sent a meeting invite for all of us to chat about these PRs soon.

noahfalk · 2020-09-08T21:57:16Z

src/TraceEvent/Compression/ULZCompression.cs

+ public static unsafe ArraySegment<byte> Decompress(ArraySegment<byte> input, int decompressedSize)
+ {
+ byte[] output = new byte[decompressedSize * 2];
+ fixed (byte* inputPtr = &input.Array[input.Offset])


Can we avoid pinning these and work with indexes or Spans rather than raw pointers? I know this code is hardly the only offender, but one of the things I am hoping to do with the EventPipeEventSource is convert it so that it doesn't use any unsafe pointer manipulations.
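A minimal sketch of what the span-based approach could look like for one step of decompression, copying a run of literal bytes. The class and method names (`SpanCopySketch`, `TryCopyLiteralRun`) are hypothetical and not part of the PR, and it assumes `Span<T>` is available to the project (built in on .NET Core 2.1+, or via the System.Memory package elsewhere):

```csharp
using System;

public static class SpanCopySketch
{
    // Copies `run` literal bytes from input[ip..] to output[op..] and advances
    // both cursors. Returns false, copying nothing, if either buffer is too short.
    // Hypothetical helper for illustration; not code from the PR.
    public static bool TryCopyLiteralRun(
        ReadOnlySpan<byte> input, ref int ip,
        Span<byte> output, ref int op, int run)
    {
        // Reject negative or out-of-range cursors before doing any arithmetic on them.
        if (run < 0 || ip < 0 || op < 0 || ip > input.Length || op > output.Length)
            return false;

        // Written as subtractions so the comparison cannot overflow.
        if (run > input.Length - ip || run > output.Length - op)
            return false;

        input.Slice(ip, run).CopyTo(output.Slice(op, run));
        ip += run;
        op += run;
        return true;
    }
}
```

A caller holding an `ArraySegment<byte>` can obtain the span with `segment.AsSpan()`, so no `fixed` block or `unsafe` modifier is needed; the bounds checks then live in one place instead of being spread across pointer comparisons.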

The compression code does check out of bounds and is likely to be a hot path. I've removed all the compression code, and only kept decompression code so it is easier to audit if that helps. Let me know.

noahfalk · 2020-09-08T22:02:53Z

src/TraceEvent/EventPipe/EventPipeEventSource.cs

@@ -887,7 +887,8 @@ public unsafe void FromStream(Deserializer deserializer)
 internal enum EventBlockFlags : short
 {
 Uncompressed = 0,
- HeaderCompression = 1
+ HeaderCompression = 1,
+ EventBlockULZCompression = 2


This PR should also update the spec and add tests

mjsabby · 2020-09-10T07:26:34Z

@noahfalk If you could do a once over to see if this is the direction you wanted ...

noahfalk · 2020-09-12T01:57:20Z

src/TraceEvent/EventPipe/EventCache.cs

+ bool isULZCompressed = (flags & (ushort)EventBlockFlags.EventBlockULZCompression) != 0;
+
+ int eventBlockSize = eventBlockData.Length;
+ if (isULZCompressed && headerSize >= 24)


If the isULZCompressed flag and headerSize don't match, I would raise an error the same way the checks above do (Assert + return). We should probably have a better error-handling scheme, but this at least marks where errors are detected in the code and prevents continued parsing.

As written, a block with the flag set but headerSize < 24 skips this if block, and the if(!isULZCompressed) block below is skipped too, presumably leaving the parser in a broken state.
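One way the suggested check could be shaped, as a sketch only. The flag value 2 and the 24-byte threshold are taken from the diff quoted above; the class and method names are hypothetical, and the real code would presumably also assert, as the existing checks do:

```csharp
using System;

public static class EventBlockHeaderCheck
{
    // Values copied from the diff quoted in this thread.
    private const ushort UlzCompressionFlag = 2; // EventBlockFlags.EventBlockULZCompression
    private const int MinUlzHeaderSize = 24;     // threshold from the quoted `if`

    // Returns false when the flag and header size disagree, so the caller can
    // stop parsing instead of silently skipping both branches.
    // Hypothetical helper for illustration; not code from the PR.
    public static bool TryGetCompressionMode(ushort flags, int headerSize, out bool isUlzCompressed)
    {
        isUlzCompressed = (flags & UlzCompressionFlag) != 0;
        if (isUlzCompressed && headerSize < MinUlzHeaderSize)
        {
            // Mismatch: the flag promises a decompressed-size field the header cannot hold.
            return false;
        }
        return true;
    }
}
```

The caller would return early when this yields false, which keeps the two later branches (compressed and uncompressed) mutually exhaustive.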

noahfalk · 2020-09-12T02:30:36Z

src/TraceEvent/EventPipe/EventPipeEventSource.cs

@@ -1388,7 +1409,8 @@ enum CompressedHeaderFlags
 ActivityId = 1 << 4,
 RelatedActivityId = 1 << 5,
 Sorted = 1 << 6,
- DataLength = 1 << 7
+ DataLength = 1 << 7,
+ ProcessId = 1 << 8,


The flags field is a single byte, so there is no room to set the 9th bit : ) I'd suggest changing bit 2 into CaptureThreadPidAndSequence and encoding the process id as VarInt64(current_event_proc_id - previous_event_proc_id). This means:

Bit 2 is clear (probably most events) -> proc id is unchanged from the last event; no additional data is encoded in the header.
Bit 2 is set, encoded process id field is a single byte 0 -> process id is unchanged from the last event; 1 additional byte is used in the header. This case happens every time two adjacent events are logged from different threads in the same process.
Bit 2 is set, encoded process id field is non-zero -> process_id = prev_event_process_id + ReadVarInt64(encoded_proc_id_field). This occurs whenever adjacent events have different PIDs. The encoded size varies with the magnitude of the process id delta, probably 2 bytes.

We may also want an optimization so that single-process traces never encode a process id, regardless of whether bit 2 is set. This ensures that runtime-produced traces don't regress in size.
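The delta scheme described above can be sketched roughly as follows. This assumes a conventional 7-bits-per-byte variable-length integer with a continuation bit and a delta reinterpreted as unsigned; neither detail is confirmed by the thread, and all names here (`ProcessIdDeltaSketch`, `Encode`, `TryDecode`) are hypothetical:

```csharp
using System;

public static class ProcessIdDeltaSketch
{
    // Writes `value` as a base-128 varint (low 7 bits first, high bit = "more bytes").
    // Returns the number of bytes written; `dest` must have room for up to 10 bytes.
    public static int WriteVarUInt64(Span<byte> dest, ulong value)
    {
        int i = 0;
        while (value >= 0x80)
        {
            dest[i++] = (byte)(value | 0x80);
            value >>= 7;
        }
        dest[i++] = (byte)value;
        return i;
    }

    // Reads a varint at src[pos], advancing pos. Returns false on truncated or
    // over-long input (pos may have advanced past the bytes examined).
    public static bool TryReadVarUInt64(ReadOnlySpan<byte> src, ref int pos, out ulong value)
    {
        value = 0;
        for (int shift = 0; shift < 64 && pos >= 0 && pos < src.Length; shift += 7)
        {
            byte b = src[pos++];
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return true;
        }
        return false;
    }

    // Encodes the current process id as a delta from the previous event's id.
    public static int Encode(Span<byte> dest, long previousPid, long currentPid)
        => WriteVarUInt64(dest, unchecked((ulong)(currentPid - previousPid)));

    // Recovers the current process id from the previous one plus the encoded delta.
    public static bool TryDecode(ReadOnlySpan<byte> src, ref int pos, long previousPid, out long currentPid)
    {
        currentPid = previousPid;
        if (!TryReadVarUInt64(src, ref pos, out ulong delta)) return false;
        currentPid = unchecked(previousPid + (long)delta);
        return true;
    }
}
```

With this particular encoding an unchanged process id costs one zero byte and a small positive delta costs one or two bytes, but a negative delta wraps to a large unsigned value and takes ten bytes. A zigzag mapping would avoid that; whether the format should use one is a design choice this sketch leaves open.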

noahfalk · 2020-09-12T02:36:20Z

src/TraceEvent/EventPipe/EventPipeEventSource.cs

+ public static void ReadFromFormat(int version, byte* headerPtr, bool useHeaderCompresion, ref EventPipeEventHeader header)
+ {
+ switch (version)
+ {


We should only need to add one new major version? The current shipped version of the format is 4 and the new one would be 5.

I'm a bit wary of mixing v4 and v5 functionality and having a single implementation for both. I realize keeping them separate might make for a little code duplication. Presumably any feature work we do in the runtime during .NET 6.0 that would also necessitate a version increase will get rolled into v5 as well. This could mean that we need to bring back the v4 version of the code later anyway if the delta between v4 and v5 becomes large enough.

noahfalk · 2020-09-12T02:56:41Z

src/TraceEvent/EventPipe/ULZCompression.cs

+
+ if (run == 7)
+ {
+ run += (int)DecodeMod(ref ip);


With bad data I assume it's possible that ip == ipEnd, in which case this would read outside the buffer.
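A sketch of how the read could be guarded. The byte layout here (7 value bits per byte, high bit as continuation, at most 4 bytes) is an assumption for illustration; the PR's actual DecodeMod may accumulate differently, and the names `BoundedVarintSketch` and `TryDecodeMod` are hypothetical:

```csharp
using System;

public static class BoundedVarintSketch
{
    // Reads a base-128 value starting at input[pos]; the high bit of each byte
    // signals that another byte follows. At most 4 bytes (28 bits) are consumed.
    // Returns false if the input ends, or 4 bytes pass, before a terminating byte.
    // Assumed encoding for illustration; not the PR's DecodeMod.
    public static bool TryDecodeMod(ReadOnlySpan<byte> input, ref int pos, out uint value)
    {
        value = 0;
        for (int shift = 0; shift <= 21; shift += 7)
        {
            // The check that the pointer version lacks: never read at or past the end.
            if (pos < 0 || pos >= input.Length) return false;

            byte b = input[pos++];
            value |= (uint)(b & 0x7F) << shift;
            if (b < 0x80) return true;
        }
        return false;
    }
}
```

The call site would then become a checked step, adding the decoded value to `run` only when the method returns true and otherwise returning the error code the decompressor already uses for corrupt input.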

noahfalk · 2020-09-12T02:58:08Z

src/TraceEvent/EventPipe/ULZCompression.cs

+
+ if (len == 15 + MinMatch)
+ {
+ len += (int)DecodeMod(ref ip);


Another buffer overrun possible here? (ip == ipEnd)

noahfalk · 2020-09-12T02:59:36Z

src/TraceEvent/EventPipe/ULZCompression.cs

+ return -1;
+ }
+
+ int dist = ((token & 16) << 12) + Unsafe.ReadUnaligned<ushort>(ip);


Another buffer overrun possible here? (ip == ipEnd)
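Because this read consumes two bytes, the guard needs at least two bytes remaining, not just ip < ipEnd. A minimal sketch using `BinaryPrimitives` in place of `Unsafe.ReadUnaligned`; it assumes the stored value is little-endian, which matches what the unaligned read returns on little-endian hardware, and the names `DistanceReadSketch` and `TryReadDistance` are hypothetical:

```csharp
using System;
using System.Buffers.Binary;

public static class DistanceReadSketch
{
    // Computes the match distance from the token's bit 4 plus a 16-bit value
    // read at input[pos], advancing pos by 2. Returns false if fewer than two
    // bytes remain. Hypothetical helper for illustration; not code from the PR.
    public static bool TryReadDistance(ReadOnlySpan<byte> input, ref int pos, int token, out int dist)
    {
        dist = 0;
        if (pos < 0 || pos > input.Length || input.Length - pos < 2)
            return false;

        ushort low = BinaryPrimitives.ReadUInt16LittleEndian(input.Slice(pos, 2));
        pos += 2;

        // Same expression as the quoted diff line, with the checked read substituted.
        dist = ((token & 16) << 12) + low;
        return true;
    }
}
```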

noahfalk · 2020-09-12T03:18:46Z

@noahfalk If you could do a once over to see if this is the direction you wanted ...

Direction seemed fine to me, thanks @mjsabby! I spotted a few things in the details, which I commented on above.

noahfalk · 2020-09-12T03:21:01Z

cc @josalem @sywhang

Add Nettrace compression support

14723a9

mjsabby requested a review from davmason September 7, 2020 01:05

noahfalk reviewed Sep 9, 2020

Add multi process support as well

cce19e7

mjsabby changed the title ~~Add Nettrace compression support~~ Add Nettrace compression and multi-process support Sep 10, 2020

noahfalk reviewed Sep 12, 2020

Base automatically changed from master to main February 2, 2021 23:16
