Add Nettrace compression and multi-process support #1258

Open
wants to merge 2 commits into
base: main

Contributor

@mjsabby mjsabby commented Sep 5, 2020

  • Adds a C# implementation of the LZ-based compression and decompression used in the BPerf file format (the file format we intend to replace)
  • Adds a flag for the compression type
  • Adds 4 more bytes to the header, which hold the decompressed size

@mjsabby
Contributor Author

mjsabby commented Sep 5, 2020

cc @noahfalk

@brianrob
Member

brianrob commented Sep 8, 2020

@mjsabby, are there corresponding runtime changes for this?

@mjsabby
Contributor Author

mjsabby commented Sep 8, 2020

@brianrob The runtime does not yet emit this. @noahfalk wanted the file format to be settled on first. We will have our tool generate it, and then hopefully I can port it to the runtime as an option as well.

Member

@noahfalk noahfalk left a comment

Hey @mjsabby, this mostly looked good, thanks!
I put some comments inline, and I think @brianrob sent a meeting invite for all of us to chat about these PRs soon.

src/TraceEvent/EventPipe/EventCache.cs (resolved)
src/TraceEvent/EventPipe/EventCache.cs (outdated, resolved)
src/TraceEvent/Compression/ULZCompression.cs (outdated, resolved)
public static unsafe ArraySegment<byte> Decompress(ArraySegment<byte> input, int decompressedSize)
{
byte[] output = new byte[decompressedSize * 2];
fixed (byte* inputPtr = &input.Array[input.Offset])
Member

Can we avoid pinning these and work with indexes or Spans rather than raw pointers? I know this code is hardly the only offender, but one of the things I am hoping to do with the EventPipeEventSource is convert it so that it doesn't use any unsafe pointer manipulations.
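
As a rough illustration of the index/Span approach suggested above, here is a minimal sketch of copying a literal run without `fixed` or raw pointers. The class and method names (`SpanCopySketch`, `TryCopyLiteralRun`, `AsReadOnlySpan`) are hypothetical and not taken from the PR, and the sketch assumes `Span<T>` is available to the project (for example through the System.Memory package).

```csharp
using System;

public static class SpanCopySketch
{
    // Hypothetical helper: copies `run` literal bytes from input to output
    // using indexes instead of pointers. Returns false rather than reading
    // or writing out of bounds.
    public static bool TryCopyLiteralRun(
        ReadOnlySpan<byte> input, ref int ip,
        Span<byte> output, ref int op, int run)
    {
        if (ip < 0 || op < 0 || run < 0 ||
            run > input.Length - ip || run > output.Length - op)
        {
            return false; // malformed data or undersized output buffer
        }

        input.Slice(ip, run).CopyTo(output.Slice(op, run));
        ip += run;
        op += run;
        return true;
    }

    // An ArraySegment<byte> can be viewed as a span without pinning it.
    public static ReadOnlySpan<byte> AsReadOnlySpan(ArraySegment<byte> segment)
    {
        return new ReadOnlySpan<byte>(segment.Array, segment.Offset, segment.Count);
    }
}
```

Whether the span bounds checks cost anything measurable on the decompression hot path would need to be benchmarked; this sketch makes no claim either way.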

Contributor Author

The compression code does check for out-of-bounds access and is likely to be a hot path. I've removed all the compression code and kept only the decompression code, so it is easier to audit, if that helps. Let me know.

@@ -887,7 +887,8 @@ public unsafe void FromStream(Deserializer deserializer)
 internal enum EventBlockFlags : short
 {
 Uncompressed = 0,
-HeaderCompression = 1
+HeaderCompression = 1,
+EventBlockULZCompression = 2
Member

This PR should also update the spec and add tests.

src/TraceEvent/Compression/ULZCompression.cs (outdated, resolved)
src/TraceEvent/EventPipe/EventCache.cs (resolved)
@mjsabby mjsabby changed the title Add Nettrace compression support Add Nettrace compression and multi-process support Sep 10, 2020
@mjsabby
Contributor Author

mjsabby commented Sep 10, 2020

@noahfalk If you could do a once over to see if this is the direction you wanted ...

bool isULZCompressed = (flags & (ushort)EventBlockFlags.EventBlockULZCompression) != 0;

int eventBlockSize = eventBlockData.Length;
if (isULZCompressed && headerSize >= 24)
Member

If the isULZCompressed flag and headerSize don't match, I would raise an error in the same way as the checks above (Assert + return). We should probably have a better error handling scheme, but this at least marks where errors are detected in the code and prevents continued parsing.

At the moment, when the flag is set but headerSize is below 24, neither this if block nor the if (!isULZCompressed) block below would run, presumably leaving the parser in a broken state.
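
The check described above could be sketched roughly as follows. The type and member names (`BlockFlagsSketch`, `BlockHeaderCheckSketch`, `TryGetCompression`, `MinCompressedHeaderSize`) are hypothetical. The value 24 is taken from the snippet under review, and treating it as the minimum header size for a compressed block is an assumption.

```csharp
using System;

[Flags]
public enum BlockFlagsSketch : ushort
{
    Uncompressed = 0,
    HeaderCompression = 1,
    EventBlockULZCompression = 2
}

public static class BlockHeaderCheckSketch
{
    // Assumption: 24 is the smallest header that still has room for the
    // 4-byte decompressed size (it is the value compared against in the PR).
    public const int MinCompressedHeaderSize = 24;

    // Returns false when the flag and the header size disagree, so the caller
    // can assert and stop parsing instead of silently skipping both branches.
    public static bool TryGetCompression(ushort flags, int headerSize, out bool isULZCompressed)
    {
        isULZCompressed = (flags & (ushort)BlockFlagsSketch.EventBlockULZCompression) != 0;
        return !isULZCompressed || headerSize >= MinCompressedHeaderSize;
    }
}
```

A caller would then do the equivalent of "Assert + return" when the method returns false, and branch on `isULZCompressed` only when it returns true.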

@@ -1388,7 +1409,8 @@ enum CompressedHeaderFlags
 ActivityId = 1 << 4,
 RelatedActivityId = 1 << 5,
 Sorted = 1 << 6,
-DataLength = 1 << 7
+DataLength = 1 << 7,
+ProcessId = 1 << 8,
Member

The flags field is a single byte, so there is no room to set a 9th bit : ) I'd suggest changing bit 2 into CaptureThreadPidAndSequence and encoding the process id as VarInt64(current_event_proc_id - previous_event_proc_id). This means:

  • Bit 2 is clear (probably most events) -> process id is unchanged from the last event, no additional data encoded in the header.
  • Bit 2 is set, encoded process id field is a single byte 0 -> process id is unchanged from the last event, 1 additional byte used in the header. This case happens every time two adjacent events are logged from different threads in the same process.
  • Bit 2 is set, encoded process id field is non-zero -> process_id = prev_event_process_id + ReadVarInt64(encoded_proc_id_field). This occurs whenever adjacent events have different PIDs. The encoding size varies with the magnitude of the difference, probably 2 bytes.

We may also want an optimization so that single-process traces never encode a process id, regardless of whether bit 2 is set. This ensures that runtime-produced traces don't regress in size.
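
The three cases above can be sketched as a small encoder and decoder. Everything here is illustrative: the names (`ProcessIdDeltaSketch`, `WriteProcessIdDelta`, `TryReadProcessId`) are hypothetical, the varint is assumed to be a plain little-endian base-128 encoding, and the delta is assumed to be stored as its unsigned 64-bit bit pattern, which means a negative delta takes 10 bytes (a zigzag encoding would be one way to avoid that). The single-process optimization is not shown.

```csharp
using System;
using System.Collections.Generic;

public static class ProcessIdDeltaSketch
{
    // Appends value as a little-endian base-128 varint (7 data bits per byte).
    public static void WriteVarUInt64(List<byte> output, ulong value)
    {
        while (value >= 0x80)
        {
            output.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        output.Add((byte)value);
    }

    // Reads a varint, returning false if the input ends early or is too long.
    public static bool TryReadVarUInt64(ReadOnlySpan<byte> input, ref int pos, out ulong value)
    {
        value = 0;
        for (int shift = 0; shift < 64; shift += 7)
        {
            if (pos < 0 || pos >= input.Length) return false;
            byte b = input[pos++];
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return true;
        }
        return false;
    }

    // Writes the delta between the current and previous process id.
    public static void WriteProcessIdDelta(List<byte> output, long previousPid, long pid)
    {
        WriteVarUInt64(output, unchecked((ulong)(pid - previousPid)));
    }

    // flagSet mirrors the proposed header bit: when it is clear, no bytes are
    // consumed and the process id carries over from the previous event.
    public static bool TryReadProcessId(
        bool flagSet, ReadOnlySpan<byte> input, ref int pos, long previousPid, out long pid)
    {
        pid = previousPid;
        if (!flagSet) return true;
        if (!TryReadVarUInt64(input, ref pos, out ulong delta)) return false;
        pid = unchecked(previousPid + (long)delta);
        return true;
    }
}
```

With this encoding an unchanged process id costs one byte when the bit is set, and a delta below 16384 costs at most two bytes.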

public static void ReadFromFormat(int version, byte* headerPtr, bool useHeaderCompresion, ref EventPipeEventHeader header)
{
switch (version)
{
Member

We should only need to add one new major version? The current shipped version of the format is 4 and the new one would be 5.

Contributor

@josalem josalem Sep 14, 2020

I'm a bit wary of mixing v4 and v5 functionality and having a single implementation for both. I realize this might make for a little code duplication. Presumably any feature work we do in the runtime during .NET 6.0 that would also necessitate a version increase will get rolled into v5 as well. This could mean that we need to bring back the v4 version of the code later anyway if the delta between v4 and v5 becomes large enough.


if (run == 7)
{
run += (int)DecodeMod(ref ip);
Member

With bad data, I assume it's possible that ip == ipEnd, in which case this would read outside the buffer.


if (len == 15 + MinMatch)
{
len += (int)DecodeMod(ref ip);
Member

Another buffer overrun possible here? (ip == ipEnd)

return -1;
}

int dist = ((token & 16) << 12) + Unsafe.ReadUnaligned<ushort>(ip);
Member

Another buffer overrun possible here? (ip == ipEnd)
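
One way to guard the two-byte read flagged here is to go through a span and a checked read, as in the sketch below. The names `MatchDistanceSketch` and `TryReadDistance` are hypothetical. The sketch assumes the distance is stored little-endian, whereas `Unsafe.ReadUnaligned<ushort>` reads in the machine's native byte order. The same "fail instead of reading past the end" pattern would apply to the `DecodeMod` calls commented on above.

```csharp
using System;
using System.Buffers.Binary;

public static class MatchDistanceSketch
{
    // Mirrors `((token & 16) << 12) + ReadUnaligned<ushort>(ip)` from the
    // snippet, but fails cleanly when fewer than two bytes remain.
    public static bool TryReadDistance(int token, ReadOnlySpan<byte> input, ref int ip, out int dist)
    {
        dist = 0;
        if (ip < 0 || ip > input.Length)
        {
            return false;
        }

        // Returns false (without throwing) if the slice is shorter than 2 bytes.
        if (!BinaryPrimitives.TryReadUInt16LittleEndian(input.Slice(ip), out ushort low))
        {
            return false;
        }

        ip += 2;
        dist = ((token & 16) << 12) + low;
        return true;
    }
}
```

The decompressor could then return its error code (the snippet uses -1) whenever the method returns false.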

@noahfalk
Member

> @noahfalk If you could do a once over to see if this is the direction you wanted ...

Direction seemed fine to me, thanks @mjsabby! I spotted a few things in the details I commented on above

@noahfalk
Member

cc @josalem @sywhang

Base automatically changed from master to main February 2, 2021 23:16
4 participants