Skip to content

Customizable parser configuration #1881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 35 commits into
base: dev
Choose a base branch
from

Conversation

Dimi1010
Copy link
Collaborator

@Dimi1010 Dimi1010 commented Jul 17, 2025

This PR adds a new class ParserConfiguration with the intent to enable user customization of parsing logic.
The initial experimental implementation includes PortMapper that can be used to define custom port rule mappings.

The default parser configuration

A default configuration singleton is provided by the runtime via ParserConfiguration::getDefault() containing all default parsing logic. This configuration is to be used in all instances when no custom configuration is provided.

Modifying the default configuration

The reference returned by ParserConfiguration::getDefault() allows modification of the underlying configuration. This can be used to modify the default parsing behaviour or even override it completely by assigning a custom configuration.

A modified default configuration can be reset to its initial state by calling ParserConfiguration::resetDefault.

NB: The configuration object is currently not thread safe for modification. Do not modify the config if an active parsing is being performed.

Parser extension

  • Added new overload Layer::parseNextLayer(ParserConfiguration const& config): The new overload is to be considered the main signature of the parsing system. The parsed configuration is used to determine parser behaviour.
  • Changed Layer::parseNextLayer() to forward to the new overload: The parameter-less version of parseNextLayer has been converted to a non-virtual function that calls the new overload with the default parsing configuration.

Limitations

  • Currently the Packet class only utilizes the default parser config during parsing.
  • In this PR only the TcpLayer actually uses the PortMapper, and only for mapping to HttpRequestLayer and HttpResponseLayer.

@Dimi1010
Copy link
Collaborator Author

Dimi1010 commented Jul 17, 2025

@seladb what do you think of this implementation for dynamic port mapping as a potential solution to #1876 and #671?

Copy link

codecov bot commented Jul 17, 2025

Codecov Report

Attention: Patch coverage is 93.04636% with 21 lines in your changes missing coverage. Please review.

Project coverage is 82.79%. Comparing base (54b3c2f) to head (6fe00cd).

Files with missing lines Patch % Lines
Packet++/src/ParserConfig.cpp 91.45% 10 Missing ⚠️
Packet++/header/ParserConfig.h 87.50% 5 Missing ⚠️
Packet++/src/UdpLayer.cpp 95.65% 3 Missing ⚠️
Tests/Packet++Test/Tests/SomeIpTests.cpp 85.71% 1 Missing and 1 partial ⚠️
Packet++/header/PacketTrailerLayer.h 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1881      +/-   ##
==========================================
+ Coverage   82.77%   82.79%   +0.01%     
==========================================
  Files         289      291       +2     
  Lines       51098    51265     +167     
  Branches    11068    11322     +254     
==========================================
+ Hits        42298    42444     +146     
- Misses       7643     8007     +364     
+ Partials     1157      814     -343     
Flag Coverage Δ
alpine320 74.49% <86.30%> (+<0.01%) ⬆️
fedora42 74.61% <86.75%> (-0.01%) ⬇️
macos-13 81.02% <87.77%> (-0.01%) ⬇️
macos-14 81.02% <87.77%> (-0.01%) ⬇️
macos-15 81.02% <87.77%> (-0.01%) ⬇️
mingw32 69.94% <84.87%> (+0.03%) ⬆️
mingw64 69.93% <84.87%> (+0.03%) ⬆️
npcap 84.74% <91.97%> (+0.01%) ⬆️
rhel94 74.37% <86.30%> (+0.01%) ⬆️
ubuntu2004 58.72% <84.33%> (+0.03%) ⬆️
ubuntu2004-zstd 58.86% <84.33%> (+0.08%) ⬆️
ubuntu2204 74.30% <86.30%> (+0.01%) ⬆️
ubuntu2204-icpx 60.74% <65.74%> (-0.06%) ⬇️
ubuntu2404 74.52% <86.30%> (+<0.01%) ⬆️
ubuntu2404-arm64 74.51% <86.30%> (+<0.01%) ⬆️
unittest 82.79% <93.04%> (+0.01%) ⬆️
windows-2022 84.71% <91.97%> (+0.01%) ⬆️
windows-2025 84.77% <91.97%> (+0.01%) ⬆️
winpcap 84.89% <91.97%> (+0.02%) ⬆️
xdp 51.13% <85.38%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@seladb
Copy link
Owner

seladb commented Jul 18, 2025

@ Dimi1010 overall, it's a good solution that could work! Here are a few comments:

  • We need to figure out how the configuration is loaded. I think it should be loaded once when the app starts and possibly kept in a static variable. The packet parser can access it to get the configuration
  • parseNextLayer(ParserConfiguration const& config) could be made protected to make sure this overload is never called by the user, ensuring that the configuration is always static and loaded once
  • It'd be nice to tie the default configuration of a protocol to the layer representing it, like we do today with isHttpPort() etc. I'm not exactly sure how to do it though, maybe we can think about it further
  • It'd be nice to be able to load and save the configuration from/to json
  • Something to think about: should the loaded configuration be on top of the default or replace it completely? For example, the user might define additional ports for HTTP or may want to replace the existing ports altogether. Another example: the user wants to replace HTTP ports, but keep the default SSL/TLS ports
  • I'm not sure if we need a PortPair or just a port that can be either src or dst. Usually we look at a single port and not a combination of them. Even if we want to keep PortPair we should probably give an option to define only one port and translate it behind the scenes to a PortPair with similar src and dst ports

@Dimi1010
Copy link
Collaborator Author

Dimi1010 commented Jul 18, 2025

  • We need to figure out how the configuration is loaded. I think it should be loaded once when the app starts and possibly kept in a static variable. The packet parser can access it to get the configuration

That is the default behaviour already. ParserConfiguration::getDefault() returns a reference to a static variable.
It is only constructed on first access and then reused on following calls.

static inline ParserConfiguration& getDefault()
{
	static ParserConfiguration defaultConfig = makeDefaultConfiguration();
	return defaultConfig;
}
  • parseNextLayer(ParserConfiguration const& config) could be made protected to make sure this overload is never called by the user, ensuring that the configuration is always static and loaded once

I don't agree with that. IMO, both should be available to the users. In the general use case, the user will configure / modify the default parser and use the parameter-less overload as is (parseNextLayer()), but some might want to have several configurations for parsing. Keeping the overload public allows that behaviour while it doesn't sacrifice the general use case performance.

  • It'd be nice to tie the default configuration of a protocol to the layer representing it, like we do today with isHttpPort() etc. I'm not exactly sure how to do it though, maybe we can think about it further

I suppose that might be possible but the default configuration mappings are created in ParserConfig.cpp. That would imply that TU including all the layers?
I think it would be better to have all the mappings in a single location so it is easier to spot conflicting mappings?
I am actually unsure if we have any port mappings that map to different next layers depending on the parsing layer.

  • It'd be nice to be able to load and save the configuration from/to json

Can be done, but it wasn't a priority for the initial draft.

  • Something to think about: should the loaded configuration be on top of the default or replace it completely? For example, the user might define additional ports for HTTP or may want to replace the existing ports altogether. Another example: the user wants to replace HTTP ports, but keep the default SSL/TLS ports

The way I imagine the system working is thus:

  1. the connection has portSrc and portDst
  2. The mapper is queried with (portSrc, portDst) pair:
  3. The mapper returns a matrix of the possible hits:
    3.1. full match - there is an entry mapping the entire pair to a protocol type
    3.2. partial match (src) - there is an entry mapping portSrc and any portDst value to a protocol type
    3.3. partial match (dst) - there is an entry mapping portDst and any portSrc value to a protocol type

Example:
For example pair (src: 80, dst: 65444) would match:

  • (src: 80, dst: 65444) -> HttpResponse
  • (src: 80, dst: *) -> HttpResponse
  • (src: *, dst: 65444) -> UnknownProtocol

If wildcards are used in the input pair the exact match will be the same as the partial match of the non-wildcard port.
The match (src: *, dst: *) always results UnknownPort.

  • I'm not sure if we need a PortPair or just a port that can be either src or dst. Usually we look at a single port and not a combination of them. Even if we want to keep PortPair we should probably give an option to define only one port and translate it behind the scenes to a PortPair with similar src and dst ports

PortPair provides factory methods fromSrc and fromDst that create a pair that uses a wildcard for the opposite port, allowing easy creation from only one port.

Generally symmetrical bindings are handled on the mapping level, when the rule is being registered. In PortMapper::addPortMapping(PortPair port, ProtocolType protocol, bool symmetrical = false), the flag symmetrical registers (src: A, dst: B) and (src: B, dst: A) to map to the same protocol if set to true.

@seladb Hope that answers the questions!

@seladb
Copy link
Owner

seladb commented Jul 18, 2025

That is the default behaviour already. ParserConfiguration::getDefault() returns a reference to a static variable. It is only constructed on first access and then reused on following calls.

static inline ParserConfiguration& getDefault()
{
	static ParserConfiguration defaultConfig = makeDefaultConfiguration();
	return defaultConfig;
}

Yes, I was referring to the configuration loaded by the user, not only the default one

I don't agree with that. IMO, both should be available to the users. In the general use case, the user will configure / modify the default parser and use the parameter-less overload as is (parseNextLayer()), but some might want to have several configurations for parsing. Keeping the overload public allows that behaviour while it doesn't sacrifice the general use case performance.

parseNextLayer(), although it's a public method, shouldn't be called by users. Users should create a Packet instance and let PcapPlusPlus parse it. We need to remember that packet parsing is on the critical path, and we must ensure that any changes we make don't affect performance. That's the reason I don't want to encourage users to change the configuration at runtime. If we get feedback from users about it we can consider it in the future

The way I imagine the system working is thus:

1. the connection has `portSrc` and `portDst`

2. The mapper is queried with (`portSrc`, `portDst`) pair:

3. The mapper returns a matrix of the possible hits:
   3.1. full match - there is an entry mapping the entire pair to a protocol type
   3.2. partial match (src) - there is an entry mapping `portSrc` and any `portDst` value to a protocol type
   3.3. partial match (dst) - there is an entry mapping `portDst` and any `portSrc` value to a protocol type

Example: For example pair (src: 80, dst: 65444) would match:

* `(src: 80, dst: 65444)` -> `HttpResponse`

* `(src: 80, dst: *)` -> `HttpResponse`

* `(src: *, dst: 65444)` -> `UnknownProtocol`

If wildcards are used in the input pair the exact match will be the same as the partial match of the non-wildcard port. The match (src: *, dst: *) always results UnknownPort.

Will the user configuration be on top of the default one or replace it? For example: HTTP default are port 80 and 8080. The user might want to replace that with port 1234, or add an additional port 1234. How can we support both?
Another example: the user wants to replace ports 80 and 8080 with port 1234 for HTTP, but keep the default ports for SSL/TLS

PortPair provides factory methods fromSrc and fromDst that create a pair that uses a wildcard for the opposite port, allowing easy creation from only one port.

Generally symmetrical bindings are handled on the mapping level, when the rule is being registered. In PortMapper::addPortMapping(PortPair port, ProtocolType protocol, bool symmetrical = false), the flag symmetrical registers (src: A, dst: B) and (src: B, dst: A) to map to the same protocol if set to true.

I'd create a simpler interface for the users that can define a single port and have the port mapper create 2 rules behind the scenes

@Dimi1010
Copy link
Collaborator Author

Yes, I was referring to the configuration loaded by the user, not only the default one

The idea is that the user can either modify the default configuration, or make his own configuration instance.
The user can make changes to the default one by: getDefault().addPortMapping(...);

In the case of modifying the default, the user's changes essentially change the global default.
Reset of the default config is also possible via resetDefault() method that restores the "factory defaults".

If they want to have a separate instance, the can either make one from scratch, or make a copy of the default and store it at their convenience. A manually constructed config can even be assigned to the default via *::getDefault() = otherConfig;

Ideally the user would be able to inject a configuration to the parsing logic of Packet, but currently that isn't possible, so the only reason to keep separate instances of the configurations is for backup purposes.

parseNextLayer(), although it's a public method, shouldn't be called by users. Users should create a Packet instance and let PcapPlusPlus parse it. We need to remember that packet parsing is on the critical path, and we must ensure that any changes we make don't affect performance. That's the reason I don't want to encourage users to change the configuration at runtime. If we get feedback from users about it we can consider it in the future

Yes, it is mainly called internally. There was an idea to eventually allow Packets to be parsed with different configurations, specified at the packet parse time. So there is a potential use case for the Packet class itself to inject a parser configuration to the Layer classes. Haven't really figured out how to do that yet tho.

This will be an additive change, so in the base use case, the packet fetches the default parser config static.
In any case, the access to the configuration is via pointer indirection in both cases, so there isn't a comparative performance difference.

I haven't profiled the solution yet, but the most likely performance factor would be the PortMap implementation itself, not how the configuration dependency is actually injected into the parsing logic.

Will the user configuration be on top of the default one or replace it? For example: HTTP default are port 80 and 8080. The user might want to replace that with port 1234, or add an additional port 1234. How can we support both? Another example: the user wants to replace ports 80 and 8080 with port 1234 for HTTP, but keep the default ports for SSL/TLS

For port mapping, each configuration instance is essentially a global port table. The user can add entries, remove entries or completely replace the entire table.

For the example, if the user wants port 1234 to be HTTP, the user adds a rule (1234, *) -> HTTP. If the user wants to have 80 not be HTTP, he removes the rule from the map that is used for parsing.

I am unsure by what you mean by keeping the default ports for SSL/TLS. Each port rule is independent of each other.
Modifying the portPair -> HTTP rule would not affect the rule portPair -> SSL. (Assuming I have understood everything)

I'd create a simpler interface for the users that can define a single port and have the port mapper create 2 rules behind the scenes

Wdym? Like a convenience method that takes uint16_t and creates PortPair(x, Any) and PortPair(Any, x)? I suppose that is possible but it is a bit ambiguous as that can easily be inferred to be PortPair(x, x).

I think it is better to only allow explicit port pairs for clarity when users are defining their own rules. symmetrical already handles the use case of defining the reverse rule to their provided pair for them.

void PortMapper::addPortMapping(PortPair port, ProtocolType protocol, bool symmetrical)
{
	if (port == PortPair())
	{
		throw std::invalid_argument("PortPair cannot be empty (both src and dst ports are 0)");
	}

	auto insertResult = m_PortToProtocolMap.insert({ port, protocol });
	insertResult.first->second = protocol;  // Update the protocol if it already exists
	if (!insertResult.second)
	{
		PCPP_LOG_WARN("Port " << port << " is already mapped to protocol " << insertResult.first->second
		                      << ", updating to " << protocol);
	}

	if (symmetrical && port.portSrc != port.portDst)
	{
		// Add the symmetrical mapping
		PortPair symmetricalPort = { port.portDst, port.portSrc };
		addPortMapping(symmetricalPort, protocol, false);
	}
}

@Dimi1010
Copy link
Collaborator Author

@seladb Another case I found. FTP has both FTP default port (20) and FTP Data (port 21), but both map to ProtocolType::FTP.
Can we add a separate FTP Data protocol type? Having both being mapped to FTP would make it hard for the layers to disambiguate, as the mapper would return FTP in both cases. (or would require an additional hard coded check which defeats the purpose)

@seladb
Copy link
Owner

seladb commented Jul 19, 2025

The idea is that the user can either modify the default configuration, or make his own configuration instance. The user can make changes to the default one by: getDefault().addPortMapping(...);

In the case of modifying the default, the user's changes essentially change the global default. Reset of the default config is also possible via resetDefault() method that restores the "factory defaults".

If they want to have a separate instance, the can either make one from scratch, or make a copy of the default and store it at their convenience. A manually constructed config can even be assigned to the default via *::getDefault() = otherConfig;

Ideally the user would be able to inject a configuration to the parsing logic of Packet, but currently that isn't possible, so the only reason to keep separate instances of the configurations is for backup purposes.

I think that's a very reasonable solution, offering a lot of flexibility to users. They can start with the default and modify it as needed, or replace it with their configuration 👍

Yes, it is mainly called internally. There was an idea to eventually allow Packets to be parsed with different configurations, specified at the packet parse time. So there is a potential use case for the Packet class itself to inject a parser configuration to the Layer classes. Haven't really figured out how to do that yet tho.

This will be an additive change, so in the base use case, the packet fetches the default parser config static. In any case, the access to the configuration is via pointer indirection in both cases, so there isn't a comparative performance difference.

I haven't profiled the solution yet, but the most likely performance factor would be the PortMap implementation itself, not how the configuration dependency is actually injected into the parsing logic.

I think we agree on that 👍
The use case of a Packet instance to modify the parser configuration is interesting, but as you mentioned we can discuss it later as such use cases arise

Wdym? Like a convenience method that takes uint16_t and creates PortPair(x, Any) and PortPair(Any, x)? I suppose that is possible but it is a bit ambiguous as that can easily be inferred to be PortPair(x, x).

I think it is better to only allow explicit port pairs for clarity when users are defining their own rules. symmetrical already handles the use case of defining the reverse rule to their provided pair for them.

void PortMapper::addPortMapping(PortPair port, ProtocolType protocol, bool symmetrical)
{
	if (port == PortPair())
	{
		throw std::invalid_argument("PortPair cannot be empty (both src and dst ports are 0)");
	}

	auto insertResult = m_PortToProtocolMap.insert({ port, protocol });
	insertResult.first->second = protocol;  // Update the protocol if it already exists
	if (!insertResult.second)
	{
		PCPP_LOG_WARN("Port " << port << " is already mapped to protocol " << insertResult.first->second
		                      << ", updating to " << protocol);
	}

	if (symmetrical && port.portSrc != port.portDst)
	{
		// Add the symmetrical mapping
		PortPair symmetricalPort = { port.portDst, port.portSrc };
		addPortMapping(symmetricalPort, protocol, false);
	}
}

I'd provide a simple interface, something along the lines of:
I think this will be simpler for most users to understand: "use port 80 for HTTP; use port 443 for TLS/SSL"

void PortMapper::addPortMapping(uint16_t port, ProtocolType protocol)
{
    addPortMapping({ port, port }, protocol, true);
}

@Dimi1010
Copy link
Collaborator Author

I'd provide a simple interface, something along the lines of:
I think this will be simpler for most users to understand: "use port 80 for HTTP; use port 443 for TLS/SSL"

void PortMapper::addPortMapping(uint16_t port, ProtocolType protocol)
{
    addPortMapping({ port, port }, protocol, true);
}

Sure, we can do that.

Although I think the more general case is if the mapping was { port : AnyPort } and { AnyPort : port }, rather than { port, port }, no?

@seladb
Copy link
Owner

seladb commented Jul 20, 2025

I'd provide a simple interface, something along the lines of:
I think this will be simpler for most users to understand: "use port 80 for HTTP; use port 443 for TLS/SSL"

void PortMapper::addPortMapping(uint16_t port, ProtocolType protocol)
{
    addPortMapping({ port, port }, protocol, true);
}

Sure, we can do that.

Although I think the more general case is if the mapping was { port : AnyPort } and { AnyPort : port }, rather than { port, port }, no?

Ah yes, I think you're right 👍

Comment on lines +102 to +103
// Would result in Pair (0, 0) which is invalid for the mapper.
// mapper.addPortMapping(PortPair::fromDst(0), WakeOnLan, false); // Wake-on-LAN over UDP (broadcast)
Copy link
Collaborator Author

@Dimi1010 Dimi1010 Jul 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seladb Due to the way the mapper works, we can't actually do mapping (AnyPort, 0), because 0 is used as a magic value for the AnyPort. That would result in (AnyPort, AnyPort) pair.

I can either try to make a workaround (somehow), or we can drop the mapping.
From what I can read usage of Port 0 is kinda rare?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@egecetin write this parser so maybe he knows if it's a rate case?

If we want to support port 0 maybe we can make PortPair a pair of 2 uint32_t values, but only allow assigning it uint16_t values - and make AnyPort of value that is larger than uint16_t...

@seladb
Copy link
Owner

seladb commented Jul 21, 2025

@Dimi1010 I think I remember why I didn't implement this until now: I wasn't sure what the performance implications might be. Packet parsing is the fast path of PcapPlusPlus, and changing it from hard-coded values to using a map might have performance implications, especially in high throughput network traffic (for example: when using DPDK, or even when using libpcap with high traffic). Is it possible to measure it somehow? 🤔

@Dimi1010
Copy link
Collaborator Author

@Dimi1010 I think I remember why I didn't implement this until now: I wasn't sure what the performance implications might be. Packet parsing is the fast path of PcapPlusPlus, and changing it from hard-coded values to using a map might have performance implications, especially in high throughput network traffic (for example: when using DPDK, or even when using libpcap with high traffic). Is it possible to measure it somehow? 🤔

It should be able to be measured. A large Pcap file that is loaded and then timed as it is fed into Packet parsing should suffice since we only need the comparative benchmarks for the parsing module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants