Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

map: Add Drain for traversing maps while removing entries #1349

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

smagnani96
Copy link
Contributor

@smagnani96 smagnani96 commented Feb 19, 2024

Dear community, with this PR I would like to introduce the Map.Drain API to traverse a map while also removing its entries.
The current MapIterator.Next traverse a map without removing the entries. For some use cases, it might be useful to have this API so that maps such as Hash can be freed during the lookup.
In addition, the MapIterator.Next would normally fail with maps of type Queue or Stack, for two main reasons:

  1. these maps do not have kernel primitives to traverse their content w/o removing, only support for LookupAndDelete (Lookup for Queue/Stack acts as a peek method and return the head);
  2. these maps are keyless, hence need separate support.

The proposed solution reuses the MapIterator structure and as much code as possible: the Map.Drain API return an iterator with a special flag set, specifying that the retrieved entries must be deleted as well.
To support different kernel versions in which the support for LookupAndDelete system call is not equal among the different maps (e.g., queues and hash support for this has been introduced in different commit),
this implementation introduces a fallback method that would leverage the sequential Lookup->Delete operations in case LookupAndDelete for that specific map is not supported.
These compatibility support has been dropped in this PR. Queues are expected to work since kernel v4.2, and the other Hash-like maps from v5.14.

Please any feedback is really welcome, don't hesitate ping me 😃

@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch from d65df8e to 9a2089a Compare February 19, 2024 18:06
Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for putting in the work. Though I would like to request some changes.

examples/queue/main.go Outdated Show resolved Hide resolved
map.go Outdated Show resolved Hide resolved
map.go Outdated Show resolved Hide resolved
map_test.go Outdated Show resolved Hide resolved
@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch 2 times, most recently from 11c813b to 2dfaef8 Compare February 29, 2024 13:52
@smagnani96
Copy link
Contributor Author

Thank you very much for putting in the work. Though I would like to request some changes.

Many thanks again Dylan for the feedback!

@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch from 2dfaef8 to 23c82af Compare March 13, 2024 19:42
@smagnani96 smagnani96 changed the title MapIterator support for Queues/Stacks + further testing + example MapIterator support for Queues/Stacks + further testing Mar 13, 2024
@smagnani96 smagnani96 marked this pull request as ready for review March 13, 2024 19:45
@smagnani96 smagnani96 requested a review from a team as a code owner March 13, 2024 19:45
Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patches! I'm not completely convinced we want to make MapIterator an interface (yet).

map.go Outdated Show resolved Hide resolved
map.go Outdated Show resolved Hide resolved
map.go Outdated Show resolved Hide resolved
map.go Outdated Show resolved Hide resolved
types.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question regarding semantics:

  • What happens when you do a plain Lookup on a queue or stack? Does that function like a peek? Or does it return an error?
  • Are there other map types which support LookupAndDelete? Are there ones which support passing a Key argument?

map.go Outdated Show resolved Hide resolved
@smagnani96
Copy link
Contributor Author

Thanks to all the reviewers for the feedback.
I guess the only thing left to decide is whether to proceed with one of the two proposed solutions:

  1. Creating a simple method pop that would be called instead of next, as for the solution proposed in the previous commit.
  2. Creating an interface to support multiple MapIterators, as for the current commit under review.

@lmb
Copy link
Collaborator

lmb commented May 14, 2024

Have you had the time to look into my questions in #1349 (review) ? I think they will help us decide what the implementation looks like.

@smagnani96
Copy link
Contributor Author

Have you had the time to look into my questions in #1349 (review) ? I think they will help us decide what the implementation looks like.

@lmb I apologize, I missed the message.

  1. According to https://docs.kernel.org/bpf/map_queue_stack.html and some quick tests of the code, Queues/Stack maps support the following operations.
    • lookup (peek): returns the value at the head without removing it. If no value exists, ErrKeyNotExist is returned.
    • update (push)
    • delete (pop)
      In addition, batch operations are not supported. Also, NextKey is not supported: it either returns can't marshal key: <type> doesn't marshal to 0 bytes or next key: invalid argument when called with nil (these maps have a key of size 0).
  2. On the other hand, other maps like Array and Hash support LookupAndDelete, and the method requires a non-null key as a parameter.

@dylandreimerink
Copy link
Member

Queues/Stack maps support the following operations.

Clarification, for stacks/queues:
BPF_MAP_LOOKUP_ELEM -> peek
BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop
BPF_MAP_UPDATE_ELEM -> push

On the other hand, other maps like Array and Hash support LookupAndDelete, and the method requires a non-null key as a parameter.

Only hash maps(and variants: LRU, PER_CPU, LRU_PER_CPU) have support for BPF_MAP_LOOKUP_AND_DELETE_ELEM since v5.14. Array maps don't seem to support it since you can't delete from an array map.

Also interesting is that BPF_MAP_LOOKUP_AND_DELETE_BATCH which is the batch version is only supported by hash maps and not by stacks or queues and was added before the non-batch version was supported in v5.6.

Are there ones which support passing a Key argument?

Yes, the hashmap supports passing a key value. Actually, even before v5.14, a valid pointer must be provided to the key field. The kernel always checks it and copies it into kernel memory, just to discard it. After v5.14 it of course starts to actually get passed to the map implementation.

@lmb
Copy link
Collaborator

lmb commented May 17, 2024

Okay! That was very helpful, thanks to both of you. The peek operations only return the top of the stack / first item in the queue? And currently doing .Iterate() on a queue or stack doesn't work at all since we pass a key?

I'm currently leaning towards adding a new method Map.Drain() *MapIterator. Initially this could only support queues / stacks but later on we may be able to extend it to hash maps. The benefit of a new method is that it makes it clearer that we're actually deleting items here. We can put the implementation into the same MapIterator struct for now as Timo suggested. I think that we should take MapIterator.Next(nil, &value) when draining (and not ignore the key argument as currently done.)

@dylandreimerink
Copy link
Member

The peek operations only return the top of the stack / first item in the queue? And currently doing .Iterate() on a queue or stack doesn't work at all since we pass a key?

Yes, peeking only returns the top of the stack. Any key provided will be ignored as far as I understand. But as @S41m0n mentioned the BPF_MAP_GET_NEXT_KEY op would return an error if using the normal iterator.

@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch 3 times, most recently from c3c539c to 3acb798 Compare June 25, 2024 14:06
@smagnani96 smagnani96 changed the title MapIterator support for Queues/Stacks + further testing MapIterator support for Draining map content (e.g., queues,stacks,hash) Jun 25, 2024
@smagnani96
Copy link
Contributor Author

Thanks for the info-gathering and brainstorming. I just pushed a possible implementation of what has been discussed.
Let me know your opinions :)

@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch from 3acb798 to c620552 Compare June 26, 2024 16:24
map.go Outdated Show resolved Hide resolved
map.go Outdated Show resolved Hide resolved
map.go Outdated Show resolved Hide resolved
map.go Outdated Show resolved Hide resolved
map.go Outdated Show resolved Hide resolved
map.go Outdated Show resolved Hide resolved
map_test.go Show resolved Hide resolved
map_test.go Outdated Show resolved Hide resolved
map_test.go Outdated Show resolved Hide resolved
map_test.go Show resolved Hide resolved
@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch 2 times, most recently from 39b3749 to 3ac6523 Compare July 30, 2024 18:07
Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the end result, but between the commits a lot happens that is canceled out later. So please squash all commits into one (or cleanup the git history some other way)

map.go Outdated Show resolved Hide resolved
@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch from 3ac6523 to 3d9f4e0 Compare September 3, 2024 18:22
@smagnani96
Copy link
Contributor Author

smagnani96 commented Sep 3, 2024

This PR has been rebased to #1559 so that the current PR preserve only the needed material for the Map.Drain API. The only commit that would remain and which we should refer to is 3d9f4e0.

@smagnani96 smagnani96 changed the title MapIterator support for Draining map content (e.g., queues,stacks,hash) introducing Map.Drain API for traversing maps while removing entries Sep 3, 2024
@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch 2 times, most recently from 0916768 to 318ab1b Compare September 5, 2024 17:32
@smagnani96
Copy link
Contributor Author

smagnani96 commented Sep 5, 2024

I removed from the code the support for the fallback sequential Lookup -> Delete.
The Drain API against BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_PERCPU_HASH, BPF_MAP_TYPE_LRU_HASH and BPF_MAP_TYPE_LRU_PERCPU_HASH is not expected to work with kernel versions < 5.14.

@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch 2 times, most recently from 2ae048d to d6678a8 Compare September 5, 2024 17:40
@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch from d6678a8 to 47fdd29 Compare December 6, 2024 11:53
This commit introduces the `Map.Drain` API to traverse the map while also
removing its entries. It leverages the same `MapIterator` structure, with
the introduction of a new unexported method to handle the map draining.
The tests make sure that the behavior is as expected, and that this API
returns an error while invoked on the wrong map, such as arrays, for which
`Map.Iterate` should be used instead.
The `LookupAndDelete` system call support has been introduced in:

1. 5.14 for BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_PERCPU_HASH,
	BPF_MAP_TYPE_LRU_HASH and BPF_MAP_TYPE_LRU_PERCPU_HASH.
2. 4.20 for BPF_MAP_TYPE_QUEUE, BPF_MAP_TYPE_STACK

Do not expect the `Map.Drain` API to work on prior versions, according
to the target map type. From the user perspective, the usage should
be similar to `Map.Iterate`, as shown as follows:

```go
m, err := NewMap(&MapSpec{
	Type:       Hash,
	KeySize:    4,
	ValueSize:  8,
	MaxEntries: 10,
})
// populate here the map and defer close
it := m.Drain()
for it.Next(keyPtr, &value) {
    // here the entry doesn't exist anymore in the underlying map.
    ...
}
```

Signed-off-by: Simone Magnani <[email protected]>
@smagnani96 smagnani96 force-pushed the feature/queue_stack_support branch from 47fdd29 to 46f5343 Compare December 6, 2024 11:57
@smagnani96
Copy link
Contributor Author

  • Moved functions so that GH code diffs are more readable.
  • Added comments to newly helper functions.
  • Removed TestDrainEmptyMap: added this check into TestMapDrain before filling the map.

Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with this iteration 😉.

Comment on lines +1657 to +1659
if ptr, ok := keyOut.(unsafe.Pointer); ok {
copy(unsafe.Slice((*byte)(ptr), len(buf)), buf)
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this is a bit off topic. @ti-mo Do you recall why we have this exception here? This is just moving the existing code added in 4609dc7. But we seem to take so much care to ensure safety elsewhere, including in sysenc.Unmarshal. But not here.

If for some reason the user passes in an unsafe pointer in keyOut that is smaller than buf this causes us to write into unknown parts of the heap.

Copy link
Collaborator

@ti-mo ti-mo Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure, but the commit you linked does contain some hints around slices of types with trailing padding:

Allowing such a type creates an edge case: make([]padding, 1)
uses zero-allocation marshaling while make([]padding, 2)
doesn't, due to interior padding. It's simpler to skip such
types for now.

Passing in unsafe.Pointer sidesteps this limitation. Also see f0d238d. unsafe.Pointer is supported for most map operations, see marshalMapSyscallInput.

@ti-mo ti-mo changed the title introducing Map.Drain API for traversing maps while removing entries map: Add Drain for traversing maps while removing entries Dec 17, 2024
Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some changes to be made here, and I think some more testing needs to be done to understand the behaviour in the face of concurrent deletes.

It's also important to understand that LookupAndDelete may keep failing after a successful NextKey on very busy maps, and we may need to protect the application from getting into runaway CPU usage. I vaguely recall some existing code in the library that gives up after x attempts, but I can't remember where exactly.

Please reach out offline if you'd like some help with this.

We'll ship this in a 0.17.1 after the xmas break.

// Next decodes the next key and value.
//
// Iterating a hash map from which keys are being deleted is not
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this comment removed? This applies in general, and it's documentation on exported API, so we shouldn't just remove it.

// safe. You may see the same key multiple times. Iteration may
// also abort with an error, see IsIterationAborted.
//
// Iterating a queue/stack map returns an error (NextKey invalid
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below:

Suggested change
// Iterating a queue/stack map returns an error (NextKey invalid
// Iterating a queue/stack map returns an error, use [Map.Drain] instead.

Nit: try to reduce your use of the term 'API', it's redundant and often implied. Less is more!

// but their respective outputs will differ.
//
// Draining a map that does not support entry removal such as
// an array return an error (LookupAndDelete not supported):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exact error returned is an implementation detail and may change over time unless it's a sentinel. If it's in godoc, it becomes a contract.

If it does return an error wrapping ErrNotSupported, this is valid docmentation:

Draining a map that doesn't support deletions will return [ErrNotSupported].

@@ -1602,6 +1623,12 @@ func marshalMap(m *Map, length int) ([]byte, error) {
return buf, nil
}

// isKeyValueMap returns true if map supports key-value pairs (ex. hash)
// and false in case of value-only maps (ex. queue).
func isKeyValueMap(m *Map) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would make more sense inverted, e.g. func (m *Map) isKeyless() bool since you're negating its only use below. Then the docstring can also become a bit more focused: isKeyless returns true if the map is value-only, like a [Queue].

return mi.nextIterate(keyOut, valueOut)
}

func (mi *MapIterator) nextDrain(keyOut, valueOut interface{}) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (mi *MapIterator) nextDrain(keyOut, valueOut interface{}) bool {
func (mi *MapIterator) pop(keyOut, valueOut interface{}) bool {

This function pops a single entry, it doesn't drain the whole map.

return false
}

func (mi *MapIterator) nextIterate(keyOut, valueOut interface{}) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (mi *MapIterator) nextIterate(keyOut, valueOut interface{}) bool {
func (mi *MapIterator) next(keyOut, valueOut interface{}) bool {


iter := m.Drain()
allocs := testing.AllocsPerRun(int(m.MaxEntries()-1), func() {
if !iter.Next(keyPtr, &value) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !iter.Next(keyPtr, &value) {
qt.Assert(t, qt.IsTrue(iter.Next(keyPtr, &value)))

Or maybe qt.Assert allocates? Feel free to ignore.

} else {
err = m.Put(uint32(i), uint32(i))
}
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err != nil {
qt.Assert(t, qt.IsNil(err))

// Handle value-only maps (ex. queue).
if !isKeyValueMap(mi.target) {
if keyOut != nil {
mi.err = fmt.Errorf("non-nil keyOut provided for map without a key, must be nil instead")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mi.err = fmt.Errorf("non-nil keyOut provided for map without a key, must be nil instead")
mi.err = fmt.Errorf("non-nil keyOut provided for map without a key")

mi.cursor = make([]byte, mi.target.keySize)
}

// Always retrieve first key in the map. This should ensure that the whole map
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the case. Consider the following scenario:

  • drainer A: NextKey(0) yields a
  • drainer B: NextKey(0) yields a
  • drainer B: LookupAndDelete(a) succeeds
  • drainer A: LookupAndDelete(a) fails

The L&D failing can happen at any point due to a concurrent delete, also when e.g. a bpf program deletes a key, which is fairly common. In the current implementation, I think iteration halts at that point, while there may still be keys left in the map.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants