Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] overlay: make sure directories omitted from layers have the right permissions #1653

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nalind
Copy link
Member

@nalind nalind commented Jun 28, 2023

When extracting layer contents, if we find ourselves needing to implicitly create a directory (due to its not being included in the layer diff), try to give it the permissions, ownership, attributes, and datestamp of the corresponding directory from the layer which would be stacked below it.

When a layer omits any of the directories which contain items which that layer adds or modifies, this should prevent the default values that we would use from overriding those which correspond to the same directory
in a lower layer, which could later be mistaken as an indication that one or more of those was intentionally changed, forcing the directory to be pulled up.

When extracting layer contents, if we find ourselves needing to
implicitly create a directory (due to its not being included in the
layer diff), try to give it the permissions, ownership, attributes, and
datestamp of the corresponding directory from the layer which would be
stacked below it.

When a layer omits any of the directories which contain items which that
layer adds or modifies, this should prevent the default values that we
would use from overriding those which correspond to the same directory
in a lower layer, which could later be mistaken as an indication that
one or more of those was intentionally changed, forcing the directory to
be pulled up.

Signed-off-by: Nalin Dahyabhai <[email protected]>
When applying a chunked diff, ApplyDiff() iterates through the entries
in the diff's table of contents, creating directories and other
zero-length items like empty files, symbolic links, and directories,
building a queue of hard links, and asking goroutines to locate files in
other layers and configured OSTree repositories which contain content
that matches what should eventually end up in non-zero-length files in
the layer.

If a file's entire contents couldn't be found locally, but it was split
into chunks in the chunked diff, ApplyDiff() then searches for the
chunks in local layers.

Lastly, the contents of files with non-zero lengths, which ApplyDiff()
had attempted to find locally, but which weren't found locally, are then
requested from the registry.

Track which directories in the layer are created implicitly by
ApplyDiff(), so that the overlay driver can reset their ownership,
permissions, extended attributes, and datestamps to match the
immediately lower layer, if there is one.

Signed-off-by: Nalin Dahyabhai <[email protected]>
@rhatdan
Copy link
Member

rhatdan commented Jul 7, 2023

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this makes sense to me, but oh it’s a lot of code to make that possible. That’s an impressive amount of effort.

(I have only fairly briefly skimmed the tests. But their existence adds a lot of confidence anyway :) )

@@ -188,6 +188,7 @@ type DriverWithDifferOutput struct {
Metadata string
BigData map[string][]byte
TarSplit []byte
ImplicitDirs []string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some documentation of what this field contains would be nice.

if hdr == nil {
continue
}
path := filepath.Join(stagingDirectory, implicitDir)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a vague feeling that the code to apply a *tar.Header to a filesystem object should already exist somewhere, and that duplicating it is not the ideal approach.

I have no idea where is the best place for that code (pkg/archive?) and what all options need to be provided to it, though…

As specific examples, looking at overlay.ApplyDiffWithDiffer, it seems that d.options.ignoreChownErrors or options.ForceMask are not implemented in this new copy. I guess ForceMask is irrelevant because it should have been applied do the parent layer correctly, already… can that be assumed even with additional layer stores?

Comment on lines +24 to +27
idmap = idtools.NewIDMappingsFromMaps(append([]idtools.IDMap{}, uidMaps...), append([]idtools.IDMap{}, gidMaps...))
}
}
return &backfiller{idmap: idmap, lowerDiffDirs: append([]string{}, lowerDiffDirs...)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Absolutely non-blocking: This might be a tiny bit more readable when using x/exp/slices.Clone; that will be in the standard library in future Go 1.21. OTOH until then the x/exp package is technically without any stability promise…)

// ownership/permissions/attributes of a directory from a lower layer so that
// we don't have to create it in an upper layer using default values that will
// be mistaken for a reason that the directory was pulled up to that layer.
func NewBackfiller(idmap *idtools.IDMappings, lowerDiffDirs []string) *backfiller {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a public c/storage API, or would something like c/storage/internal/unionbackfill work?

(I don’t have a strong opinion, I just want this to be an intentional decision.)

// ownership/permissions/attributes of a directory from a lower layer so that
// we don't have to create it in an upper layer using default values that will
// be mistaken for a reason that the directory was pulled up to that layer.
func NewBackfiller(idmap *idtools.IDMappings, lowerDiffDirs []string) *backfiller {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lowerDiffDirs is expected to be from ordered from children to parents (is it?), that would be useful to document at least at subpackage/API boundaries like this.

// that an entry we inserted had no content.
func (r *Reader) Read(b []byte) (int, error) {
if r.currentIsQueued {
return 0, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://pkg.go.dev/archive/tar#Reader.Read documents 0, io.EOF on both special file types, and empty regular files.


// Read will either read from a real entry in the archive, or pretend very hard
// that an entry we inserted had no content.
func (r *Reader) Read(b []byte) (int, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What … happens … when the parent layer contains a regular file in place of an expected parent directory?

I suppose it doesn’t matter what we do here, because extracting the current layer should fail, shouldn’t it?

hdr, err := tr.Next()
defer func() {
closeErr := tw.Close()
io.Copy(wc, reader)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? If tw.Close succeeded, the stream is already terminated by the tar EOF blocks.

And even if that were not true, both the reader and the writer are in an indeterminate (or anyway hard-for-me-to-reason-about) state.

} else if closeErr != nil {
wc.CloseWithError(closeErr)
} else {
wc.Close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Close() is equivalent to CloseWithError(nil), so this could be:

if err == nil { err = closeErr }
wc.CloseWithError(err)

@@ -0,0 +1,159 @@
package tarbackfill
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a public c/storage API, or would something like c/storage/internal/tarbackfill work?

(I don’t have a strong opinion, I just want this to be an intentional decision.)

@rhatdan
Copy link
Member

rhatdan commented Aug 16, 2023

@nalind Waiting on you?

@cgwalters
Copy link
Contributor

which could later be mistaken as an indication that one or more of those was intentionally changed, forcing the directory to be pulled up.

Another way to solve this possibly: Add an xattr to the directory like user.containers.meta (with serialized extended state, json/bincode/whatever) that would include to start a boolean value like synthetic: true that denotes this state, and we know hence when re-serializing to ignore that directory

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants