-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use overlay for cached #104
base: main
Are you sure you want to change the base?
Conversation
7de191e
to
9589736
Compare
pkg/api/cached.go
Outdated
gadgetDir, | ||
"-n", | ||
"-o", | ||
fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", c.StagingPath, upperdir, workdir), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider what value we want for the redirect_dir
option. When I was working with overlayfs for making the gadget CI bits that reset the filesystem faster, I had to set it to on
, but that's not necessarily true here. https://docs.kernel.org/filesystems/overlayfs.html#redirect-dir
See https://github.com/gadget-inc/gadget/pull/11070/files#diff-37490d715785dc29d4323daef03e84518b414f5fdca4d258c9f415c936080d4f if you are curious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL.
We can set it, I'm not sure how often we'll change the cache but it'll be nice to have the option.
What do you mean by
CI bits that reset the filesystem faster
As in, overlay
was faster or something else?
@airhorns https://github.com/gadget-inc/gadget/pull/14578 CI is passing with the overlay running. I think this PR is ready for review. |
pkg/api/cached.go
Outdated
// Perform an overlay mount if the mountCache attribute is true | ||
// Mount the root `StagingPath` at `targetPath/gadget` making `targetPath/gadget` `0777` | ||
if mountCache, exists := volumeAttributes["mountCache"]; exists && mountCache == "true" { | ||
upperdir := path.Join(targetPath, "gadget_writeable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We gonna try to avoid this extra subfolder if we can after talking IRL
@@ -0,0 +1,31 @@ | |||
name: Test Integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this because overlay won't run on mac's so bumping the csi-tests out to be more integration sanity tests.
@@ -287,3 +355,13 @@ func getFolderSize(path string) (int64, error) { | |||
}) | |||
return totalSize, err | |||
} | |||
|
|||
func execCommand(cmdName string, args ...string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In github actions we're not running as root, in our deployment we run as root in a privileged container so this is how we can call mount
/umount
during our tests. If there's a more elegant way to do this let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know of one 🤷
if err := os.MkdirAll(targetPath, targetPermissions); err != nil { | ||
return nil, fmt.Errorf("failed to create target directory %s: %s", targetPath, err) | ||
workdir := path.Join(volumePath, WORK_DIR) | ||
err = os.MkdirAll(workdir, 0777) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This did not seem to set the dir to 0777
so I had to add the explicit chmod
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terrifying -- maybe because its a mkdirall, the new perms only apply if it is creating new dirs, but it leaves existing ones alone?
@airhorns I made the changes, I still have some clean up to do but I think this is ready for a quick second pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, or definitely good enough to canary
if err := os.MkdirAll(targetPath, targetPermissions); err != nil { | ||
return nil, fmt.Errorf("failed to create target directory %s: %s", targetPath, err) | ||
workdir := path.Join(volumePath, WORK_DIR) | ||
err = os.MkdirAll(workdir, 0777) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terrifying -- maybe because its a mkdirall, the new perms only apply if it is creating new dirs, but it leaves existing ones alone?
@@ -287,3 +355,13 @@ func getFolderSize(path string) (int64, error) { | |||
}) | |||
return totalSize, err | |||
} | |||
|
|||
func execCommand(cmdName string, args ...string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know of one 🤷
"overlay", | ||
"-n", | ||
"--options", | ||
fmt.Sprintf("redirect_dir=on,lowerdir=%s,upperdir=%s,workdir=%s", c.StagingPath, upperdir, workdir), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did redirect_dir end up being necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely, it seems to be useful if there are changes to the lower dir that files don't disappear which seems good in general.
Fix up mount permissions Make the cache dir not writeable by the sandbox (or non-DL user) Remove wonky integration test
Fixup overlay command and rename things to be less gadget centric
Use sudo Fixup tests Use sudo if env variable set Test tweaks
7cf5d2e
to
f366531
Compare
Use an overlay for dateilager's cached service. Instead of giving each sandbox pod its own hard linked copy of the
dl_cache
we will mount an overlayfs of thecached
'sdl_cache
In the pod that's mounting to overlay we can see that we can hardlink between the 2 directories and that the pod can write to the cache and corrupt it.
data:image/s3,"s3://crabby-images/21a55/21a55340cd6fea8040a03010fcbb6dfc83a96d80" alt="Screenshot 2025-02-05 at 4 27 01 PM"
however on the
cached
pod the contents are still fineand additionally if we mount a second pod, it too sees the clean copy.
data:image/s3,"s3://crabby-images/18bfe/18bfe5031b3974b7c6b6cde5a53587d5e895d0c5" alt="Screenshot 2025-02-05 at 4 29 40 PM"
The pods are configured to use their ephemeral space so any writes to
dl_cache
will be cleaned up when the pod is killed.The
cached
pod stores the "golden cache" at/var/lib/kubelet/dateilager_cache
For a simple example pod lets look at
A pod that is mounting a volume managed by the dateilager-csi will get a new ephemeral dir (like
emptyDir
) that is on the host/var/lib/kubelet/pods/88064d87-6870-4da5-8ba1-d9ac62b3f3ff/volumes/kubernetes.io~csi/gadget/mount
this is thetargetPath
that comes in on thecsi.NodePublishVolumeRequest
The directory structure that we create inside of the target dir is like this
gadget
is the target mount point for the overlay.gadget_writeable
is theupperdir
andwork
is theworkdir
The readonly
lowerdir
is/var/lib/kubelet/dateilager_cache
which containsdl_cache
.