Can BeeGFS support striped directories to increase metadata performance of very large directories? #25

Open
hmings888 opened this issue Nov 19, 2024 · 1 comment
Labels
enhancement New feature or request gathering-requirements Issues open for collecting feedback, ideas, and requirements for potential future implementations.

Comments

@hmings888

In the current implementation of the BeeGFS metadata server, all files in a directory are stored on the same metadata server as their parent directory.
Imagine a very large directory containing tens of thousands of files or more that is accessed frequently. The load on the metadata server storing this directory would be very high, possibly to the point of response timeouts. Could the files in one directory be hashed into multiple shards and stored across multiple metadata servers? I think this would avoid the performance issue. As far as I know, Lustre supports this feature.
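
To make the request concrete, here is a minimal sketch of hashing directory entries to shards. It is only an illustration of the idea, not BeeGFS or Lustre code: the names `shard_for_entry` and `distribute`, the choice of SHA-256, and the shard count are all assumptions made for this example.

```python
import hashlib


def shard_for_entry(dir_id: str, name: str, num_shards: int) -> int:
    """Map one directory entry to a shard index with a stable hash.

    Hypothetical helper: a real implementation would pick its own hash
    and its own way of identifying the parent directory.
    """
    digest = hashlib.sha256(f"{dir_id}/{name}".encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a shard.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(dir_id: str, names: list[str], num_shards: int) -> dict[int, list[str]]:
    """Group entry names by the shard (metadata server) they would land on."""
    shards: dict[int, list[str]] = {i: [] for i in range(num_shards)}
    for name in names:
        shards[shard_for_entry(dir_id, name, num_shards)].append(name)
    return shards
```

With a reasonably uniform hash, lookups for different file names in the same directory would be spread over `num_shards` servers instead of all going to one.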

@hmings888 hmings888 added enhancement New feature or request new Issues that haven't been triaged yet labels Nov 19, 2024
@iamjoemccormick iamjoemccormick added gathering-requirements Issues open for collecting feedback, ideas, and requirements for potential future implementations. and removed new Issues that haven't been triaged yet labels Dec 20, 2024
@iamjoemccormick
Member

@hmings888,

I agree this would be useful for very large directories. While we currently do not have plans to implement such functionality, I'd like to keep this issue open to gauge community interest and better understand use cases where this might be beneficial.

It’s important to note that sharding or striping directories across multiple metadata servers would introduce its own performance trade-offs. For instance, the current architecture avoids distributed locks for many operations. As an example, moving or renaming a file within the same directory today involves only a single metadata server. With a sharded directory, such operations might require coordination across multiple metadata servers, adding complexity and potential overhead.
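
A rough sketch of that rename trade-off, assuming entries are placed by hashing the file name. This is a hypothetical model, not how BeeGFS or Lustre actually place entries; `shard_of` and `servers_touched_by_rename` are names invented for this example.

```python
import hashlib


def shard_of(name: str, num_shards: int) -> int:
    """Hypothetical placement rule: hash the entry name to a shard index."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def servers_touched_by_rename(old_name: str, new_name: str, num_shards: int) -> set[int]:
    """Return the shards that hold the old and the new entry name.

    One shard means the rename stays local to a single metadata server;
    two shards means the servers would have to coordinate.
    """
    return {shard_of(old_name, num_shards), shard_of(new_name, num_shards)}
```

With `num_shards=1` (the unsharded case) every rename touches exactly one server. With more shards, the old and new names will often hash to different servers, which is where the extra coordination comes from.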

This isn’t to say that such a feature wouldn’t improve performance in some scenarios. However, I’m curious how often this feature is actively used in Lustre, compared to simply advising users to avoid creating very large directories. This is a best practice for most file systems I’m aware of, including Lustre.
