Add delimiter support for S3 List API #2996

Merged on Feb 19, 2025 (2 commits)

Arun-LinkedIn
Contributor

This adds support for treating "/" in blob names as a delimiter. It lets us treat prefixes as "subdirectories" and group them under the "CommonPrefixes" element of the LIST API response. It also enables us to list and delete directories using the AWS S3 CLI.

More details can be found at https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html under the "CommonPrefixes" section.

For example, suppose we have the files below (with names delimited by / to represent directories):

Ambry account: named-blob-sandbox
Ambry container: container-a
1. myfile.txt
2. folder/file1.txt
3. folder/file2.txt

1. The cURL response for the List API is:

curl -X GET 'http://localhost:1174/s3/named-blob-sandbox/container-a?prefix=folder&delimiter=/' | xmllint --format -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   409  100   409    0     0   3976      0 --:--:-- --:--:-- --:--:--  3970
<?xml version="1.0"?>
<ListBucketResult>
  <Name>container-a</Name>
  <Prefix>folder</Prefix>
  <MaxKeys>1000</MaxKeys>
  <KeyCount>2</KeyCount>
  <Delimiter>/</Delimiter>
  <Contents>
    <Key>folder/file1.txt</Key>
    <LastModified>2025-01-27T09:37:23Z</LastModified>
    <Size>15</Size>
  </Contents>
  <Contents>
    <Key>folder/file2.txt</Key>
    <LastModified>2025-01-27T09:37:23Z</LastModified>
    <Size>15</Size>
  </Contents>
  <IsTruncated>false</IsTruncated>
</ListBucketResult>
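
A client consuming this response could pull the keys out with the JDK's built-in XML parser. The following is a minimal sketch, not code from this PR: the class name `ListBucketResultReader` and the helper `textOf` are hypothetical, and `SAMPLE` is a trimmed copy of the response above (the `LastModified` elements are left out for brevity).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Ambry.
final class ListBucketResultReader {
  // Trimmed copy of the ListBucketResult shown above.
  static final String SAMPLE =
      "<?xml version=\"1.0\"?>"
          + "<ListBucketResult>"
          + "<Name>container-a</Name><Prefix>folder</Prefix>"
          + "<MaxKeys>1000</MaxKeys><KeyCount>2</KeyCount><Delimiter>/</Delimiter>"
          + "<Contents><Key>folder/file1.txt</Key><Size>15</Size></Contents>"
          + "<Contents><Key>folder/file2.txt</Key><Size>15</Size></Contents>"
          + "<IsTruncated>false</IsTruncated>"
          + "</ListBucketResult>";

  /** Returns the text of every element named tagName, in document order. */
  static List<String> textOf(String xml, String tagName) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList nodes = doc.getElementsByTagName(tagName);
      List<String> values = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        values.add(nodes.item(i).getTextContent().trim());
      }
      return values;
    } catch (Exception e) {
      // Wrap checked parser exceptions so callers do not need a throws clause.
      throw new IllegalArgumentException("Could not parse list response", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(textOf(SAMPLE, "Key"));      // the two object keys
    System.out.println(textOf(SAMPLE, "KeyCount")); // the reported key count
  }
}
```

Looking elements up by tag name keeps the sketch short; a real client would more likely rely on an S3 SDK's response model.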

2. The AWS S3 CLI output for listing and removing directories is:

>aws s3 ls s3://container-a/ --recursive
2025-01-26 12:39:31         15 folder/file1.txt
2025-01-26 12:39:31         15 folder/file2.txt
2025-01-26 12:37:51         15 myfile.txt

>aws s3 ls s3://container-a/
                               PRE folder/
2025-01-26 12:37:51    15      myfile.txt

>aws s3 rm s3://container-a/folder/ --recursive
delete: s3://container-a/folder/file1.txt
delete: s3://container-a/folder/file2.txt
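
The grouping behind the `aws s3 ls` output above can be sketched in a few lines of Java. This is an illustration of the general S3 delimiter semantics under stated assumptions, not the code in this PR: the class `DelimiterListingSketch`, the `Listing` holder, and the `list` method are hypothetical names, and the input is an in-memory list rather than a database query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the Ambry code base.
final class DelimiterListingSketch {
  static final String DELIMITER = "/";

  /** Plain keys plus the rolled-up "subdirectories" of one listing. */
  static final class Listing {
    final List<String> contents = new ArrayList<>();
    final TreeSet<String> commonPrefixes = new TreeSet<>();
  }

  static Listing list(List<String> keys, String prefix, boolean useDelimiter) {
    String p = prefix == null ? "" : prefix;
    Listing result = new Listing();
    for (String key : keys) {
      if (!key.startsWith(p)) {
        continue; // outside the requested prefix
      }
      // Look for the first delimiter that occurs after the prefix.
      int idx = useDelimiter ? key.indexOf(DELIMITER, p.length()) : -1;
      if (idx == -1) {
        result.contents.add(key);
      } else {
        // Roll the key up into the prefix ending at that delimiter.
        // The set removes duplicates, so each directory appears once.
        result.commonPrefixes.add(key.substring(0, idx + DELIMITER.length()));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("myfile.txt", "folder/file1.txt", "folder/file2.txt");
    Listing top = list(keys, "", true);
    System.out.println(top.contents + " " + top.commonPrefixes);
    Listing all = list(keys, "", false);
    System.out.println(all.contents + " " + all.commonPrefixes);
  }
}
```

With the delimiter enabled and an empty prefix, the two files under `folder/` collapse into one common prefix, which corresponds to the `PRE folder/` line; with it disabled, all three keys come back as contents, as in the `--recursive` listing.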

codecov-commenter commented Jan 27, 2025

Codecov Report

Attention: Patch coverage is 87.12871% with 13 lines in your changes missing coverage. Please review.

Project coverage is 69.95%. Comparing base (52ba813) to head (8b56cf3).
Report is 176 commits behind head on master.

Files with missing lines                                 | Patch % | Lines
...om/github/ambry/frontend/NamedBlobListHandler.java    | 87.03%  | 2 Missing and 5 partials ⚠️
...b/ambry/tools/perf/NamedBlobMysqlDatabasePerf.java    | 0.00%   | 3 Missing ⚠️
...va/com/github/ambry/frontend/s3/S3ListHandler.java    | 84.61%  | 0 Missing and 2 partials ⚠️
...com/github/ambry/frontend/s3/S3MessagePayload.java    | 93.75%  | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2996      +/-   ##
============================================
+ Coverage     64.24%   69.95%   +5.70%     
- Complexity    10398    12153    +1755     
============================================
  Files           840      889      +49     
  Lines         71755    75098    +3343     
  Branches       8611     8988     +377     
============================================
+ Hits          46099    52532    +6433     
+ Misses        23004    19828    -3176     
- Partials       2652     2738      +86     


@@ -706,10 +709,27 @@ private Page<NamedBlobRecord> run_list_v2(String accountName, String containerNa
       int resultIndex = 0;
       while (resultSet.next()) {
         String blobName = resultSet.getString(1);
-        if (resultIndex++ == maxKeysValue) {
+        if (resultIndex == maxKeysValue) {
Collaborator

We only set nextContinuationToken to the blobName when resultIndex == maxKeysValue. The idea is to do that only for the last key, but now that the logic for incrementing resultIndex has changed, resultIndex would not equal maxKeysValue at the last key. We probably have to use another index, or some other way to set the next continuation token here.

Contributor Author

Sure. Reverted to the previous logic of incrementing resultIndex on every iteration and setting nextContinuationToken to the blobName when resultIndex == maxKeysValue.
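
The restored behaviour can be sketched in isolation. This is a simplified illustration, not the method from the PR: an `Iterator<String>` stands in for the JDBC `ResultSet`, the class `ListPaginationSketch`, the `Page` holder, and `readPage` are hypothetical names, and it assumes the underlying query returns at most one row beyond `maxKeysValue`, so that the extra row can serve as the continuation token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration class; not part of the Ambry code base.
final class ListPaginationSketch {
  static final class Page {
    final List<String> entries = new ArrayList<>();
    String nextContinuationToken; // stays null when there are no more results
  }

  /** rows stands in for the ResultSet; assumed sorted by blob name. */
  static Page readPage(Iterator<String> rows, int maxKeysValue) {
    Page page = new Page();
    int resultIndex = 0;
    while (rows.hasNext()) {
      String blobName = rows.next();
      // resultIndex is incremented on every iteration, so the comparison is
      // true exactly when we reach the first row past the requested page.
      if (resultIndex++ == maxKeysValue) {
        page.nextContinuationToken = blobName;
        break;
      }
      page.entries.add(blobName);
    }
    return page;
  }

  public static void main(String[] args) {
    Page page = readPage(Arrays.asList("a", "b", "c").iterator(), 2);
    System.out.println(page.entries + " next=" + page.nextContinuationToken);
  }
}
```

If the increment only happened for some rows (for example, only for rows that produce a new entry), the comparison could fail to fire on the row that should become the token, which is the concern raised in the review comment.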

// Extract the portion after the prefix and before the next '/'
String remainingPath = blobName.substring(blobNamePrefix == null ? 0 : blobNamePrefix.length());
remainingPath = remainingPath.startsWith("/") ? remainingPath.substring(1) : remainingPath;
int delimiterIndex = remainingPath.indexOf("/");
Collaborator

Let's use a constant for "/".

Contributor Author

Done

if (groupDirectories) {
// Extract the portion after the prefix and before the next '/'
String remainingPath = blobName.substring(blobNamePrefix == null ? 0 : blobNamePrefix.length());
remainingPath = remainingPath.startsWith("/") ? remainingPath.substring(1) : remainingPath;
Collaborator

This is not right. If the prefix is "abc", the blob name is "abc/efg/hij", and the delimiter is "/", then we should return "abc/" as the common prefix, not "abc/efg/".

Contributor Author

Sure! Corrected it

if (groupDirectories) {
// Add the directories to the result
entries.addAll(directories.stream()
.map(directory -> new NamedBlobRecord(accountName, containerName, directory, null, Utils.Infinite_Time, 0,
Collaborator

This is not right. When we return a common prefix, it should include the prefix as well. If we have "abc/def/ghi" and the prefix is "abc/", the common prefix should be "abc/def/", not just "def/".

Contributor Author

Corrected it.

remainingPath = remainingPath.startsWith("/") ? remainingPath.substring(1) : remainingPath;
int delimiterIndex = remainingPath.indexOf("/");
if (delimiterIndex != -1) {
boolean validEntry = directories.add(remainingPath.substring(0, delimiterIndex) + "/");
Collaborator

Nit: it could be remainingPath.substring(0, delimiterIndex + DELIMITER_STRING.length());

Contributor Author

Done
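
One way to satisfy the review comments above in a single helper is to search for the delimiter starting at the end of the prefix and return everything up to and including it. The sketch below is an assumption about a possible shape, not the merged code: `CommonPrefixSketch` and `commonPrefix` are hypothetical names, while `DELIMITER_STRING` borrows the constant name mentioned in the review.

```java
// Hypothetical illustration class; not part of the Ambry code base.
final class CommonPrefixSketch {
  static final String DELIMITER_STRING = "/";

  /**
   * Returns the common prefix that blobName rolls up into, including the
   * requested prefix, or null when no delimiter follows the prefix.
   */
  static String commonPrefix(String blobName, String blobNamePrefix) {
    int from = blobNamePrefix == null ? 0 : blobNamePrefix.length();
    // Start the search after the prefix, so a delimiter right after the
    // prefix is the one that terminates the common prefix.
    int delimiterIndex = blobName.indexOf(DELIMITER_STRING, from);
    if (delimiterIndex == -1) {
      return null; // a plain object, not a "subdirectory"
    }
    // Keep the leading part of the name and the delimiter itself.
    return blobName.substring(0, delimiterIndex + DELIMITER_STRING.length());
  }

  public static void main(String[] args) {
    System.out.println(commonPrefix("abc/efg/hij", "abc"));  // abc/
    System.out.println(commonPrefix("abc/def/ghi", "abc/")); // abc/def/
    System.out.println(commonPrefix("myfile.txt", null));    // null
  }
}
```

Because the substring starts at index 0 of the full blob name rather than of the remaining path, the returned value always carries the requested prefix, and no leading "/" needs to be stripped first.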

Collaborator

justinlin-linkedin left a comment

LGTM. We can probably add some comments to mergePageResult to better show how the method works.

@Arun-LinkedIn Arun-LinkedIn merged commit cd51c5e into linkedin:master Feb 19, 2025
5 checks passed

4 participants