✨Storage: new paths entrypoint with pagination #7200

Open
wants to merge 22 commits into
base: master
from

Conversation

sanderegg
Member

@sanderegg sanderegg commented Feb 10, 2025

What do these changes do?

Reminder: storage partially caches the files stored in S3 in the DB.

  1. Files from the file-picker and service inputs/outputs/logs are cached in the DB.
  2. Files inside a dynamic service's state folder are not cached in the DB; only the base folder is. Any listing inside these folders therefore requires direct calls to the S3 backend.

Summary

This PR adds a new entrypoint in storage that lists files and folders (both those in the DB and those only in S3) with pagination.

Webserver API

  • New API entrypoint: GET /storage/locations/{location_id}/paths with a file_filter query parameter:
  • when file_filter is null, it lists the projects that have files,
  • when file_filter is a specific project ID, it lists the nodes with files inside that project,
  • when file_filter is PROJECTID/NODEID, it lists the files/folders in that node that have files in them,
  • this continues down the hierarchy until the very last file,
  • this is compatible with both the simcore.s3 and datcore storages
  • moved storage tests to 01
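
To illustrate the drill-down behaviour of file_filter described above, here is a minimal Python sketch. It is not the code from this PR: `list_children` and the sample keys are hypothetical, and it assumes files are stored under keys of the form PROJECTID/NODEID/... .

```python
from typing import Iterable, List, Optional


def list_children(keys: Iterable[str], file_filter: Optional[str] = None) -> List[str]:
    """Return the immediate children of `file_filter` that contain files.

    Hypothetical helper: with no filter it returns the projects, with a
    project ID it returns that project's nodes, and so on down the tree.
    """
    prefix = f"{file_filter.strip('/')}/" if file_filter else ""
    children = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if not remainder:
            continue
        # Keep only the first path segment below the filter.
        children.add(prefix + remainder.split("/", 1)[0])
    return sorted(children)


# Made-up keys, for illustration only.
KEYS = [
    "proj-1/node-a/outputs/result.csv",
    "proj-1/node-a/log.txt",
    "proj-1/node-b/data.bin",
    "proj-2/node-c/state/cache/tmp.dat",
]

print(list_children(KEYS))                   # projects that have files
print(list_children(KEYS, "proj-1"))         # nodes with files in proj-1
print(list_children(KEYS, "proj-1/node-a"))  # files/folders in that node
```

Only entries that actually lead to a file show up, which matches the "that have files in them" wording above.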

Storage Rest API

  • implements the above with driving tests: test_handler_paths.py

Pagination

  • done via cursor-based pagination (as the total is not known in S3),
  • --> calls to GET /storage/locations/{location_id}/paths accept limit and cursor query parameters
  • --> the initial call shall set cursor to null or omit it
  • --> if there are more files, the response body contains a next_page (the next cursor), which shall be passed as the cursor in the next call to GET /storage/locations/{location_id}/paths to get the next page
  • --> the total field is only filled when the call runs solely against the DB (so do not rely on it, as it is empty when the call goes to S3)
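
A client-side loop for the cursor protocol above could look roughly like the sketch below. The `next_page` and `total` fields come from the description; the `items` key, the `iter_paths` and `make_fake_backend` names, and the use of an integer offset as cursor are assumptions made for the example (a real cursor should be treated as opaque). The fake backend stands in for the HTTP call so the snippet runs on its own.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def iter_paths(
    fetch_page: Callable[[Optional[str], int], Page], limit: int = 2
) -> Iterator[str]:
    """Yield all paths by following next_page until it is missing or null."""
    cursor: Optional[str] = None  # initial call: no cursor
    while True:
        page = fetch_page(cursor, limit)
        yield from page["items"]  # "items" is an assumed field name
        cursor = page.get("next_page")
        if cursor is None:
            break


def make_fake_backend(paths: List[str]) -> Callable[[Optional[str], int], Page]:
    """Hypothetical stand-in for GET /storage/locations/{location_id}/paths."""

    def fetch_page(cursor: Optional[str], limit: int) -> Page:
        start = int(cursor) if cursor is not None else 0
        end = start + limit
        return {
            "items": paths[start:end],
            "next_page": str(end) if end < len(paths) else None,
            "total": None,  # left empty, as when the call goes to S3
        }

    return fetch_page


backend = make_fake_backend(["a", "b", "c", "d", "e"])
print(list(iter_paths(backend, limit=2)))
```

The loop never looks at total, in line with the advice not to rely on it.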

AWS-library

  • added S3 functions to list a page of objects
  • added S3 functions to count objects
  • driving tests: test_s3_client.py
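
For readers unfamiliar with S3 paging, the sketch below shows one way such functions can be built on the `list_objects_v2` call of a boto3-style client. It is not the aws-library code from this PR: `list_objects_page` and `count_objects` are hypothetical names, the client is synchronous for brevity (an aioboto3 client would need `await`), and the client is passed in so that any object exposing `list_objects_v2` works.

```python
from typing import Any, Dict, List, Optional, Tuple


def list_objects_page(
    client: Any, bucket: str, prefix: str, limit: int, cursor: Optional[str] = None
) -> Tuple[List[str], Optional[str]]:
    """Return one page of entries directly under `prefix` and the next cursor."""
    kwargs: Dict[str, Any] = {
        "Bucket": bucket,
        "Prefix": prefix,
        "Delimiter": "/",  # group deeper keys into "folders"
        "MaxKeys": limit,
    }
    if cursor:
        kwargs["ContinuationToken"] = cursor
    response = client.list_objects_v2(**kwargs)
    entries = [p["Prefix"] for p in response.get("CommonPrefixes", [])]
    entries += [o["Key"] for o in response.get("Contents", [])]
    # NextContinuationToken is absent on the last page.
    return entries, response.get("NextContinuationToken")


def count_objects(client: Any, bucket: str, prefix: str) -> int:
    """Count every object under `prefix` by walking all pages."""
    total = 0
    cursor: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
        if cursor:
            kwargs["ContinuationToken"] = cursor
        response = client.list_objects_v2(**kwargs)
        total += len(response.get("Contents", []))
        cursor = response.get("NextContinuationToken")
        if not cursor:
            return total
```

Counting has to walk every page, which is why a total is cheap to return from the DB but not from S3.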

Requirements

  • unified types-aioboto3, as some functions were missing

Next steps

  • @odeimaiz to implement the frontend
  • @sanderegg to remove the old entrypoints and continue the cleanup/fixing

Related issue/s

How to test

Dev-ops checklist

@sanderegg sanderegg added the a:storage (issue related to storage service) and a:webserver (issue related to the webserver service) labels Feb 10, 2025
@sanderegg sanderegg added this to the Singularity milestone Feb 10, 2025
@sanderegg sanderegg self-assigned this Feb 10, 2025

codecov bot commented Feb 10, 2025

Codecov Report

Attention: Patch coverage is 33.33333% with 12 lines in your changes missing coverage. Please review.

Project coverage is 59.97%. Comparing base (731dd9a) to head (a74a198).
Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (731dd9a) and HEAD (a74a198).

HEAD has 27 uploads less than BASE
Flag BASE (731dd9a) HEAD (a74a198)
unittests 32 5
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #7200       +/-   ##
===========================================
- Coverage   87.17%   59.97%   -27.21%     
===========================================
  Files        1680      730      -950     
  Lines       65053    32778    -32275     
  Branches     1106       12     -1094     
===========================================
- Hits        56709    19658    -37051     
- Misses       8030    13118     +5088     
+ Partials      314        2      -312     
Flag Coverage Δ *Carryforward flag
integrationtests 52.00% <33.33%> (-13.44%) ⬇️ Carriedforward from 492eded
unittests 93.00% <ø> (+6.82%) ⬆️

*This pull request uses carry forward flags.

Components Coverage Δ
api 76.84% <ø> (ø)
pkg_aws_library ∅ <ø> (∅)
pkg_dask_task_models_library 97.09% <ø> (ø)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library ∅ <ø> (∅)
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk ∅ <ø> (∅)
agent 96.46% <ø> (ø)
api_server ∅ <ø> (∅)
autoscaling ∅ <ø> (∅)
catalog 91.73% <ø> (ø)
clusters_keeper ∅ <ø> (∅)
dask_sidecar ∅ <ø> (∅)
datcore_adapter 98.06% <ø> (ø)
director ∅ <ø> (∅)
director_v2 43.43% <ø> (-47.83%) ⬇️
dynamic_scheduler ∅ <ø> (∅)
dynamic_sidecar 88.79% <ø> (-0.95%) ⬇️
efs_guardian ∅ <ø> (∅)
invitations ∅ <ø> (∅)
osparc_gateway_server ∅ <ø> (∅)
payments ∅ <ø> (∅)
resource_usage_tracker ∅ <ø> (∅)
storage ∅ <ø> (∅)
webclient ∅ <ø> (∅)
webserver 56.00% <33.33%> (-29.17%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sanderegg sanderegg force-pushed the storage/add-pagination branch 8 times, most recently from 5672008 to ccaff88 Compare February 17, 2025 07:37
@sanderegg sanderegg force-pushed the storage/add-pagination branch from 7ab8d54 to 02ebf8d Compare February 20, 2025 15:47
@sanderegg sanderegg modified the milestones: Singularity, The Awakening Feb 24, 2025
@sanderegg sanderegg force-pushed the storage/add-pagination branch 2 times, most recently from 9bde281 to 8411473 Compare February 25, 2025 11:30
@sanderegg sanderegg marked this pull request as ready for review February 25, 2025 13:48
@sanderegg sanderegg changed the title ✨Storage: new entrypoints with pagination ✨Storage: new paths entrypoint with pagination Feb 25, 2025
Member

@odeimaiz odeimaiz left a comment

Nice, thanks a lot 👌

@sanderegg sanderegg force-pushed the storage/add-pagination branch from a74a198 to aad652b Compare February 25, 2025 15:30
2 participants