Merged
Changes from 3 commits
51 commits
b0b1752
New aws jobstore.
DailyDreaming Oct 14, 2024
7dc1a7c
Update.
DailyDreaming Jan 24, 2025
06b7a40
Updates.
DailyDreaming Feb 3, 2025
4c12449
Linting.
DailyDreaming Mar 3, 2025
2900c99
Update.
DailyDreaming Mar 25, 2025
3a345dd
Update from master.
DailyDreaming Mar 25, 2025
9322285
Update.
DailyDreaming Mar 25, 2025
031bab4
Update and rebase.
DailyDreaming Mar 25, 2025
c58e025
Update and rebase.
DailyDreaming Mar 25, 2025
698b450
Assuage make docs's anger.
DailyDreaming Mar 25, 2025
b9f3cc8
Rebase.
DailyDreaming Mar 25, 2025
321a2c6
Some compat, some review comments.
DailyDreaming Mar 26, 2025
faf0581
Move boto imports.
DailyDreaming Mar 26, 2025
a9a8880
Update.
DailyDreaming Mar 27, 2025
082311d
Update comments, move imports, and update docstrings.
DailyDreaming Mar 31, 2025
0608622
Update imports.
DailyDreaming Apr 1, 2025
687b9b7
Merge branch 'master' into issues/964-aws-remove-sdb
DailyDreaming Apr 1, 2025
133fc26
Merge remote-tracking branch 'upstream/master' into issues/964-aws-re…
adamnovak Jul 24, 2025
0b2316b
Fix typing
adamnovak Jul 24, 2025
d778e1e
Enable type checking and fix utils typing
adamnovak Jul 24, 2025
aaf6085
Address code review comments, drop comments in docstrings, drop dupli…
adamnovak Jul 24, 2025
b99e462
Reformat and revise docs so docs build works
adamnovak Jul 24, 2025
56faabd
Merge remote-tracking branch 'upstream/master' into issues/964-aws-re…
adamnovak Jul 24, 2025
859dd2d
Quote possibly-unavailable types
adamnovak Jul 24, 2025
4631950
Make missing AWS modules produce ImportError and not NotImplementedError
adamnovak Jul 25, 2025
1f75eac
Stop trying to import the boto 2 error types
adamnovak Jul 25, 2025
3a643b7
Use the key function everywhere and deal with not having a log place …
adamnovak Jul 25, 2025
9e912b2
Add missing pre_update_hook call
adamnovak Jul 25, 2025
654d8e7
Stop logging every write
adamnovak Jul 25, 2025
cc0f8c4
Only write the marker when it moves
adamnovak Jul 25, 2025
7f3c3d2
Use the content key prefix when uploading files
adamnovak Jul 25, 2025
adef28a
Get executable bit from the end of the key fields
adamnovak Jul 25, 2025
1122742
Add --toil suffix to test bucket cleanup script
adamnovak Jul 25, 2025
14d1338
Make FileJobStore _write_to_url a classmethod again
adamnovak Jul 25, 2025
2126540
Stop tracking executability in key because it is tracked in the typed…
adamnovak Jul 25, 2025
3503b01
Merge remote-tracking branch 'upstream/master' into issues/964-aws-re…
adamnovak Jul 25, 2025
5fc4c39
Fix self reference in classmethod
adamnovak Jul 25, 2025
4a1de2e
Add pytest-randomly to report and set seeds only
adamnovak Jul 28, 2025
05c04fc
Fix removed method and enable AWS util test type checking
adamnovak Aug 5, 2025
7a3df6d
Move single-test teardown into test and fix argument type
adamnovak Aug 5, 2025
80aabc9
Fix typing by moving import
adamnovak Aug 5, 2025
47d2518
Merge remote-tracking branch 'upstream/master' into issues/964-aws-re…
adamnovak Aug 6, 2025
7822305
Satisfy MyPy on the pipes
adamnovak Aug 6, 2025
2d6d7ad
Respect turning off encryption for stream uploads so config.pickle ca…
adamnovak Aug 6, 2025
c369df4
Get the config from self
adamnovak Aug 6, 2025
bfc6616
Make AWS encryption settings update live from the config to satisfy test
adamnovak Aug 6, 2025
df02144
Handle error from trying to read without encryption, and default encr…
adamnovak Aug 6, 2025
86a8fe6
Make Bucket from the resource and not free-floating
adamnovak Aug 6, 2025
59b2eec
Satisfy MyPy
adamnovak Aug 6, 2025
278fb2f
Avoid depending on strongly-consistent clean in subTest tests
adamnovak Aug 7, 2025
4f55510
Raise correct nonexistent job store exception
adamnovak Aug 8, 2025
1 change: 0 additions & 1 deletion .gitlab-ci.yml
@@ -69,7 +69,6 @@ lint:
- ${MAIN_PYTHON_PKG} -m virtualenv venv && . venv/bin/activate && make prepare && make develop extras=[all]
- ${MAIN_PYTHON_PKG} -m pip freeze
- ${MAIN_PYTHON_PKG} --version
- make mypy
- make docs
- check-jsonschema --schemafile https://json.schemastore.org/dependabot-2.0.json .github/dependabot.yml
# - make diff_pydocstyle_report
13 changes: 12 additions & 1 deletion contrib/admin/mypy-with-ignore.py
@@ -83,7 +83,17 @@ def main():
'src/toil/lib/aws/__init__.py',
'src/toil/server/utils.py',
'src/toil/test',
- 'src/toil/utils/toilStats.py'
+ 'src/toil/utils/toilStats.py',
+ 'src/toil/server/utils.py',
+ 'src/toil/jobStores/aws/jobStore.py',
+ 'src/toil/jobStores/exceptions.py',
+ 'src/toil/lib/aws/config.py',
+ 'src/toil/lib/aws/s3.py',
+ 'src/toil/lib/retry.py',
+ 'src/toil/lib/pipes.py',
+ 'src/toil/lib/checksum.py',
+ 'src/toil/lib/conversions.py',
+ 'src/toil/lib/iterables.py'
]]

def ignore(file_path):
@@ -99,6 +109,7 @@ def ignore(file_path):
for file_path in all_files_to_check:
    if not ignore(file_path):
        filtered_files_to_check.append(file_path)
+ print(f'Checking: {filtered_files_to_check}')
args = ['mypy', '--color-output', '--show-traceback'] + filtered_files_to_check
p = subprocess.run(args=args)
exit(p.returncode)
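
The body of `ignore` is not shown in this hunk, so the following is only a sketch of the filtering step. It assumes a file is skipped when its path starts with one of the listed prefixes. The names `should_ignore` and `build_mypy_args` and the example paths are hypothetical and are not part of the script.

```python
from typing import List, Sequence


def should_ignore(file_path: str, ignore_prefixes: Sequence[str]) -> bool:
    # Assumption: prefix matching. The real ignore() body is not visible in the diff.
    return any(file_path.startswith(prefix) for prefix in ignore_prefixes)


def build_mypy_args(all_files: Sequence[str], ignore_prefixes: Sequence[str]) -> List[str]:
    # Keep only the files that no ignore prefix matches.
    filtered = [f for f in all_files if not should_ignore(f, ignore_prefixes)]
    # Mirrors the print() added in this hunk, so a CI log shows what gets checked.
    print(f'Checking: {filtered}')
    return ['mypy', '--color-output', '--show-traceback'] + filtered


# Illustrative inputs only.
args = build_mypy_args(
    ['src/toil/job.py', 'src/toil/lib/retry.py', 'src/toil/test/lib/retryTest.py'],
    ['src/toil/lib/retry.py', 'src/toil/test'],
)
```

Under that assumption, an entry such as `'src/toil/test'` excludes every file below the directory, while an entry naming a single file excludes only that file.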
2,603 changes: 641 additions & 1,962 deletions src/toil/jobStores/aws/jobStore.py

Large diffs are not rendered by default.

78 changes: 78 additions & 0 deletions src/toil/jobStores/exceptions.py
@@ -0,0 +1,78 @@
# Copyright (C) 2015-2021 Regents of the University of California
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import urllib.parse as urlparse


class InvalidImportExportUrlException(Exception):
    def __init__(self, url):
        """
        :param urlparse.ParseResult url:
        """
        super().__init__("The URL '%s' is invalid." % url.geturl())


class NoSuchJobException(Exception):
    """Indicates that the specified job does not exist."""
    def __init__(self, jobStoreID):
        """
        :param str jobStoreID: the jobStoreID that was mistakenly assumed to exist
        """
        super().__init__("The job '%s' does not exist." % jobStoreID)


class ConcurrentFileModificationException(Exception):
    """Indicates that the file was attempted to be modified by multiple processes at once."""
    def __init__(self, jobStoreFileID):
        """
        :param str jobStoreFileID: the ID of the file that was modified by multiple workers
               or processes concurrently
        """
        super().__init__('Concurrent update to file %s detected.' % jobStoreFileID)


class NoSuchFileException(Exception):
    """Indicates that the specified file does not exist."""
    def __init__(self, jobStoreFileID, customName=None, *extra):
        """
        :param str jobStoreFileID: the ID of the file that was mistakenly assumed to exist
        :param str customName: optionally, an alternate name for the nonexistent file
        :param list extra: optional extra information to add to the error message
        """
        # Having the extra argument may help resolve the __init__() takes at
        # most three arguments error reported in
        # https://github.com/DataBiosphere/toil/issues/2589#issuecomment-481912211
        if customName is None:
            message = "File '%s' does not exist." % jobStoreFileID
        else:
            message = "File '%s' (%s) does not exist." % (customName, jobStoreFileID)

        if extra:
            # Append extra data.
            message += " Extra info: " + " ".join((str(x) for x in extra))

        super().__init__(message)
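
As an inline annotation on `NoSuchFileException`: the sketch below copies the class into a standalone snippet to show how the three argument shapes change the message. The file ID, custom name, and extra values are made up for illustration, and `describe` is a hypothetical helper, not part of Toil.

```python
class NoSuchFileException(Exception):
    """Standalone copy of the exception in this diff, for illustration only."""

    def __init__(self, jobStoreFileID, customName=None, *extra):
        if customName is None:
            message = "File '%s' does not exist." % jobStoreFileID
        else:
            message = "File '%s' (%s) does not exist." % (customName, jobStoreFileID)
        if extra:
            # Any further positional arguments are stringified and appended.
            message += " Extra info: " + " ".join(str(x) for x in extra)
        super().__init__(message)


def describe(*args):
    """Hypothetical helper: build the exception and return its message text."""
    return str(NoSuchFileException(*args))


# Hypothetical values, chosen only to exercise each branch.
plain = describe("files/abc123")
named = describe("files/abc123", "reads.fastq")
detailed = describe("files/abc123", "reads.fastq", "bucket=example", 404)

for text in (plain, named, detailed):
    print(text)
```

Because `*extra` accepts any number of trailing positional arguments, a caller that passes more than two arguments gets them appended to the message instead of a `TypeError`.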


class NoSuchJobStoreException(Exception):
    """Indicates that the specified job store does not exist."""
    def __init__(self, locator):
        super().__init__("The job store '%s' does not exist, so there is nothing to restart." % locator)


class JobStoreExistsException(Exception):
    """Indicates that the specified job store already exists."""
    def __init__(self, locator):
        super().__init__(
            "The job store '%s' already exists. Use --restart to resume the workflow, or remove "
            "the job store with 'toil clean' to start the workflow from scratch." % locator)
22 changes: 22 additions & 0 deletions src/toil/lib/aws/config.py
Review comment (Member): These are all S3-related, so why aren't they in the S3 lib file?

@@ -0,0 +1,22 @@
S3_PARALLELIZATION_FACTOR = 8
S3_PART_SIZE = 16 * 1024 * 1024
KiB = 1024
MiB = KiB * KiB

# Files must be larger than this before we consider multipart uploads.
AWS_MIN_CHUNK_SIZE = 64 * MiB
# Convenience variable for Boto3 TransferConfig(multipart_threshold=).
MULTIPART_THRESHOLD = AWS_MIN_CHUNK_SIZE + 1
# Maximum number of parts allowed in a multipart upload. This is a limitation imposed by S3.
AWS_MAX_MULTIPART_COUNT = 10000


def get_s3_multipart_chunk_size(filesize: int) -> int:
    """Returns the chunk size of the S3 multipart object, given a file's size in bytes."""
    if filesize <= AWS_MAX_MULTIPART_COUNT * AWS_MIN_CHUNK_SIZE:
        return AWS_MIN_CHUNK_SIZE
    else:
        div = filesize // AWS_MAX_MULTIPART_COUNT
        if div * AWS_MAX_MULTIPART_COUNT < filesize:
            div += 1
        return ((div + MiB - 1) // MiB) * MiB
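
To make the arithmetic in `get_s3_multipart_chunk_size` concrete, here is a standalone sketch that copies the constants and function from this file and evaluates a few sizes. The `GiB` and `TiB` constants and the `part_count` helper are additions for illustration and do not appear in the diff.

```python
KiB = 1024
MiB = KiB * KiB
GiB = 1024 * MiB  # illustration only, not defined in config.py
TiB = 1024 * GiB  # illustration only, not defined in config.py

AWS_MIN_CHUNK_SIZE = 64 * MiB
AWS_MAX_MULTIPART_COUNT = 10000


def get_s3_multipart_chunk_size(filesize: int) -> int:
    """Copied from the diff: chunk size in bytes for a file of the given size."""
    if filesize <= AWS_MAX_MULTIPART_COUNT * AWS_MIN_CHUNK_SIZE:
        return AWS_MIN_CHUNK_SIZE
    div = filesize // AWS_MAX_MULTIPART_COUNT
    if div * AWS_MAX_MULTIPART_COUNT < filesize:
        div += 1  # ceiling division, so 10000 parts always cover the file
    return ((div + MiB - 1) // MiB) * MiB  # round up to a whole MiB


def part_count(filesize: int) -> int:
    """Hypothetical helper: number of parts needed at the chosen chunk size."""
    chunk = get_s3_multipart_chunk_size(filesize)
    return -(-filesize // chunk)  # integer ceiling division


# Largest file that still uses the minimum chunk size (625 GiB).
boundary = AWS_MAX_MULTIPART_COUNT * AWS_MIN_CHUNK_SIZE

for size in (1 * GiB, boundary, boundary + 1, 1 * TiB):
    chunk = get_s3_multipart_chunk_size(size)
    print(f"{size} bytes -> {chunk // MiB} MiB chunks, {part_count(size)} parts")
```

Up to 625 GiB the chunk size stays at 64 MiB. Above that, the chunk grows in whole-MiB steps so the part count never exceeds the 10,000-part limit. A 1 TiB file, for example, gets 105 MiB chunks.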