-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple Perf Improvements #80
base: master
Are you sure you want to change the base?
Multiple Perf Improvements #80
Conversation
613942a
to
778c09a
Compare
@@ -148,34 +148,6 @@ def batch_setup(region_name, run_id, vpc_id, securityGroupIds, computeEnvironmen | |||
boto_iam, boto_ec2, instanceRoleName, instanceRoleName) | |||
print("Using ECS instance profile %s" % instanceProfileArn) | |||
|
|||
# Create the spot fleet role | |||
# https://docs.aws.amazon.com/batch/latest/userguide/spot_fleet_IAM_role.html | |||
spotIamFleetRoleName = "AmazonEC2SpotFleetRole" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be neat if we could think of a way to keep support for spot instance types for folks where Spot is cheaper than on-demand. On the other hand, it might be enough to point to this commit and say "undo this one".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can certainly do it, it's just some plumbing
@@ -205,7 +177,6 @@ def batch_setup(region_name, run_id, vpc_id, securityGroupIds, computeEnvironmen | |||
'cost_resource_group': run_id, | |||
}, | |||
bidPercentage=60, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can probably be removed, too.
@@ -235,7 +242,7 @@ def __call__(self, parent): | |||
assert dz >= 0 | |||
width = 1 << dz | |||
|
|||
size = 0 | |||
sizes = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be easier to read. {}
always looks like an empty dict
to me, not a set.
sizes = {} | |
sizes = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is actually a dict! I need to track which tile has that size so I can combine them correctly at lower zooms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, well nevermind then. Maybe dict()
then? 😄
count_at_this_zoom = counts_at_zoom[z] | ||
zoom_10_equiv_count = count_at_this_zoom * (4 ** (10 - z)) | ||
counts_at_zoom_sum += zoom_10_equiv_count | ||
if counts_at_zoom_sum == 4**10: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this 10
be one of the zoom values that gets passed in? What happens when we want to switch away from zoom 10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doh. Yes definitely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it really just needs to be a number we're comfortable we're not going to go over. It could be zoom 20 and everything would be fine (except overflow?). At hardcoded zoom 10, if we ended up grouping into zoom 11 jobs then we'd be doing a 4^-1 calc which would then compare an int to a float, and things might get bad
@@ -290,24 +472,65 @@ def _big_jobs(rawr_bucket, prefix, key_format_type, rawr_zoom, group_zoom, | |||
|
|||
return big_jobs | |||
|
|||
def viable_container_overrides(mem_mb): | |||
""" | |||
Turns a number into the next highest even multiple that AWS will accept, and the min number of CPUs you need for that amount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth mentioning that this is a workaround for what seems to be a bug in Batch that prevents arbitrary memory/CPU requests.
|
||
# now that we know what we want, pick something AWS actually supports | ||
viable_mem_request, required_min_cpus = viable_container_overrides(adjusted_mem) | ||
print("REMOVEME: [%s] enqueueing %s at %s mem mb and %s cpus" % (time.ctime(), coord_line, viable_mem_request, required_min_cpus)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove REMOVEME? This looks like a useful thing to print maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are A LOT of these. 25k with this configuration
@@ -149,7 +151,7 @@ def missing_tiles_split(self, split_zoom, zoom_max, big_jobs): | |||
|
|||
self.read_metas_to_file(missing_meta_file, compress=True) | |||
|
|||
print("Splitting into high and low zoom lists") | |||
print("[%s] Splitting into high and low zoom lists" % (time.ctime())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prefix the log with [make_meta_tiles]
too
@@ -162,25 +164,29 @@ def missing_tiles_split(self, split_zoom, zoom_max, big_jobs): | |||
|
|||
with gzip.open(missing_meta_file, 'r') as fh: | |||
for line in fh: | |||
c = deserialize_coord(line) | |||
if c.zoom < split_zoom: | |||
this_coord = deserialize_coord(line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might need to rebase master
@@ -190,6 +196,7 @@ def missing_tiles_split(self, split_zoom, zoom_max, big_jobs): | |||
for coord in missing_high: | |||
fh.write(serialize_coord(coord) + "\n") | |||
|
|||
print("[%s] Done splitting into high and low zoom lists" % (time.ctime())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prefix log with [make_meta_tiles]
help='If all the RAWR tiles grouped together are ' | ||
'bigger than this, split the job up into individual ' | ||
'RAWR tiles.') | ||
parser.add_argument('--allowed-missing-tiles', default=2, type=int, | ||
help='The maximum number of missing metatiles allowed ' | ||
'to continue the build process.') | ||
parser.add_argument('--tile-specifier-file', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this arg used?
|
||
return 0 | ||
|
||
def get_mem_reqs_mb(self, coord_str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the name mib
is probably more precise to mean 1024
or zoom_max (usually lower, e.g: 7) depending on whether big_jobs | ||
contains a truthy value for the RAWR tile. The big_jobs are looked up | ||
at zoom_max. | ||
High zoom jobs are output between split_zoom (RAWR tile granularity) and zoom_max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion, give a concrete example. As a new hire in the team, this doc is probably still not easy to follow
|
||
reordered_lines = tile_specifier.reorder(coord_lines) | ||
|
||
print("[%s] Starting to enqueue %d tile batches" % (time.ctime(), len(reordered_lines))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: log prefix of the module name
Look up the RAWR tiles in the rawr_bucket under the prefix and with the | ||
given key format, group the RAWR tiles (usually at zoom 10) by the job | ||
group zoom (usually 7) and sum their sizes. Return an ordered list of job coordinates | ||
by descending raw size sum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rawr
size sum
Provides the ability to sort tiles based on an ordering and specify memory reqs | ||
""" | ||
|
||
def __init__(self, default_mem_gb=8, spec_dict={}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we probably should have a central place for all the default values such as 8 here.
-- these changes are reasonably well grouped by commit --
/usr/bin/time