Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial draft of scenario interface. #70

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

josephburnett
Copy link
Contributor

An implementation of #62. Input to a simulation run in a Scenario message. Output is a Result message.

Copy link
Contributor

@jchesterpivotal jchesterpivotal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No hard objections, but I had some questions.

proto/scenario.proto Show resolved Hide resolved

message LoadInterval {
int32 width_milliseconds = 1;
int32 amplitude_aps = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APS? Is this a typo for QPS (the google shorthand as I understand it)?

If so, should it be in per-second units? I can see that it's easier to approach from how folks usually express load, but it also means that we'll be internally doing a int32 / int32 to derive the correct value and will need to be careful of overflows into (I presume) float64.

I feel like it would be easier to give a total number of load units (requests, queries). That does put it back on the user to perform a calculation but it means that they don't have to perform any calculations to recover the total number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was calling it "arrivals per second" because I want to remain open to pub-sub use-cases. But "operations per second" would also work. Would that be more intuitive?

(QPS - "queries per second" is probably not the best choice)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think you're right. Arrivals/sec is a better unit. Perhaps rename as amplitude_arrivals_per_second, arrivals_per_sec or similar.

}

message Workload {
int32 readiness_delay_milliseconds = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be applied uniformly to every copy of a workload? Or for each instance? At the moment Skenario assumes fixed, identical startup times but I'd always wanted to replace that with something more realistic down the road.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

I would like to leave room for a statistical readiness delay model. So perhaps I should put this in a struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to struct.

proto/scenario.proto Show resolved Hide resolved
repeated MetricPoint point = 3;
}

enum MetricType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest we call this MetricStatisticType or MetricSummaryType.

proto/scenario.proto Show resolved Hide resolved
syntax = "proto3";
package proto;

message Scenario {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we document each field? (there are no comments)

I'd suggest erring on the side of each field having a comment, even if some seem more clear. It will help follow what is what (e.g. 'attack', 'sustain', 'decay')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely.

}

message LoadConstant {
int32 arrivals_per_second = 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between 'arrival' and 'load'? Would it make sense to stick to one term, e.g. load_rate / load_per_second ?

Alternatively, I'd go with query_per_second nomenclature, as QPS is a better known term (I think)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So an "arrival" is a "request". But I wanted to avoid http oriented language to allow for pub-sub use cases. "load" is the pattern of arrival / request.

How about sticking to the terms "load" and "operation"?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen QPS being used in pubsub context, so I wouldn't worry about that. That said, I don't feel strongly, if you prefer 'arrivals', that sounds fine to me.

LoadInterval interval = 1;
}

message ShapeAsdr {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ASDR case so common and helpful? Couldn't I express the same by putting a bunch of LoadInterval into LoadCustom? (seems simpler and more generic to me, and I wouldn't be constrained to ASDR model, and I couldn't abuse ASDR model by defining, for example, decay higher than attack)

P.S. Is it ASDR or ADSR? Wikipedia calls it ADSR per Jacques's link: https://en.wikipedia.org/wiki/Envelope_(music).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was shooting for a middle ground between a constant value and fully custom values. ADSR is periodic, so you define the wave shape and it repeats. It's nice for describing something simple like "I have a spike every hour on the hour."

With custom, you have a lay out as many points as the simulation is long. This is the escape hatch for all load patterns we can't express.

ADSR subsumes step and ramp patterns currently in Skenario.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If it's periodic, I can see how it can be more convenient than custom.


message LoadPeriodic {
oneof shape_oneof {
ShapeSine sine = 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be sure how to interpret semantics of Sine and Asdr, e.g. how exactly should I interpret width_seconds / amplitude for Sin, will they repeat, for how long etc. I hope comments will clarify these things.

P.S. Might be better to define Sine fields directly for clarity, rather than trying to reuse LoadInterval structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it seems awkward. I'll change sine to a frequency / amplitude struct. Or I could drop it entirely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sine is still a useful approximation for load patterns with seasonality (which is basically all of them). It has the nice property that you can more easily look for resonant frequencies at which an autoscaler misbehaves.

int32 memory_bytes = 3;
}

message Cluster {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also skip for now, per https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it We can always add later once we know what exactly we need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Touche. I'll drop it like a hot potato.

message Metric {
MetricType type = 1;
MetricResource resource = 2
repeated MetricPoint point = 3;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by combination of statistics (average, P50, P90) and time series ("repeated MetricPoint"). For example, what would it mean to result 10 MetricPoint with type P99?

Also, how does the system know which statistics are requested?

The way I'd see it is that you should only have a timeseries in the result, maybe with requested resolution. The caller can then do whatever they want with these timeseries, e.g. draw, compute custom percentiles etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MetricPoints vary the time and value. For example P50 latency might be 1 second at T0, 2 seconds at T1, etc... Metric is a typed metric stream covering the duration of the simulation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants