Adds first scenario for feature engineering examples #311
base: main
Conversation
Good start -- I don't think this is going to be clear to most people who haven't really dug into this. A few thoughts:
- We can clarify the wording/make it crisper to specify why this is a problem, how it's normally done, and why Hamilton alleviates this.
- We can give more context about what we're doing here/why it's in an online context.
- We can root it in tooling that might be familiar to them. While loading fake models/whatnot makes sense, I think it's going to confuse users. So either load from a model/feature store they're used to, or (more likely) abstract it away and make it very clear that it could be implemented in many different ways.
This stuff is natural to us as we've been building online/batch inference/training tooling for years, but I think this will be extremely complex to most people out there, and fall flat. Hamilton is simple enough and makes this easy enough that this is a good chance to capture market share, but to do so we need to really hammer home a pattern and a motivation.
This example shows how you can use the same feature definitions in Hamilton in an offline setting and use them in an online setting. Assumptions:
- the API request can provide the same raw data that training provides.
- if you have aggregation features, you need to store the training result for them, and provide them to the online side.
This example shows how one might use Hamilton to compute features in an offline and online fashion. The assumption here is that the request passed into the API has all the raw data required to compute features. This example also shows how one might "override" some values that are required for computing features; in this example they are `age_mean` and `age_std_dev`. This can be required when computing aggregation features does not make sense at inference time.
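To make the override pattern concrete, here is a minimal sketch in plain Python (standing in for Hamilton's driver, which wires functions together by parameter name). The node names `age_mean` and `age_std_dev` come from the example; the derived feature name `age_zero_mean_unit_variance` is illustrative, not from the PR.

```python
# Sketch of the offline/online pattern described above. Plain Python stands
# in for Hamilton's driver; `age_zero_mean_unit_variance` is a hypothetical
# derived feature name.
from statistics import mean, stdev

# Feature definitions -- shared by the offline and online code paths.
def age_mean(ages: list) -> float:
    return mean(ages)

def age_std_dev(ages: list) -> float:
    return stdev(ages)

def age_zero_mean_unit_variance(age: float, age_mean: float, age_std_dev: float) -> float:
    return (age - age_mean) / age_std_dev

# Offline: compute aggregates over the training set and store them somewhere
# (a feature store, a file, etc.).
training_ages = [20.0, 30.0, 40.0, 50.0]
stored = {
    "age_mean": age_mean(training_ages),
    "age_std_dev": age_std_dev(training_ages),
}

# Online: a single request cannot compute aggregates over a dataset, so the
# stored training values "override" those nodes when computing the feature.
request_age = 33.0
feature = age_zero_mean_unit_variance(
    request_age, stored["age_mean"], stored["age_std_dev"]
)
```

The key point is that the feature functions themselves are identical in both settings; only the source of the aggregate values changes.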
That's the point of the scenarios: there is no one-size-fits-all. That is, show the simplest possible thing, then one where there is a feature store, etc. Will add more to the motivation -- and draw some pictures.
I think this makes it clearer what this file is, and it's a lightweight way to register feature sets that are used by a model.
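One lightweight way such a registration file might look (a sketch only; the model and feature names here are illustrative, not taken from the PR):

```python
# Hypothetical lightweight registry: map a model name to the list of Hamilton
# node (feature) names it needs. All names here are illustrative.
MODEL_FEATURE_SETS: dict = {
    "model_v1": ["age_zero_mean_unit_variance", "is_male", "fare"],
}

def features_for(model_name: str) -> list:
    """Look up the feature set registered for a model."""
    return MODEL_FEATURE_SETS[model_name]
```

The registered list can then be passed as the requested outputs when executing the feature DAG, so offline and online code ask for exactly the same features.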
To help set the tone and explain what feature engineering is, as well as give more context about the scenarios and the task.
I expanded on the docs, and hopefully explained it in a way that is understandable to a novice.
As a way to show functionality that highlights which values should be overridden in an online setting.
This example shows how you can use the same feature definitions in Hamilton in an offline setting and use them in an online setting.
Assumptions:
- the API request can provide the same raw data that training provides.
- if you have aggregation features, you need to store the training result for them, and provide them to the online side.
Changes
How I tested this
Notes
Checklist