Replies: 2 comments
-
Copying some discord chat with @davidgasquez over:
-
Here's a reworked and hopefully similar set of steps:
-
This is a thread on some of the metrics modeling discussions we've been having, as we move from a small number of static metrics to a large number of metrics that can be applied on a timeseries.
Metric schema
@ryscheng proposed the following as a `metrics_v0` yesterday:
Sample metrics
Here are some examples of different types of metrics:
Gas Fees
This is a static metric that simply sums gas fees by project / event_source / time_interval. (The `event_source` represents the chain and the `time_interval` options are "7 DAYS", "30 DAYS", ... , "ALL".)
Contributors
This is another static metric that counts the number of unique contributors by project / event_source / time_interval. (The `event_source` will always be GitHub for now and the `time_interval` options are "7 DAYS", "30 DAYS", ... , "ALL".)
New Contributors
This is a more complex static metric that calculates the number of new contributors by project / event_source / time_interval. (The `event_source` will always be GitHub for now and the `time_interval` options are "7 DAYS", "30 DAYS", ... , "ALL".)
Bus Factor
This is an even more complex static metric that does some math on the composition of contributors by project / event_source / time_interval. (The `event_source` will always be GitHub for now and the `time_interval` options are "7 DAYS", "30 DAYS", ... , "ALL".)
Full-time active developers
This is a v0 timeseries metric that counts the number of developers who have made 10+ commits to a project in a 30-day period. It constructs a synthetic calendar and applies a 30-day rolling window.
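To make the rolling-window idea concrete, here is a minimal Python sketch (not the actual model, which would be SQL). The function name, the `(day, developer_id)` input shape, and the choice of a trailing window that includes the sample day are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def fulltime_developers(commits, start, end, window_days=30, min_commits=10):
    """Count developers with >= min_commits in each trailing window.

    commits: iterable of (day, developer_id) tuples, one tuple per commit.
    Returns {sample_day: count} for every day from start to end inclusive.
    """
    # Commits per developer per day.
    per_day = defaultdict(lambda: defaultdict(int))
    for day, dev in commits:
        per_day[dev][day] += 1

    result = {}
    day = start
    while day <= end:  # synthetic calendar: one row per day, even with no commits
        window_start = day - timedelta(days=window_days - 1)
        active = 0
        for counts in per_day.values():
            total = sum(n for d, n in counts.items() if window_start <= d <= day)
            if total >= min_commits:
                active += 1
        result[day] = active
        day += timedelta(days=1)
    return result
```

Because the calendar is generated independently of the commits, days with no activity still get a row, which is what lets the count drop back to 0 once a developer's commits age out of the window.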
Transformation Steps
For each of these metrics, there appears to be a general pattern of transformation steps:
0. From staging to raw events
Currently, the `int_events` table has both raw events (eg, `COMMIT_CODE`) and bucket events (eg, `CONTRACT_INVOCATION_SUCCESS_DAILY_COUNT`). The `int_events` table also has fields that are not strictly necessary (eg, `to_artifact_name`, `to_artifact_type`).
A proposal would be to remove all the superfluous fields and just have:
time, from_artifact_id, to_artifact_id, event_source, event_type, amount
Then, we should keep the raw times instead of bucketed ones, eg, `CONTRACT_INVOCATION_SUCCESS` with a specific timestamp.
One downside is that this will multiply the number of events we have, ie, a single token transfer could produce separate events for gas, contract_invocation, usd_amount, donation, etc.
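A rough sketch of what the slimmed-down event row and the fan-out could look like, in Python for illustration. The six fields come from the proposal above; the `events_from_transfer` helper, its parameters, and the `GAS_FEES` / `USD_AMOUNT` event type names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    # The six fields from the proposal, nothing else.
    time: datetime
    from_artifact_id: str
    to_artifact_id: str
    event_source: str
    event_type: str
    amount: float


def events_from_transfer(time, sender, contract, chain, gas_fee, usd_amount):
    """Fan one (hypothetical) transfer record out into several raw events."""
    common = dict(
        time=time,
        from_artifact_id=sender,
        to_artifact_id=contract,
        event_source=chain,
    )
    return [
        Event(**common, event_type="CONTRACT_INVOCATION_SUCCESS", amount=1),
        Event(**common, event_type="GAS_FEES", amount=gas_fee),      # assumed name
        Event(**common, event_type="USD_AMOUNT", amount=usd_amount),  # assumed name
    ]
```

Here one transfer becomes three rows sharing the same raw timestamp, which is the row-count growth mentioned above.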
1. Filtering events
All events have a filtering step which could easily be parametrized in the metric definition, eg:
These could be expanded upon to include both types (set by the event source provider) and tags (set by different models, eg, from_artifact_ids associated with trusted farcaster users).
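As a sketch of how a filter might be parametrized, assuming events are plain dicts and the metric definition carries a mapping of field name to allowed values (both assumptions, as are the chain names used below):

```python
def matches(event, spec):
    """Return True if the event satisfies every field in the filter spec.

    spec maps a field name to the collection of values allowed for it.
    """
    return all(event.get(field) in allowed for field, allowed in spec.items())


def filter_events(events, spec):
    """Keep only the events that match the spec."""
    return [e for e in events if matches(e, spec)]


# Hypothetical filter portion of a metric definition.
GAS_FEES_FILTER = {
    "event_source": {"OPTIMISM", "BASE"},
    "event_type": {"CONTRACT_INVOCATION_SUCCESS"},
}
```

Tags could slot into the same shape by adding another key to the spec, provided an upstream model has already attached the tag to each event.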
2. Deriving intermediate metrics
Once events have been filtered, there is usually a step where some intermediate transformation is needed. For instance:
gas_fees / 1e18
cast(amount > 0 as int64)
case when user_stats.first_day >= time_intervals.start_date then events.from_artifact_id end
This is usually an important part of the business logic.
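For readers less used to the SQL fragments, the same three derivations could be written as small Python functions. The function names are made up here, and reading `/ 1e18` as a wei-to-ETH style unit conversion is an assumption.

```python
from datetime import date

SCALE = 10**18  # assumed to convert the smallest on-chain unit to whole tokens


def scale_gas_fees(amount):
    # gas_fees / 1e18
    return amount / SCALE


def is_positive(amount):
    # cast(amount > 0 as int64)
    return int(amount > 0)


def new_contributor_id(from_artifact_id, first_day, interval_start):
    # case when user_stats.first_day >= time_intervals.start_date
    #      then events.from_artifact_id end
    # Returns None (SQL NULL) for contributors first seen before the interval.
    return from_artifact_id if first_day >= interval_start else None
```

The `None` in the last function matters: a downstream count of distinct ids then ignores returning contributors, the same way a SQL `count(distinct ...)` ignores NULLs.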
3. Building a timeseries
For metrics that have rolling windows (eg, `fulltime_developers`), it may be necessary to create a utility calendar and add ephemeral events with 0 amounts. There's some logic around defining a `window_interval` and a `sampling_interval`, eg:
An alternative implementation for a related metric might be:
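Independently of the specific SQL, the calendar step can be pictured with a minimal Python sketch. It assumes both intervals are whole numbers of days and that input is already bucketed as `{date: amount}`; the function names are hypothetical.

```python
from datetime import date, timedelta


def build_calendar(start, end, sampling_interval_days=1):
    """Dates from start to end inclusive, stepping by the sampling interval."""
    days = []
    day = start
    while day <= end:
        days.append(day)
        day += timedelta(days=sampling_interval_days)
    return days


def rolling_sums(daily_amounts, start, end, window_interval_days,
                 sampling_interval_days=1):
    """Trailing-window sum at each sample date.

    daily_amounts: {date: amount}. Missing dates are read as 0, which plays
    the role of the ephemeral zero-amount events.
    """
    out = {}
    for sample in build_calendar(start, end, sampling_interval_days):
        total = 0
        for offset in range(window_interval_days):
            total += daily_amounts.get(sample - timedelta(days=offset), 0)
        out[sample] = total
    return out
```

The `window_interval` controls how far back each sample looks, while the `sampling_interval` controls how many rows the timeseries ends up with, so the two can be tuned separately.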
4. Aggregating by entity type and applying remaining business logic
We should avoid having to define every metric for every artifact / project / collection. Thus, we'd like some generalized version of:
... where the `to_id` could be a `project_id` or `collection_id`.
Then we perform our remaining business logic operations.
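One way to picture the generalized version is a single roll-up function that takes the artifact-to-entity mapping as a parameter. This is a sketch under assumptions: the `rollup` name and the shape of the mapping are hypothetical.

```python
from collections import defaultdict


def rollup(rows, artifact_to_entity):
    """Sum artifact-level amounts up to whatever entity the mapping points at.

    rows: iterable of (to_artifact_id, amount)
    artifact_to_entity: {artifact_id: [entity_id, ...]}; an artifact may
        belong to several projects or collections.
    """
    totals = defaultdict(float)
    for artifact_id, amount in rows:
        for to_id in artifact_to_entity.get(artifact_id, []):
            totals[to_id] += amount
    return dict(totals)
```

Passing an artifact-to-project mapping gives project-level metrics, and passing an artifact-to-collection mapping gives collection-level metrics, with no change to the metric itself. Note that summing only works for additive metrics such as gas fees; unique counts such as contributors would need to be deduplicated at the entity level rather than summed across artifacts.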
For example, with `bus_factor` we have:
5. Agg functions
Finally, we can apply standard agg and limit functions to the raw metric models. These will mostly be `min`, `max`, `avg`, `std`, and `limit 1`, since the `sum` and `count` / `count_unique` agg funcs will already have been done upstream.
Curious what @ryscheng @ravenac95 think!