Retrieval Event DB #873

Open
jcace opened this issue Jan 13, 2023 · 2 comments

jcace commented Jan 13, 2023

We need a database to store the stream of retrieval event data from which we compute reputation scores. We need to define (1) the schema for the DB, and (2) the underlying database technology to use.

Bedrock has defined a great schema that we can base this on: https://www.notion.so/Retrieval-Reputation-Schema-edcf4e8b89674343a45f62215c6e6ea9

Database Technologies

First option - Pando

Pando is a custom database solution designed for Filecoin reputation data. Bedrock is planning to use it to store retrieval statistics data which will be used to compute reputation.

Explore how we can integrate Autoretrieve stats into Pando, and investigate what it looks like to push data in and pull data out.

Second option - Event DB built on top of Postgres

Third option - Timeseries DB

TimescaleDB (Postgres) https://github.com/timescale/timescaledb
InfluxDB (open-source time-series DB) https://github.com/influxdata/influxdb

jcace added the New Feature label (Issues that we will work on with people or ourselves) Jan 13, 2023
jcace added this to the Incentivized Retrievals Q1 milestone Jan 13, 2023
jcace self-assigned this Jan 13, 2023
jcace changed the title from "Prove out Pando DB" to "Retrieval Event DB" Jan 13, 2023

jcace commented Jan 17, 2023

Just discovered we already have a database called estuary-metrics. This could serve as a nice place to store all these raw metrics:

I think it might make sense to tweak the schema of estuary-metrics: remove retrieval_success_records and retrieval_failure_records, and instead combine them into a single retrieval_events table. This new table would look mostly like retrieval_success_records, with a failed flag to capture the failure events.

Since we need both success/failure counts in our reputation calculation, I think this would make it quite ergonomic for us. We could query it once (aggregate by matching sp, in a given timestamp window),
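
A rough sketch of what the combined table and the single query could look like. It uses SQLite from Python purely so the example is self-contained; estuary-metrics may run on a different engine. The column names (`sp_id`, `cid`, `failed`, `created_at`) and the helper `counts_by_sp` are assumptions made up for illustration. They are not taken from the existing estuary-metrics tables or from the Bedrock schema.

```python
import sqlite3

# Hypothetical schema: column names are illustrative assumptions,
# not the real estuary-metrics or Bedrock definitions.
SCHEMA = """
CREATE TABLE retrieval_events (
    id          INTEGER PRIMARY KEY,
    sp_id       TEXT    NOT NULL,   -- storage provider the retrieval targeted
    cid         TEXT    NOT NULL,   -- content that was requested
    failed      INTEGER NOT NULL DEFAULT 0 CHECK (failed IN (0, 1)),
    created_at  INTEGER NOT NULL    -- Unix timestamp, seconds
);
CREATE INDEX idx_retrieval_events_sp_time
    ON retrieval_events (sp_id, created_at);
"""


def counts_by_sp(conn, start, end):
    """Return {sp_id: (successes, failures)} for start <= created_at < end."""
    rows = conn.execute(
        """
        SELECT sp_id,
               SUM(CASE WHEN failed = 0 THEN 1 ELSE 0 END) AS successes,
               SUM(CASE WHEN failed = 1 THEN 1 ELSE 0 END) AS failures
        FROM retrieval_events
        WHERE created_at >= ? AND created_at < ?
        GROUP BY sp_id
        """,
        (start, end),
    )
    return {sp: (ok, bad) for sp, ok, bad in rows}


def demo():
    """Build an in-memory DB with a few made-up events and query one window."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO retrieval_events (sp_id, cid, failed, created_at) "
        "VALUES (?, ?, ?, ?)",
        [
            ("f01234", "cid-a", 0, 100),
            ("f01234", "cid-b", 1, 150),
            ("f01234", "cid-c", 0, 900),  # falls outside the window queried below
            ("f05678", "cid-a", 1, 120),
        ],
    )
    return counts_by_sp(conn, 100, 200)
```

With one table, both counts come back from a single grouped scan over the time window, so the reputation calculation would not need to join or merge results from separate success and failure tables.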


jcace commented Jan 17, 2023

Projects
Status: In Progress