Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added multi root queries to the model #84

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Conversation

carlosvdr
Copy link
Contributor

Issue: #43

@carlosvdr carlosvdr self-assigned this Jul 19, 2023
Copy link
Contributor

@aasseman aasseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a first pass, I'll review again tomorrow.

autoagora/logs_db.py Outdated Show resolved Hide resolved
autoagora/logs_db.py Outdated Show resolved Hide resolved
autoagora/config.py Outdated Show resolved Hide resolved
autoagora/config.py Outdated Show resolved Hide resolved
hash: bytes
query: str
query_time_ms: int = 0
timestamp: datetime = datetime.now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timestamp will end up being a constant set at the moment the program starts (when the class is created), not when a new instance of this class is created.

autoagora/logs_db.py Outdated Show resolved Hide resolved
autoagora/model_builder.py Outdated Show resolved Hide resolved
autoagora/model_builder.py Outdated Show resolved Hide resolved
autoagora/model_builder.py Outdated Show resolved Hide resolved
autoagora/model_builder.py Outdated Show resolved Hide resolved
autoagora/main.py Outdated Show resolved Hide resolved
model = await mrq_model_builder(subgraph, pgpool)
await set_cost_model(subgraph, model)
# Should this also be this amount of time?
await aio.sleep(args.relative_query_costs_refresh_interval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a different time, otherwise it would be too slow.
Also consider using random.lognormvariate: https://www.desmos.com/calculator/k3en6o5ikf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a good "default" value and apply lognormvariate to it, to give it some "randomness"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values in the link should be a good start

@@ -52,7 +179,7 @@ async def get_most_frequent_queries(
Avg(query_time_ms) as avg_time,
Stddev(query_time_ms) as stddev_time
FROM
query_logs
{query_logs_table}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aasseman So this part, didn't let me to either leave it as %s and insert values or using $1, since the syntax doesn't work if the variable is alone like that.
Had to add format to it and use it this way, however this is vulnerable and SQL injections this way are a possibility, although this variable is not accesible by the user, what do you think?
Should I then create a separate query so that we don't have that there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not great. There is a SQL solution mentionned here. It also mentions that psycopg2 has a solution for this, but it's not async.
However, it seems that psycopg3 came out recently, and it supports async! Not sure if it's worth/hard porting to it though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So looked further into this, the easiest quickest approach is to just separate the function into two or to have in the constanst file the query already defined and select the query depending the type of query its looking for , which should be the one that leaves the code the "cleanest"

@aasseman aasseman linked an issue Nov 6, 2023 that may be closed by this pull request
@carlosvdr carlosvdr marked this pull request as ready for review November 24, 2023 22:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add handling of multi-root queries latency measurement
2 participants