Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better rate limited RESTClient #452

Closed
wants to merge 45 commits into from
Closed

Conversation

Canx
Copy link
Collaborator

@Canx Canx commented May 18, 2024

With the new rate limited RESTClient class (PolygonClient) all current and future polygon API calls will be rate limited if needed. This was not done before for list_option_contracts, for example.

This also enables local testing with a free polygon key (works but slow). For local testing with free account you have to add an environment variable called POLYGON_IS_PAID_SUBSCRIPTION=False

@Canx Canx requested a review from grzesir as a code owner May 18, 2024 09:06
Copy link
Contributor

korbit-ai bot commented May 18, 2024

My review is in progress 📖 - I will have feedback for you in a few minutes!

Copy link
Contributor

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed your code and found 4 potential issues. To discuss my individual comments that I have added, tag me in replies using @korbit-ai.


Please react with a 👍 to my comments that you find helpful and a 👎 to those you find unhelpful - this will help me learn and improve as we collaborate.

@@ -18,6 +18,7 @@
# Global parameters
# API Key for testing Polygon.io
POLYGON_API_KEY = os.environ.get("POLYGON_API_KEY")
POLYGON_IS_PAID_SUBSCRIPTION = os.environ.get("POLYGON_IS_PAID_SUBSCRIPTION") or True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

category Readability and Maintainability

The environment variable 'POLYGON_IS_PAID_SUBSCRIPTION' is used as a boolean, but environment variables are always strings. This could lead to unexpected behavior if the environment variable is set to a string that is not 'True' or 'False'. Please convert the environment variable to a boolean by comparing the string to 'True' and 'False' and assigning the boolean value accordingly.

Comment on lines 473 to 484
def _get(self, *args, **kwargs):
if not self.paid:
print(
f"\nSleeping {self.seconds} seconds while getting data from Polygon because "
f"we don't want to hit the rate limit. IT IS NORMAL FOR THIS TEXT TO SHOW UP SEVERAL TIMES "
"and IT MAY TAKE UP TO 10 MINUTES PER ASSET while we download all the data from Polygon. The next "
"time you run this it should be faster because the data will be cached to your machine. \n"
"If you want this to go faster, you can get a paid Polygon subscription at https://polygon.io/pricing "
f"and set `polygon_has_paid_subscription=True` when starting the backtest.\n"
)
time.sleep(self.seconds)
return super()._get(*args, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

category Readability and Maintainability

The _get method in the RLRESTClient class currently uses a print statement to warn about sleeping due to rate limits. This could lead to a cluttered console output and is not easily manageable. Consider using the logging module to output this message, which would allow for better control over when and how this message is displayed. Additionally, ensure that this message is only displayed when necessary to avoid repetition.

@@ -92,8 +89,8 @@ def get_price_data_from_polygon(
# print(f"\nGetting pricing data for {asset} / {quote_asset} with '{timespan}' timespan from Polygon...")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

category Logging

The RLRESTClient class should not print messages directly to the console. Instead, it should use a logging framework to allow better control over the output and to integrate with other systems that may consume these logs.

@@ -480,3 +462,23 @@ def update_polygon_data(df_all, result):
df_all = df_all[~df_all.index.duplicated(keep="first")] # Remove any duplicate rows

return df_all

class RLRESTClient(RESTClient):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

category Documentation

The removal of the global POLYGON_QUERY_COUNT variable and WAIT_TIME constant is a good step towards encapsulation, but the new RLRESTClient class should document its behavior and usage, especially regarding the rate limiting functionality.

@grzesir
Copy link
Contributor

grzesir commented May 20, 2024

@Canx This is great! Let me know if it's ready for review/merging

"""
Initialize the PolygonClient with rate limiting.
"""
self.seconds = 12
Copy link
Collaborator

@jimwhite jimwhite May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually better to give these kind constants names at the top of the so folks are aware is used in this file rather than having to track it down.

Also the simplicity here is nice but we still have the problem of concurrency to deal with because of these pools. You could have a global var that holds the timestamp of the last access.

with RLRC_LOCK:
    now = precise clock timestamp 
    elapsed_time  = now - RLRC_LAST_TIME
    RLRC_LAST_TIME = now
if elapsed_time < RLRC_MIN_ELAPSED:
   sleep RLRC_MIN_ELAPSED - elapsed_time

Of you could slip the finessing and skip the timestamp and just put sleep inside the lock.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your right! Made a WAIT_SECONDS constant and now init is not needed.

Ok, I need to understand more about this locking problem... At first try chatgpt is proposing to use a semaphore. I'll see what I can do!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also thinking of doing a singleton, in tests I've seen more than one RESTClient instance and maybe that could be a problem...

@Canx
Copy link
Collaborator Author

Canx commented May 21, 2024

I'm going to close this PR as I created the same one from the repo #455

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants