-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add headers compliant with Wikimedia's user-agent policy #44
base: main
Are you sure you want to change the base?
Add headers compliant with Wikimedia's user-agent policy #44
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! I left my thoughts on your parch. Could you adjust few lines?
@@ -114,6 +117,13 @@ def __init__(self, | |||
pass | |||
opener = urllib.request._opener # type: ignore | |||
assert isinstance(opener, urllib.request.OpenerDirector) | |||
if not user_agent: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Client()
constructor already takes opener: Optional[urllib.request.OpenerDirector]
, and OpenerDirector
allows to override its default User-Agent
header. According to the docs:
OpenerDirector
automatically adds aUser-Agent
header to everyRequest
. To change this:import urllib.request opener = urllib.request.build_opener() opener.addheaders = [('User-agent', 'Mozilla/5.0')] opener.open('http://www.example.com/')
Therefore, IMHO it'd be better to remove user_agent
parameter (as a user can customize its User-Agent
by passing their own instance of OpenerDirector
1) and instead check if it's using the default opener or a passed custom opener
:
if not user_agent: | |
if self._using_default_opener: |
Footnotes
-
Of course, such usage might be difficult to guess. Might be good to have few lines of example code in the docs. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you'd rather go with this approach, then feel free to close this PR as I'm not sure where you'd like this documented. 🙏🏻
My view on this is that even if it was left to the end-user to configure their own OpenerDirector
, (1) this feels like the library doesn't help do things the "right way" for wikidata.org nor the end-user, and (2) this feels like you'd at least want to validate the configured User-Agent
against some regex and otherwise raise a ValueError
or so.
But it is only my view. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels like the library doesn't help do things the "right way" for wikidata.org nor the end-user
Fair enough, but it seems to make no difference to me if Client()
constructor has user_agent
parameter, as it's just a free-form text to the end-user. Rather, my intention was to incline end-users to stick with the provided default User-Agent
string which complies with Wikidata's official guidance.
Please let me know if I misread your point. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that if users need to create a custom OpenerDirectory
to set a compliant header, it is good that the wikidata
library allows this custom object to be injected 👍🏻 , but it could be even better (it could improve usability) if it was providing a compliant default OpenerDirectory
/header in first place.
If by:
the provided default
User-Agent
you mean the default from urllib
, i.e. [('User-agent', 'Python-urllib/3.9')]
, then it does not comply to Wikidata's official guidance, which is why there is issue #43.
If by:
the provided default
User-Agent
you mean some default that the wikidata
library (this repo's library) would provide, then it looks like we both share the intent of helping end-users to use a header which complies with Wikidata's policy, but I haven't understood how you'd like to provide this default to end-users in practice. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the latter.
End-users would probably be unaware of Wikidata's official guidance. Therefore, even if Client()
has an option to customize their User-Agent
, their User-Agent
strings are still prone to ignore Wikidata's policy.
IMHO it's better not to provide any easier way to customize User-Agent
header, and instead to stir up end-users to stick with our predefined User-Agent
string which complies with Wikidata's policy. I guess the most of end-users have no demand to customize the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I agree with you! 💯 👍🏻
This is why, in this PR, I chose to default User-Agent
to:
f'wikidata-based-bot/{__version__} (https://github.com/dahlia/wikidata) python/{python_version}'
unless specified otherwise by the end-user, i.e.:
def __init__(self,
# [...]
user_agent: Optional[str] = None) -> None:
# [...]
if not user_agent:
python_version = f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}'
user_agent = f'wikidata-based-bot/{__version__} (https://github.com/dahlia/wikidata) python/{python_version}'
# See also: https://meta.wikimedia.org/wiki/User-Agent_policy
opener.addheaders = {
'User-Agent': user_agent,
}
This way, by default, the provided User-Agent
is always compliant with Wikidata's policy.
Are you happy with this approach? 🤔
-
If so, is there anything else you'd like in this PR?
-
If not, how would you rather approach it? Like this? (I.e. no way to provide
user_agent
?)def __init__(self, # [...] repr_string: Optional[str] = None) -> None: # [...] python_version = f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}' user_agent = f'wikidata-based-bot/{__version__} (https://github.com/dahlia/wikidata) python/{python_version}' # See also: https://meta.wikimedia.org/wiki/User-Agent_policy opener.addheaders = { 'User-Agent': user_agent, }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the second approach! IMHO we don't need to provide such customization. If end-users request such feature in the future, I'm going to consider to provide it by that time. 🤔
Prior to this commit, when sending several requests in a row, some requests could fail with `429/Too many requests` as the client was not providing an `User-Agent` header compliant with Wikimedia's user-agent policy. See also: https://meta.wikimedia.org/wiki/User-Agent_policy
c803821
to
79dde4b
Compare
Fixes #43.
Changelog
user_agent
argument toClient
's constructor.user_agent
to a value compliant with Wikimedia's user-agent policy.user_agent
as the value for theUser-Agent
HTTP header.