Make the `AllowAll` and `DisallowAll` instances public #36
Conversation
You can't just make a *variable* public, because that allows accidental or malicious mutation outside of the package.

The current unit tests check that some agent is allowed on some path. I'm not sure that extracting (dis)allowAll is beneficial: you would repeat part of the group.test function in your code to check against the new value. How is that better?

The other arguments seem to want marshalling, to save the result of parsing and reuse it later. But what's wrong with just reparsing the robots file again? It's pretty fast.
On Sun, Jul 17, 2022, 02:29 thewizardplusplus wrote:
It may be useful in the following cases:
- *Unit tests.* Let's assume we are testing a function that filters a list of links with a previously parsed robots.txt file. How do we check the case when a robots.txt file disallows all links? Currently it is possible only by actually parsing the corresponding robots.txt file.
- *Errors not related to the network.* Let's assume we are collecting a cache of parsed robots.txt files. What should we put in it in case of an error? Yes, the library has useful functions to analyze the status of a response, but not all errors are due to network problems (for example, say the cache accepts links as strings and parses them to construct a link to the robots.txt file; this parsing may result in an error).
- *Support for temporarily ignoring robots.txt files.* Let's assume we want to allow a user to ignore robots.txt files through a special option, that is, to allow all links. Again, this is currently possible only by actually parsing the corresponding robots.txt file.

I suppose it would be more comfortable to use prepared public instances for allowing and disallowing all links (see the sketch below).
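A minimal sketch of the unit-test use case, assuming the library's existing `FromString` and `TestAgent` API; the `robotstxt.DisallowAll` name mentioned in the comment is the instance this PR proposes to export, and `filterLinks` is a hypothetical function under test:

```go
package crawler_test

import (
	"testing"

	"github.com/temoto/robotstxt"
)

// filterLinks is a stand-in for the function under test: it keeps only
// the links that the given robots.txt data allows for the agent.
func filterLinks(robots *robotstxt.RobotsData, agent string, links []string) []string {
	var allowed []string
	for _, link := range links {
		if robots.TestAgent(link, agent) {
			allowed = append(allowed, link)
		}
	}
	return allowed
}

func TestFilterLinks_DisallowAll(t *testing.T) {
	// Current workaround: build a "disallow everything" instance by
	// actually parsing a prepared robots.txt string.
	robots, err := robotstxt.FromString("User-agent: *\nDisallow: /\n")
	if err != nil {
		t.Fatal(err)
	}
	// With this PR the parsing step could be replaced by the prepared
	// public instance, e.g. robots := robotstxt.DisallowAll.

	got := filterLinks(robots, "MyBot", []string{"/a", "/b"})
	if len(got) != 0 {
		t.Fatalf("expected no links to pass, got %v", got)
	}
}
```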
This is common practice in Go, recall io.EOF or http.DefaultClient. The latter is the full equivalent of our case.

But you are right. I can suggest using functions that return these constants, such as NewAllowAll() and NewDisallowAll() (the names are up to you).

You are right, I can parse prepared strings every time I need those special cases. But I find it non-idiomatic: I need the special value, the library has that value, but private, and I have to parse the same string over and over again.

After all, why does the library have these constants? You could also parse prepared strings each time to get AllowAll and DisallowAll. That's fast. But you chose to use constants, because it's obviously more comfortable (and faster).
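A minimal sketch of that suggestion, written here as a hypothetical helper package that uses only the public parser; inside the library the functions could simply return its already-cached private values. Exposing the shared instances through functions rather than exported variables keeps callers from reassigning them:

```go
package robotsutil

import "github.com/temoto/robotstxt"

// Pre-parsed shared instances. An empty robots.txt allows everything;
// "Disallow: /" for every agent denies everything.
var (
	allowAll, _    = robotstxt.FromString("")
	disallowAll, _ = robotstxt.FromString("User-agent: *\nDisallow: /")
)

// NewAllowAll returns the shared instance that permits every path.
func NewAllowAll() *robotstxt.RobotsData { return allowAll }

// NewDisallowAll returns the shared instance that denies every path.
func NewDisallowAll() *robotstxt.RobotsData { return disallowAll }
```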
After all, why does the library have these constants?

To save memory. The library can't save CPU time, because it would need to prove that the incoming text is equivalent to a popular constant. These constants have no special meaning; they're just a cache of common inputs.
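To illustrate the "cache of common inputs" point, a hypothetical helper (not the library's actual code): it hands out one shared pre-parsed value for an input already known to mean "allow everything", and falls back to real parsing otherwise:

```go
package robotscache

import "github.com/temoto/robotstxt"

// Shared, pre-parsed value for the most common input: an empty robots.txt,
// which allows everything. Parsed once, reused everywhere.
var allowAll, _ = robotstxt.FromString("")

// forBody returns the cached instance for the common "empty file" input and
// parses anything else normally. The cache saves allocations, not parsing of
// arbitrary text: proving that some other body is equivalent to "allow all"
// would itself require parsing it.
func forBody(body string) (*robotstxt.RobotsData, error) {
	if body == "" {
		return allowAll, nil
	}
	return robotstxt.FromString(body)
}
```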
This is where we don't agree yet. The library doesn't use the (dis)allowAll constants as special values. Maybe … Let's imagine …
Codecov Report

| Coverage Diff | master | #36    | +/- |
|---------------|--------|--------|-----|
| Coverage      | 92.59% | 92.59% |     |
| Files         | 3      | 3      |     |
| Lines         | 405    | 405    |     |
| Hits          | 375    | 375    |     |
| Misses        | 19     | 19     |     |
| Partials      | 11     | 11     |     |

Continue to review the full report at Codecov.