Make the `AllowAll` and `DisallowAll` instances public #36
Conversation
You can't just make a *variable* public, because that allows accidental or malicious mutation outside of the package.

The current unit tests check that some agent is allowed on some path. I'm not sure that extracting (dis)allowAll is beneficial: you would repeat part of the group.test function in your code to check against the new value. How is that better?

The other arguments seem to want marshalling, to save the result of parsing and reuse it later. But what's wrong with just reparsing the robots file again? It's pretty fast.
On Sun, Jul 17, 2022, 02:29 thewizardplusplus wrote:
It may be useful in the following cases:
- *Unit tests.* Let's assume we are testing a function that filters a list of links with a previously parsed robots.txt file. How do we check the case when a robots.txt file disallows all links? Currently it is possible only by actually parsing the corresponding robots.txt file.
- *Errors not related to the network.* Let's assume we are collecting a cache of parsed robots.txt files. What should we put in it in case of an error? Yes, the library has useful functions to analyze the status of a response, but not all errors are due to network problems (for example, say the cache accepts links as strings and parses them to construct a link to the robots.txt file; this parsing may result in an error).
- *Support for temporarily ignoring robots.txt files.* Let's assume we want to allow a user to ignore robots.txt files through a special option, that is, to allow all links. Again, this is currently possible only by actually parsing the corresponding robots.txt file.

I suppose it would be more comfortable to use prepared public instances for allowing and disallowing all links (see the sketch below).
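A minimal sketch of the unit-test use case, assuming the library's existing `FromString` and `TestAgent` API; the `robotstxt.DisallowAll` name mentioned in the comment is the instance this PR proposes to export, and `filterLinks` is a hypothetical function under test:

```go
package crawler_test

import (
	"testing"

	"github.com/temoto/robotstxt"
)

// filterLinks is a stand-in for the function under test: it keeps only
// the links that the given robots.txt data allows for the agent.
func filterLinks(robots *robotstxt.RobotsData, agent string, links []string) []string {
	var allowed []string
	for _, link := range links {
		if robots.TestAgent(link, agent) {
			allowed = append(allowed, link)
		}
	}
	return allowed
}

func TestFilterLinks_DisallowAll(t *testing.T) {
	// Current workaround: build a "disallow everything" instance by
	// actually parsing a prepared robots.txt string.
	robots, err := robotstxt.FromString("User-agent: *\nDisallow: /\n")
	if err != nil {
		t.Fatal(err)
	}
	// With this PR the parsing step could be replaced by the prepared
	// public instance, e.g. robots := robotstxt.DisallowAll.

	got := filterLinks(robots, "MyBot", []string{"/a", "/b"})
	if len(got) != 0 {
		t.Fatalf("expected no links to pass, got %v", got)
	}
}
```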
This is common practice in Go, recall io.EOF or http.DefaultClient. The latter is the full equivalent of our case.

But you are right. I can suggest using functions that return these constants, such as NewAllowAll() and NewDisallowAll() (the names are up to you).

You are right, I can parse prepared strings every time I need those special cases. But I find it non-idiomatic: I need the special value, the library has that value, but private, and I have to parse the same string over and over again.

After all, why does the library have these constants? You could also parse prepared strings each time to get AllowAll and DisallowAll. That's fast. But you chose to use constants, because it's obviously more comfortable (and faster).
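A minimal sketch of that suggestion, written here as a hypothetical helper package that uses only the public parser; inside the library the functions could simply return its already-cached private values. Exposing the shared instances through functions rather than exported variables keeps callers from reassigning them:

```go
package robotsutil

import "github.com/temoto/robotstxt"

// Pre-parsed shared instances. An empty robots.txt allows everything;
// "Disallow: /" for every agent denies everything.
var (
	allowAll, _    = robotstxt.FromString("")
	disallowAll, _ = robotstxt.FromString("User-agent: *\nDisallow: /")
)

// NewAllowAll returns the shared instance that permits every path.
func NewAllowAll() *robotstxt.RobotsData { return allowAll }

// NewDisallowAll returns the shared instance that denies every path.
func NewDisallowAll() *robotstxt.RobotsData { return disallowAll }
```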
After all, why does the library have these constants?

To save memory. The library can't save CPU time, because it would need to prove that the incoming text is equivalent to a popular constant. These constants have no special meaning; they're just a cache of common inputs.
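To illustrate the "cache of common inputs" point, a hypothetical helper (not the library's actual code): it hands out one shared pre-parsed value for an input already known to mean "allow everything", and falls back to real parsing otherwise:

```go
package robotscache

import "github.com/temoto/robotstxt"

// Shared, pre-parsed value for the most common input: an empty robots.txt,
// which allows everything. Parsed once, reused everywhere.
var allowAll, _ = robotstxt.FromString("")

// forBody returns the cached instance for the common "empty file" input and
// parses anything else normally. The cache saves allocations, not parsing of
// arbitrary text: proving that some other body is equivalent to "allow all"
// would itself require parsing it.
func forBody(body string) (*robotstxt.RobotsData, error) {
	if body == "" {
		return allowAll, nil
	}
	return robotstxt.FromString(body)
}
```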
This is where we don't agree yet. The library doesn't use the (dis)allowAll constants as special values. Maybe … Let's imagine …
Codecov Report

| Coverage Diff | master | #36    | +/- |
|---------------|--------|--------|-----|
| Coverage      | 92.59% | 92.59% |     |
| Files         | 3      | 3      |     |
| Lines         | 405    | 405    |     |
| Hits          | 375    | 375    |     |
| Misses        | 19     | 19     |     |
| Partials      | 11     | 11     |     |

Continue to review the full report at Codecov.