Add a feature to report all references to a URL [cherry-pick] #685

Draft: wants to merge 1 commit into master
cjmayo (Contributor) commented Oct 27, 2022

This is commit 2a9ec3f from #663, cherry-picked with the following conflicts resolved:

```diff
--- a/linkcheck/checker/urlbase.py
+++ b/linkcheck/checker/urlbase.py
@@@ -397,9 -384,14 +399,18 @@@ class UrlBase
              self.info.append(s)
  
      def set_cache_url(self):
-         """Set the URL to be used for caching."""
+         """Set the URLs to be used for caching."""
+ 
          if "AnchorCheck" in self.aggregate.config["enabledplugins"]:
++<<<<<<< HEAD
 +            self.cache_url = self.url
++=======
+             log.debug(
+                 LOG_CHECK,
+                 "set_cache_url: self.url: %s; self.anchor: %s; self.urlparts[4]: %s",
+                 self.url, self.anchor, self.urlparts[4])
+             self.result_cache_url = self.url
++>>>>>>> 2a9ec3fe (Add a feature to report all references to a URL)
          else:
              # remove anchor from cached target url since we assume
              # URLs with different anchors to have the same content
diff --cc linkcheck/data/linkcheckerrc
index cd1d0765,95013c74..00000000
--- a/linkcheck/data/linkcheckerrc
+++ b/linkcheck/data/linkcheckerrc
@@@ -9,21 -9,17 +9,27 @@@
  # print status output
  #status=1
  # change the logging type
 -#log=xml
 +#log=text
  # turn on/off --verbose
 -#verbose=1
 +#verbose=0
  # turn on/off --warnings
++<<<<<<< HEAD
 +#warnings=1
++=======
+ #warnings=0
+ # turn on/off --allrefs
+ #reportallreferences=1
++>>>>>>> 2a9ec3fe (Add a feature to report all references to a URL)
  # turn on/off --quiet
 -#quiet=1
 -# additional file output
 +#quiet=0
 +# additional file output, example:
  #fileoutput = text, html, gml, sql
 +# errors to ignore (URL regular expression, message regular expression)
 +#ignoreerrors=
 +# ignore all errors for broken.example.com:
 +#  ^https?://broken.example.com/
 +# ignore SSL errors for dev.example.com:
 +#  ^https://dev.example.com/ ^SSLError .*
  
  
  ##################### logger configuration ##########################
```
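The `set_cache_url` conflict is between HEAD, which stores the key in `self.cache_url`, and the cherry-picked commit, which stores it in `self.result_cache_url`. In both versions the full URL, anchor included, is the key when the AnchorCheck plugin is enabled. The sketch below models only that branch and is not LinkChecker's code: `cache_url_for` is a hypothetical helper, and the anchor-stripping `else` branch (its body is cut off in the diff) is approximated with `urllib.parse.urldefrag`.

```python
from urllib.parse import urldefrag


def cache_url_for(url: str, anchor_check_enabled: bool) -> str:
    """Return the key a check result would be cached under (hypothetical helper).

    With AnchorCheck enabled, the whole URL including '#anchor' is the key,
    so links to different anchors on one page are cached (and reported)
    separately. Otherwise the anchor is dropped, on the assumption that
    URLs differing only by anchor have the same content.
    """
    if anchor_check_enabled:
        return url
    # Approximation of the elided else-branch: strip the fragment.
    base, _fragment = urldefrag(url)
    return base
```

With anchor checking on, `https://example.com/doc.html#install` and `https://example.com/doc.html#usage` get separate keys; with it off, both collapse to `https://example.com/doc.html`.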

Often when there is a broken URL, e.g. because a documentation change
moved a file or changed an anchor, many other pages reference that URL.
Before this commit, linkchecker reported only the first reference to
the broken URL. With this commit, linkchecker can optionally report
every reference to it.
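A minimal sketch of the difference between reporting the first reference and reporting all references, in Python (the language LinkChecker is written in). `ReferenceReporter` and `fake_check` are hypothetical names, not LinkChecker classes. The sketch assumes results are cached per URL, as the `set_cache_url` code in the diff suggests, and that only broken URLs are logged: each URL is checked once, and with `report_all_references` enabled, later references reuse the cached result and are logged too.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Result = Tuple[bool, str]  # (is_valid, message)


class ReferenceReporter:
    """Hypothetical model of a per-URL result cache plus a logger.

    Not LinkChecker's implementation; it only illustrates first-reference
    versus all-references reporting.
    """

    def __init__(self, report_all_references: bool = False) -> None:
        self.report_all_references = report_all_references
        self.results: Dict[str, Result] = {}  # cache: url -> result
        self.referrers: Dict[str, List[str]] = defaultdict(list)  # url -> parent pages
        self.logged: List[str] = []
        self.checks = 0  # number of real checks performed

    def add_reference(self, url: str, parent: str,
                      check: Callable[[str], Result]) -> None:
        first_reference = url not in self.results
        if first_reference:
            self.results[url] = check(url)  # each URL is checked only once
            self.checks += 1
        self.referrers[url].append(parent)
        valid, message = self.results[url]
        # A broken URL is always logged for its first reference; later
        # references reuse the cached result and are logged only when enabled.
        if not valid and (first_reference or self.report_all_references):
            self.logged.append(f"{parent}: {url} ({message})")


def fake_check(url: str) -> Result:
    # Stand-in for a real HTTP check: one page has moved.
    if url.endswith("/moved.html"):
        return (False, "404 Not Found")
    return (True, "200 OK")


if __name__ == "__main__":
    for allrefs in (False, True):
        reporter = ReferenceReporter(report_all_references=allrefs)
        for page in ("index.html", "guide.html", "faq.html"):
            reporter.add_reference("https://example.com/moved.html", page, fake_check)
        print(allrefs, reporter.checks, reporter.logged)
```

With the option disabled, three pages linking to the moved page produce one log line; with it enabled they produce three. `checks` stays at 1 in both cases, because under this model the extra work is only extra log output.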

Command line argument: --allrefs

Config setting: output.reportallreferences
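A sketch of how the two switches could be combined, using only the standard library. The `--allrefs` flag and the `reportallreferences` key in the `[output]` section come from this PR; the helper name `report_all_references` and the precedence rule (either switch turns the feature on) are assumptions for illustration, not LinkChecker's actual option handling.

```python
import argparse
import configparser
from typing import List


def report_all_references(argv: List[str], rc_text: str) -> bool:
    """Hypothetical helper: resolve the setting from linkcheckerrc-style text and argv.

    Assumption: the command-line flag enables the feature regardless of the
    config file, and an absent or commented-out key means "off".
    """
    config = configparser.ConfigParser()
    config.read_string(rc_text)
    # fallback also covers a missing [output] section
    enabled = config.getboolean("output", "reportallreferences", fallback=False)

    parser = argparse.ArgumentParser(prog="linkchecker")
    parser.add_argument("--allrefs", action="store_true",
                        help="report all references to a broken URL")
    args, _unknown = parser.parse_known_args(argv)
    return enabled or args.allrefs
```

For example, `report_all_references([], "[output]\nreportallreferences=1\n")` and `report_all_references(["--allrefs"], "")` both return `True`, while the shipped default, where the key is commented out as `#reportallreferences=1`, leaves the feature off.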

It is off by default, mainly because I didn't want to risk breaking
behavior that existing users might rely on, and enabling it by default
would have required overhauling many existing tests. Enabling it carries
very little performance penalty.

I intentionally built this on top of my other AnchorCheck code, because
I need it to work with anchor checks.