Consider factoring in meta refresh tags when calculating redirects #52

Open
konklone opened this issue Feb 7, 2017 · 2 comments

Comments

@konklone
Collaborator

konklone commented Feb 7, 2017

Not necessarily for relaxing compliance standards around using server-side 80->443 redirects, but just to detect a broader swathe of agency behavior.

For example, segurosocial.gov seems to redirect to socialsecurity.gov, but it actually uses a <meta> tag to do the refresh. And further, it redirects to an insecure URL:

curl https://segurosocial.gov
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>SEGUROSOCIAL</TITLE>
<META content="text/html; charset=windows-1252" http-equiv=Content-Type>
<META content="MSHTML 5.00.2314.1000" name=GENERATOR>
<META HTTP-EQUIV="refresh" CONTENT="0; URL=http://www.socialsecurity.gov/espanol">
</HEAD>
<BODY aLink=#ff0000 bgColor=#ffffff link=#000ff text=#000000 vLink=#0000ff>
</BODY></HTML>

However, this doesn't show up in pshtt at all, so there's no way to detect this kind of thing.

Looking at (and parsing) HTML content, instead of just HTTP headers and status codes, would be new for pshtt. But if it's simple enough, it may be worth it, with the results offered as a new field or set of fields (separate from the existing server-redirect fields) for downstream tools that care about them.
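As a rough sketch of what such separate fields might look like, the snippet below takes the `content` value of a meta refresh tag plus the URL of the page it came from, and reports the delay, the resolved target, and whether the refresh moves from HTTPS to HTTP. The function name `describe_meta_refresh` and the `meta_refresh_*` field names are hypothetical, made up for illustration; they are not existing pshtt fields. The regex is a simplification of how browsers read the attribute.

```python
import re
from urllib.parse import urljoin, urlsplit

# Simplified pattern for values like "0; URL=http://example.gov/".
# Real browsers are more lenient; this is only an approximation.
_REFRESH_RE = re.compile(
    r"^\s*(\d+(?:\.\d*)?)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$",
    re.IGNORECASE | re.DOTALL,
)


def describe_meta_refresh(page_url, content):
    """Return hypothetical meta-refresh fields, or None if content is unparseable."""
    match = _REFRESH_RE.match(content)
    if match is None:
        return None
    delay = float(match.group(1))
    raw_target = (match.group(2) or "").strip().strip("'\"")
    # A refresh with no URL reloads the same page; relative URLs resolve
    # against the page that contained the tag.
    target = urljoin(page_url, raw_target) if raw_target else page_url
    return {
        "meta_refresh_delay": delay,
        "meta_refresh_target": target,
        "meta_refresh_downgrades_https": (
            urlsplit(page_url).scheme == "https"
            and urlsplit(target).scheme == "http"
        ),
    }
```

For the segurosocial.gov response above, calling `describe_meta_refresh("https://segurosocial.gov", "0; URL=http://www.socialsecurity.gov/espanol")` would flag the refresh as an HTTPS downgrade.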

@garrettr
Contributor

garrettr commented Feb 7, 2017

> However, this doesn't show up in pshtt at all, so there's no way to detect this kind of thing.

True! There are also redirect techniques beyond meta redirects that pshtt currently can't recognize. For example, https://abcnews.go.com uses JavaScript to downgrade HTTPS:

<script>
        if (window.location.protocol == "https:" && window.parent.location.hostname.indexOf("outbrain") == -1) {
                var _sslurl = window.location.href.replace("https://", "http://");
                window.location.replace(_sslurl);
                window.location.href = _sslurl;
        }
</script>

I think the most comprehensive approach would be to use browser automation - "it's the only way to be sure." On the other hand, while that would make it easy to determine whether a site downgrades HTTPS or not, it wouldn't automatically help with the harder problem of determining why/how a site downgrades.

If you want to keep this issue specifically about meta redirects, let me know, and I'll move this comment to a dedicated issue about detecting JS redirects.

@konklone
Collaborator Author

konklone commented Feb 8, 2017

The main reason I was considering meta redirects is that, in theory, we already have the HTML content from our requests to the site, so no additional network activity is necessary. We'd only need to run an HTML parse operation on the retrieved content.
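To give a sense of how small that parse step could be, here is a minimal sketch using only Python's standard-library `html.parser`. It assumes the response body is already available as a string; the names `MetaRefreshFinder` and `find_meta_refresh` are hypothetical and not part of pshtt.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect the content attribute of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refresh_contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so
        # <META HTTP-EQUIV="refresh"> is matched as well.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").strip().lower() == "refresh":
            self.refresh_contents.append(attr_map.get("content", ""))


def find_meta_refresh(html_text):
    """Return the raw content values of all meta refresh tags in html_text."""
    finder = MetaRefreshFinder()
    finder.feed(html_text)
    finder.close()
    return finder.refresh_contents
```

This only inspects markup that is already in hand, so it makes no extra requests. It does not distinguish tags in `<head>` from tags elsewhere, which a fuller implementation might want to do.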

Detecting JS redirects would require (as you say) a headless browser, and potentially more network requests if the relevant JS is loaded from an external file rather than an inline script. While HTML parsing isn't trivial, operating a headless browser and making arbitrary additional network requests is less appealing to me.

No worries on discussing it all in this issue, IMO.

@konklone konklone changed the title Consider checking for meta refresh redirects Consider factoring in meta refresh tags when calculating redirects Aug 25, 2017
@konklone konklone added the WSC label Aug 25, 2017
@hillaryj hillaryj removed the WSC label Dec 5, 2020
mcdonnnj pushed a commit that referenced this issue Mar 9, 2022
Workflow improvements: 🦁 set-env, and 🐯 python-3.9, and 🐻 dependabot, oh my!
mcdonnnj added a commit that referenced this issue Mar 9, 2022