SiteHealth

⚠️ Project is still experimental, API will change (a lot) without notice.

Crawl a site and check various health indicators, such as:

  • Server errors
  • HTTP errors
  • Invalid HTML/XML/JSON
  • Missing HTML title/description
  • Missing image alt-attribute
  • Google Pagespeed

Installation

Add this line to your application's Gemfile:

gem "site_health"

And then execute:

$ bundle

Or install it yourself as:

$ gem install site_health

Usage

For command-line usage, see the CLI section.

Crawl and check site

nurse = SiteHealth.check("https://example.com")

Check list of URLs

nurse = SiteHealth.check_urls(["https://example.com"])

Write raw JSON result to file

require "json"
require "site_health"

nurse = SiteHealth.check("https://example.com")
json = JSON.pretty_generate(nurse.journal)

File.write("result.json", json)

Each issue

urls = ["https://example.com"]
SiteHealth.check_urls(urls) do |nurse|
  nurse.clerk do |clerk|
    clerk.every_issue { |issue| puts "#{issue.severity}, #{issue.title}" }
  end
end

Simple issue reports

nurse = SiteHealth.check("https://example.com")
report = SiteHealth::IssuesReport.new(nurse.issue) do |r|
  r.fields = %i[url title detail] # issue fields
  r.select { |issue| issue.url.include?('blog/') }
end

report.to_a
report.to_csv
report.to_json
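
To make the field selection and filtering above concrete, here is a minimal plain-Ruby sketch of the same idea. It does not use the gem: the Issue struct is a hypothetical stand-in, the sample issues and their severity values are invented for illustration, and only the field names (url, title, detail, severity) come from the examples in this README.

```ruby
require "json"

# Hypothetical stand-in for an issue object (not the gem's own class).
Issue = Struct.new(:url, :title, :detail, :severity, keyword_init: true)

# Invented sample data, for illustration only.
issues = [
  Issue.new(url: "https://example.com/blog/post", title: "Missing title",
            detail: "No <title> tag found", severity: :major),
  Issue.new(url: "https://example.com/about", title: "Missing alt",
            detail: "Image without alt attribute", severity: :minor)
]

fields = %i[url title detail]

# Keep only issues under blog/, then project each issue onto the chosen fields.
rows = issues
  .select { |issue| issue.url.include?("blog/") }
  .map { |issue| fields.to_h { |field| [field, issue.public_send(field)] } }

puts JSON.pretty_generate(rows)
```

The real IssuesReport may build its rows differently; the sketch only shows what "select, then pick fields" amounts to.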

Event handlers

urls = ["https://example.com"]
nurse = SiteHealth.check_urls(urls) do |nurse|
  nurse.clerk do |clerk|
    clerk.every_journal do |journal, page|
      time_in_seconds = journal[:runtime_in_seconds]
      puts "Found page #{page.title} - #{page.url} (checks took #{time_in_seconds}s)"
    end

    clerk.every_check do |check|
      puts "Ran check: #{check.name}"
    end

    clerk.every_failed_url do |url|
      puts "Failed to fetch: #{url}"
    end
  end
end
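
Handlers do not have to print; they can also collect events into plain data structures for a summary at the end. The sketch below shows that pattern with a small hypothetical CallbackRegistry class, which is a stand-in written for this example and not SiteHealth's own clerk. The event names and the emitted values are assumptions.

```ruby
# Hypothetical stand-in that mimics a register-then-emit callback flow.
class CallbackRegistry
  def initialize
    # Each event name maps to a list of handler blocks.
    @handlers = Hash.new { |hash, event| hash[event] = [] }
  end

  # Register a handler block for an event.
  def on(event, &block)
    @handlers[event] << block
    self
  end

  # Call every handler registered for the event.
  def emit(event, *args)
    @handlers[event].each { |handler| handler.call(*args) }
  end
end

failed_urls = []
checks_run = Hash.new(0)

registry = CallbackRegistry.new
registry.on(:failed_url) { |url| failed_urls << url }
registry.on(:check) { |name| checks_run[name] += 1 }

# Simulated events; in a real run the crawler would trigger these.
registry.emit(:check, "html")
registry.emit(:check, "html")
registry.emit(:failed_url, "https://example.com/missing")

puts "Checks run: #{checks_run}"
puts "Failed URLs: #{failed_urls}"
```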

Write page speed summary CSV

nurse = SiteHealth.check("https://example.com")
summary = SiteHealth::PageSpeedSummarizer.new(nurse.journal)
File.write("page_speed_summary.csv", summary.to_csv)

Configuration

All configuration is optional.

SiteHealth.configure do |config|
  # Override default checkers
  config.checkers = [:json_syntax, :html]

  # Configure logger
  config.logger = Logger.new(STDOUT).tap do |logger|
    logger.progname = 'SiteHealth'
    logger.level = Logger::INFO
  end

  # Configure HTMLProofer
  config.html_proofer do |proofer_config|
    proofer_config.log_level = :info
    proofer_config.check_opengraph = false
  end

  # Configure W3C HTML/CSS validator
  config.w3c_validators do |w3c_config|
    w3c_config.css_uri = 'http://localhost:8888/check'
    w3c_config.html_uri = 'http://localhost:8888/check'
  end
end

Load non-default checkers:

Some of the non-default checkers in this gem require third-party dependencies that are not installed by default.

Checker name        Gem
google_page_speed   google-api-client
html_proofer        html-proofer
w3c_html            w3c_validators
w3c_css             w3c_validators

If you intend to use any of these checkers, install the corresponding gem first. For example, to use the google_page_speed checker, add google-api-client to your Gemfile or install it manually with gem install google-api-client. Then register the checker:

SiteHealth.config.register_checker :google_page_speed
# LoadError is raised if google-api-client is *not* installed

Add your own checker:

class ProfanityChecker < SiteHealth::Checker
  name "profanity"
  types %i[html json xml css javascript]

  def check
    add_data(profanity: {
      damn: page.body.include?(" damn "),
      shit: page.body.include?(" shit ")
    })
  end
end

# Then register it
SiteHealth.configure do |config|
  config.register_checker ProfanityChecker
end
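
The substring match in the checker above only finds a word when it has a space on both sides, so it misses a word at the start of the text or next to punctuation. One possible alternative is a word-boundary regular expression, sketched here as a standalone helper. The names PROFANITIES and profanity_flags are hypothetical and not part of the gem.

```ruby
PROFANITIES = %w[damn shit].freeze

# Returns a hash such as { damn: true, shit: false } for the given text.
# \b matches at word boundaries, so punctuation and line starts are handled,
# and the /i flag makes the match case-insensitive.
def profanity_flags(body)
  PROFANITIES.to_h do |word|
    [word.to_sym, body.match?(/\b#{Regexp.escape(word)}\b/i)]
  end
end

p profanity_flags("Damn, that page is slow.")
p profanity_flags("Welcome to Amsterdamned tours")
```

Inside the checker's check method, the result could then be passed along with add_data(profanity: profanity_flags(page.body)).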

CLI

Usage: site_health --help
        --url=val0
        --fields=priority,title,url  Issue fields to include - by default all fields are included
        --output=result.csv          Output path, .csv or .json
        --stats-output=stats.csv     Stats output path, .csv or .json
        --[no-]progress              Print progress while running to STDOUT
    -h, --help                       How to use
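
For readers unfamiliar with this style of option listing, the sketch below shows how flags of this shape are commonly declared with Ruby's standard OptionParser. It is not the gem's actual CLI source: the parse_cli_options helper, the option descriptions for --url, and the default for progress are assumptions made for illustration.

```ruby
require "optparse"

# Hypothetical helper that parses flags shaped like the ones listed above.
def parse_cli_options(argv)
  options = { progress: false } # default value is an assumption

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: site_health --help"
    opts.on("--url=URL", String, "URL to check") { |value| options[:url] = value }
    # The Array type splits a comma-separated value into a list.
    opts.on("--fields=priority,title,url", Array, "Issue fields to include") do |value|
      options[:fields] = value.map(&:to_sym)
    end
    opts.on("--output=result.csv", String, "Output path, .csv or .json") { |value| options[:output] = value }
    # The [no-] form accepts both --progress and --no-progress.
    opts.on("--[no-]progress", "Print progress while running") { |value| options[:progress] = value }
  end

  parser.parse(argv)
  options
end

p parse_cli_options(%w[--url=https://example.com --fields=title,url --no-progress])
```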

Development

After checking out the repo, run bin/setup to install dependencies. Then, run bundle exec rake to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/buren/site_health.

License

The gem is available as open source under the terms of the MIT License.


TODO

  • Good way to render result/reports data
  • Improve logger support
  • Checkers
    • canonical URL
    • http vs https links
    • links matching a pattern
