etaminstudio/sitemap-parser

 
 

Sitemap Parser

Ruby Gem to parse sitemaps.org compliant sitemaps


Usage

Create a new instance of the Parser:

sitemap = SitemapParser.new "http://ben.balter.com/sitemap.xml"

Extract the URLs of the sitemap:

sitemap.urls # => Array of Nokogiri XML::Node objects
sitemap.to_a # => Array of url strings
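
To make the "Array of url strings" result concrete, here is a minimal sketch of pulling `<loc>` values out of a sitemap document. It does not use the gem: it parses an inline XML string with Ruby's bundled REXML library so that it runs without network access. `SAMPLE_SITEMAP`, `loc_strings`, and the example.com URLs are hypothetical names chosen for illustration.

```ruby
require "rexml/document"

# Hypothetical inline sitemap so the example runs without network access.
SAMPLE_SITEMAP = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>http://example.com/</loc></url>
    <url><loc> http://example.com/about/ </loc></url>
  </urlset>
XML

# Return the whitespace-stripped text of every <url>/<loc> element.
# Elements are matched by local name, so the default namespace is not an issue.
def loc_strings(xml)
  locs = []
  REXML::Document.new(xml).root.each_element do |url|
    next unless url.name == "url"
    url.each_element { |el| locs << el.text.to_s.strip if el.name == "loc" }
  end
  locs
end

puts loc_strings(SAMPLE_SITEMAP)
```

With the gem itself, the equivalent strings would come from `sitemap.to_a`, while `sitemap.urls` keeps the full XML nodes.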

Options

Recurse nested sitemaps:

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', {recurse: true})
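
As an illustration of what recursing into nested sitemaps involves, the sketch below walks a sitemap index and collects the page URLs from each child sitemap. It is not the gem's implementation. The `FAKE_WEB` hash stands in for HTTP fetching, and `locs_under`, `collect_urls`, and the example.com URLs are hypothetical. Returning an empty list for an index when `recurse` is false is an assumption of this sketch.

```ruby
require "rexml/document"

# Hypothetical in-memory "web": URL => XML body, standing in for HTTP requests.
FAKE_WEB = {
  "http://example.com/sitemap.xml" => <<~XML,
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap><loc>http://example.com/posts.xml</loc></sitemap>
      <sitemap><loc>http://example.com/pages.xml</loc></sitemap>
    </sitemapindex>
  XML
  "http://example.com/posts.xml" => <<~XML,
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://example.com/posts/1</loc></url>
    </urlset>
  XML
  "http://example.com/pages.xml" => <<~XML
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://example.com/about</loc></url>
    </urlset>
  XML
}.freeze

# Collect the <loc> text of every direct child of root whose name is `tag`.
def locs_under(root, tag)
  locs = []
  root.each_element do |child|
    next unless child.name == tag
    child.each_element { |el| locs << el.text.to_s.strip if el.name == "loc" }
  end
  locs
end

# Return page URLs; follow <sitemapindex> entries only when recurse is true.
def collect_urls(url, recurse: false)
  root = REXML::Document.new(FAKE_WEB.fetch(url)).root
  case root.name
  when "urlset"
    locs_under(root, "url")
  when "sitemapindex"
    return [] unless recurse
    locs_under(root, "sitemap").flat_map { |child| collect_urls(child, recurse: true) }
  else
    raise ArgumentError, "not a sitemap: #{url}"
  end
end

p collect_urls("http://example.com/sitemap.xml", recurse: true)
```

The sitemaps.org protocol distinguishes the two document types by their root element (`urlset` versus `sitemapindex`), which is why the sketch branches on `root.name`.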

If you want to extract only the sitemap URLs matching a given pattern, you can provide a regex that will be used to match each page.

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', {recurse: true, url_regex: /sitemapregex/})
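
The effect of a pattern filter can be sketched in plain Ruby. The example below is an assumption about the general technique, not the gem's internals: `SAMPLE_URLS` and `filter_urls` are hypothetical names, and the filter is applied to plain strings rather than to XML nodes.

```ruby
# Hypothetical list of URL strings for illustration.
SAMPLE_URLS = [
  "http://example.com/posts/2014/hello/",
  "http://example.com/about/",
  "http://example.com/posts/2015/world/"
].freeze

# Keep only URLs whose whitespace-stripped text matches the pattern.
# A nil pattern means "no filtering" and returns the list unchanged.
def filter_urls(urls, pattern)
  return urls if pattern.nil?
  urls.select { |url| url.strip.match?(pattern) }
end

puts filter_urls(SAMPLE_URLS, %r{/posts/})
```

Using the `%r{...}` literal avoids escaping the slashes that are common in URL patterns.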

Typhoeus Options

sitemap = SitemapParser.new('http://ben.balter.com/sitemap.xml', { userpwd: "username:password" })

Roadmap

  • sitemap_index support
