This repository has been archived by the owner on Nov 5, 2018. It is now read-only.

A Ruby microformat parser and HTML toolkit powered by Nokogiri


mwunsch/prism

Prism

Ruby microformat parser and HTML toolkit

RDoc | Gem | Metrics | Microformats.org

What Prism is:

  • A robust microformat parser
  • A command-line tool for parsing microformats from a URL or a string of markup
  • A DSL for defining semantic markup patterns
  • Export microformats to other standards:
    • hCard => vCard

It is your lowercase semantic web friend.

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns (e.g. XHTML, blogging).

Learn more about Microformats at http://microformats.org.

Usage

The command-line tool takes a SOURCE from standard input or as an argument:

$: curl http://markwunsch.com | prism --hcard > ~/Desktop/me.vcf

OR

$: prism --hcard http://markwunsch.com > ~/Desktop/me.vcf
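
The two invocations above differ only in where SOURCE comes from. Here is a minimal Ruby sketch of that choice. The helper name `read_source` is hypothetical and this is not Prism's actual code; it just assumes that the first non-flag argument wins and that standard input is the fallback:

```ruby
require "stringio"

# Hypothetical helper, not Prism's actual implementation:
# return the first non-flag argument, or everything readable from `input`.
def read_source(argv, input = $stdin)
  source = argv.find { |arg| !arg.start_with?("--") }
  source || input.read
end

# Argument form: prism --hcard http://markwunsch.com
read_source(["--hcard", "http://markwunsch.com"])

# Piped form: curl http://markwunsch.com | prism --hcard
# (a StringIO stands in for the piped standard input here)
read_source(["--hcard"], StringIO.new("<html>...</html>"))
```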

Installation

With Ruby and RubyGems:

gem install prism

Or clone the repository and run bundle install to get the development dependencies.

Requirements:

Microformats supported (right now, as of this very moment)

More on the way.

Finding Microformats:

# All microformats
Prism.find 'http://foobar.com'

# A specific microformat
Prism.find 'http://foobar.com', :hcard

# Search HTML too
Prism.find big_string_of_html

Parsing Microformats:

twitter_contacts = Prism.find 'http://twitter.com/markwunsch', :hcard
me = twitter_contacts.first
me.fn
#=> "Mark Wunsch"
me.n.family_name
#=> "Wunsch"
me.url
#=> ["http://markwunsch.com/"]
File.open('mark.vcf','w') {|f| f.write me.to_vcard }
## Add me to your address book!	
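
To give a feel for what the hCard => vCard export produces, here is a hand-rolled sketch of the mapping in plain Ruby. It is not Prism's `to_vcard`: the `Contact` struct, the `vcard_escape` helper, and the given name "Mark" are assumptions made for illustration, and only a few vCard 3.0 properties (N, FN, URL) are covered.

```ruby
# Hypothetical stand-in for a parsed hCard; Prism's own objects are richer.
Contact = Struct.new(:fn, :family_name, :given_name, :urls)

# Escape the characters vCard 3.0 treats specially inside text values.
def vcard_escape(text)
  text.to_s.gsub(/([\\,;])/) { "\\#{$1}" }.gsub("\n") { "\\n" }
end

# Illustrative serializer: maps a few hCard properties to vCard lines.
def to_vcard(contact)
  lines = [
    "BEGIN:VCARD",
    "VERSION:3.0",
    # N is structured: family;given;additional;prefixes;suffixes
    "N:#{vcard_escape(contact.family_name)};#{vcard_escape(contact.given_name)};;;",
    "FN:#{vcard_escape(contact.fn)}"
  ]
  contact.urls.each { |url| lines << "URL:#{url}" }
  lines << "END:VCARD"
  # vCard lines are terminated with CRLF
  lines.join("\r\n") + "\r\n"
end

me = Contact.new("Mark Wunsch", "Wunsch", "Mark", ["http://markwunsch.com/"])
puts to_vcard(me)
```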

POSH DSL

The Prism module defines a group of methods to search, validate, and extract nodes out of a Nokogiri document.

All microformats inherit from Prism::POSH, because all microformats begin as POSH formats. If you wanted to create your own POSH format, you'd do something like this:

class Navigation < Prism::POSH
	search {|document| document.css('ul#navigation') }
	# Search a Nokogiri document for nodes of a certain type
	
	validate {|node| node.matches?('ul#navigation') }
	# Validate that a node is the right element we want
	
	has_many :items do
		search {|doc| doc.css('li') }
	end
	# has_many and has_one define properties, which themselves inherit from
	# Prism::POSH::Base, so you can do :has_one, :has_many, :search, :extract, etc.
end

Now you can do:

nav = Navigation.parse_first(document) 
# document is a Nokogiri document. 
# parse_first extracts just the first example of the format out of the document

nav.items
# Returns an array of contents
# This method comes from the has_many call up above that defines the :items property
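
If you are curious how a class-level DSL like this can work, here is a toy version in plain Ruby. It is a sketch under loud assumptions: `MiniPOSH` is a hypothetical name, none of this is Prism's source, and the "document" is an Array of Hashes instead of a Nokogiri document so the example runs without any gems.

```ruby
# Toy base class loosely modeled on the DSL above. NOT Prism's code.
class MiniPOSH
  class << self
    # With a block: remember how to find candidate nodes. Without: return it.
    def search(&block)
      @search = block if block
      @search
    end

    # With a block: remember how to check a node. Defaults to accepting all.
    def validate(&block)
      @validate = block if block
      @validate || ->(_node) { true }
    end

    # Define a property backed by an anonymous subclass, so the block can
    # use the same DSL methods (search, validate, has_many).
    def has_many(name, &definition)
      property = Class.new(MiniPOSH, &definition)
      define_method(name) { property.search.call(@node) }
    end

    # Find candidates, keep the valid ones, wrap each in an instance.
    def parse(document)
      search.call(document)
            .select { |node| validate.call(node) }
            .map { |node| new(node) }
    end

    def parse_first(document)
      parse(document).first
    end
  end

  def initialize(node)
    @node = node
  end
end

class Navigation < MiniPOSH
  search   { |document| document.select { |node| node[:tag] == "ul" } }
  validate { |node| node[:id] == "navigation" }

  has_many :items do
    search do |node|
      node[:children].select { |child| child[:tag] == "li" }
                     .map { |li| li[:text] }
    end
  end
end

# Stand-in for a parsed HTML document (an assumption for this sketch).
document = [
  { tag: "ul", id: "footer", children: [] },
  { tag: "ul", id: "navigation", children: [
    { tag: "li", text: "Home" },
    { tag: "li", text: "About" }
  ] }
]

nav = Navigation.parse_first(document)
nav.items # an Array with the text of each matching li
```

The blocks are stored on the class, not run at definition time, which is why the same format can later be parsed out of any number of documents.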

Other Microformat parsers

  • Mofo is a Ruby microformat parser backed by Hpricot.
  • Sumo is a JavaScript microformat parser.
  • Operator is a Firefox extension.
  • hKit is a microformat parser for PHP.
  • Oomph is a microformat toolkit add-in for Internet Explorer.

Feature wishlist:

  • HTML outliner (using HTML5 sectioning)
  • HTML5 article, time, etc. POSH support
  • Extensions so you can do something like: String.is_a_valid? :hcard in your tests
  • Extensions to turn Ruby objects into semantic HTML. Hash.to_definition_list, Array.to_ordered_list, etc.
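
To make the last wishlist item concrete, here is one possible shape for it. Nothing like this exists in Prism; the `SemanticHTML` module and its method names are hypothetical, and the sketch uses plain module functions rather than extending Hash and Array directly.

```ruby
require "erb"

# Hypothetical helpers sketching the wishlist idea; not part of Prism.
module SemanticHTML
  module_function

  # Render a Hash as a definition list: keys become dt, values become dd.
  def definition_list(hash)
    rows = hash.map do |term, definition|
      "<dt>#{ERB::Util.html_escape(term)}</dt>" \
        "<dd>#{ERB::Util.html_escape(definition)}</dd>"
    end
    "<dl>#{rows.join}</dl>"
  end

  # Render an Array as an ordered list, one li per element.
  def ordered_list(array)
    items = array.map { |item| "<li>#{ERB::Util.html_escape(item)}</li>" }
    "<ol>#{items.join}</ol>"
  end
end

SemanticHTML.definition_list("fn" => "Mark Wunsch")
SemanticHTML.ordered_list(["Home", "About & Contact"])
```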

TODO:

  • Code is ugly. Especially XOXO.
  • Better recursive parsing of trees. See above.
  • Tests are all kinds of disorganized. And slow.
  • Broader support for some of the weirder Patterns, like object[data]
  • Man pages (see Ron)

License

Prism is licensed under the MIT License and is Copyright (c) 2010 Mark Wunsch.
