Universe HTML Purifier

Universe HTML Purifier cleans up dirty HTML and can help protect against cross-site scripting (XSS) attacks. It takes a string of dirty, badly formatted HTML and returns a pretty-printed, valid HTML string. The purifying behavior is configurable: you can change the allowed attributes, allow additional tags, and control formatting and tag transformations. The script works on both the client and the server. The package is intended for use with contenteditable elements, rich text editors, or anywhere HTML comes from untrusted sources.

Purifying is based on the HTML5 specification and implements a subset of the algorithm described there. Only a limited set of HTML5 elements and attributes is permitted; all other tags and attributes are removed from the resulting HTML.

Additionally, the package provides an HTML parser.

Installation

$ meteor add vazco:universe-html-purifier

Usage

Basic

UniHTML.purify('<p><b>Some</b><script> alert(); </script> Text</p>');

output:

<p>
    <b>Some</b> alert();  Text
</p>

Customize purifying

Additionally, you can pass settings as the second parameter of UniHTML.purify:

noFormatting - disables pretty formatting
preferStrong_Em - transforms tags: <b> to <strong>, <i> to <em>
preferB_I - transforms tags: <strong> to <b>, <em> to <i>
noHeaders - transforms heading tags to <strong> (or to <b> if preferB_I === true)
withoutTags - an array of tags to skip (tags that were allowed before, including the defaults). Warning: 'withoutTags' works only for global tags. You cannot skip local table tags such as <tr>, <caption>, <td>
noTextManhandle - skips text processing such as whitespace cleanup and formatting
encodeHtmlEntities - converts special characters (like < > & ") into their escaped/encoded values (like &lt;)
catchErrors - ignores errors (such as unclosed tags)
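
Several settings can be combined in one call. The following is a minimal sketch, not part of the package: the helper name `purifyComment` is hypothetical, and the purifier object is passed in as an argument only so the snippet stays self-contained. In a Meteor app you would pass `UniHTML` itself.

```javascript
// Hypothetical helper (not part of the package): purify user comments
// with a fixed set of settings taken from the list above.
// `purifier` must expose purify(html, settings), as UniHTML does.
function purifyComment(purifier, dirtyHtml) {
    var settings = {
        noHeaders: true,        // heading tags are transformed instead of kept
        preferStrong_Em: true,  // <b>/<i> are transformed to <strong>/<em>
        withoutTags: ['img'],   // skip images for this call only (global tag)
        noFormatting: true      // no pretty printing of the result
    };
    return purifier.purify(dirtyHtml, settings);
}

// Usage in a Meteor app with the package installed:
// var clean = purifyComment(UniHTML, '<h1>Hi</h1><img src="pic.jpg">');
```

Keeping the settings in one place means every caller purifies comments the same way.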

Example

UniHTML.purify('<p><b>Some</b> Text</p><img src="pic.jpg">', {withoutTags: ['b', 'img'], noFormatting: true});

output:

"<p>Some Text</p>"

Default allowed elements

b
i
strong
em
blockquote
ol
ul
li
h1-h7
p
span
pre
a
u
img
br
table
- caption
- col
- colgroup
- tbody
- td
- tfoot
- th
- thead
- tr

Elements that are not allowed are stripped from the resulting HTML, although their inner text is left intact. You can allow additional tags with UniHTML.addNewAllowedTag(tagName, isSelfClosing). It is very important to pass true as the second argument if the tag is self-closing.
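
If an application needs several extra tags, they can be registered in one place at startup. This is a rough sketch under assumptions: `registerExtraTags` and the shape of its `tags` argument are hypothetical, and the purifier is passed in only to keep the snippet self-contained. The only package call used is `addNewAllowedTag(tagName, isSelfClosing)` as described above.

```javascript
// Hypothetical helper (not part of the package): register a list of
// additional allowed tags. `purifier` must expose
// addNewAllowedTag(tagName, isSelfClosing), as UniHTML does.
function registerExtraTags(purifier, tags) {
    tags.forEach(function (tag) {
        // Always pass an explicit boolean so that self-closing tags
        // are never registered as regular ones by accident.
        purifier.addNewAllowedTag(tag.name, tag.selfClosing === true);
    });
}

// Usage in a Meteor app with the package installed:
// registerExtraTags(UniHTML, [
//     { name: 'hr', selfClosing: true },
//     { name: 'code' }
// ]);
```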

You can also skip default allowed tags for a single purify call, but only global (top-level) tags. This means you cannot skip <td> or <tr> inside <table>.

By default, the href attribute of the 'a' tag is restricted to the following kinds of URLs:

supported protocols: http, https, sftp, ftp, ftps, mailto
a link to an element with a specified id within the page (like href="#top")
a relative URL from the site root (like href="/default"), but without the characters ":", "\", ";", "(", ")"
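
The three rules above can be pictured with a small standalone check. This is an illustrative sketch only, not the package's actual implementation: the function name `isAllowedHref` is hypothetical, and rejecting protocol-relative URLs such as "//host" is an extra assumption made here, not something the rules above state.

```javascript
// Illustrative sketch only: NOT the package's actual implementation.
var ALLOWED_PROTOCOLS = ['http', 'https', 'sftp', 'ftp', 'ftps', 'mailto'];

function isAllowedHref(href) {
    if (typeof href !== 'string' || href.length === 0) {
        return false;
    }
    // Link to an element id within the page, e.g. "#top"
    if (href.charAt(0) === '#') {
        return true;
    }
    // Root-relative URL, e.g. "/default", without the characters : \ ; ( )
    if (href.charAt(0) === '/') {
        // Assumption of this sketch: "//host" is not treated as root-relative
        if (href.charAt(1) === '/') {
            return false;
        }
        return !/[:\\;()]/.test(href);
    }
    // Absolute URL: the protocol must be one of the supported ones
    var match = /^([a-z][a-z0-9+.-]*):/i.exec(href);
    return match !== null &&
        ALLOWED_PROTOCOLS.indexOf(match[1].toLowerCase()) !== -1;
}
```

With this sketch, `isAllowedHref('javascript:alert(1)')` is rejected because `javascript` is not in the protocol list, while `isAllowedHref('/default')` and `isAllowedHref('#top')` are accepted.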

Default allowed attributes

all_elements: ['class', 'style', 'id']
a: ['href', 'target', 'title', 'name', 'rel', 'rev', 'type']
blockquote: ['cite']
img: ['src', 'alt', 'title', 'longdesc']
td: ['colspan']
th: ['colspan']
tr: ['rowspan']
table: ['border']

You can change the allowed attributes for all allowed tags or for a single one with:

UniHTML.setNewAllowedAttributes(attributesArray, tag);

The default value of the tag parameter is 'all_elements'.
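
To picture how such a whitelist might be applied, here is a standalone sketch. It is not the package's code: `setAllowed` and `filterAttributes` are hypothetical names, and two behaviors are assumptions of this sketch: that setting a list replaces the previous one, and that an attribute is kept when it is listed either under 'all_elements' or under the tag itself.

```javascript
// Illustrative data mirroring part of the default table above.
var allowedAttributes = {
    all_elements: ['class', 'style', 'id'],
    a: ['href', 'target', 'title', 'name', 'rel', 'rev', 'type'],
    img: ['src', 'alt', 'title', 'longdesc']
};

// Hypothetical counterpart of setNewAllowedAttributes(attributesArray, tag).
// Assumption: the new list replaces the old one for that tag.
function setAllowed(table, attributesArray, tag) {
    table[tag || 'all_elements'] = attributesArray.slice();
}

// Assumption: an attribute survives if it is allowed for all elements
// or for this particular tag.
function filterAttributes(table, tag, attrs) {
    var hasOwn = Object.prototype.hasOwnProperty;
    var perTag = hasOwn.call(table, tag) ? table[tag] : [];
    var kept = {};
    Object.keys(attrs).forEach(function (name) {
        if (table.all_elements.indexOf(name) !== -1 ||
                perTag.indexOf(name) !== -1) {
            kept[name] = attrs[name];
        }
    });
    return kept;
}
```

In this sketch, filtering `{href: '/x', onclick: 'evil()'}` for an `a` tag keeps only `href`, since `onclick` appears in neither list.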

Parser

The package provides a simple HTML parser. To use it, call:

UniHTML.parse(html_string, {
    // attributesOnTag is an Object like {name, value, escaped}
    start: function (tagName, attributesOnTag, isSelfClosing) {}, // opening tag
    end: function (tagName) {}, // closing tag
    chars: function (text) {}, // text between opening and closing tags
    comment: function (text) {} // text from a comment
});

It parses an HTML5 string (including custom tags) and calls the callbacks in the same order as the tags appear in the HTML string (from root to leaf, and so on for each node).
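
As an example of what the callbacks can be used for, the sketch below counts tags and collects text while a document is parsed. The function name `createStatsHandlers` and the shape of the `stats` object are hypothetical; only the four callback names and their arguments come from the description above.

```javascript
// Hypothetical helper: builds a set of parser callbacks that gather
// simple statistics about the parsed HTML.
function createStatsHandlers() {
    var stats = {
        tags: Object.create(null), // tag name -> number of opening tags seen
        text: [],                  // text chunks in document order
        comments: 0                // number of comments seen
    };
    var handlers = {
        start: function (tagName, attributesOnTag, isSelfClosing) {
            stats.tags[tagName] = (stats.tags[tagName] || 0) + 1;
        },
        end: function (tagName) {
            // closing tags are not needed for these statistics
        },
        chars: function (text) {
            stats.text.push(text);
        },
        comment: function (text) {
            stats.comments += 1;
        }
    };
    return { handlers: handlers, stats: stats };
}

// Usage in a Meteor app with the package installed:
// var collector = createStatsHandlers();
// UniHTML.parse('<p><b>Some</b> Text</p>', collector.handlers);
// console.log(collector.stats.tags, collector.stats.text.join(''));
```

Because the callbacks fire in document order, joining `stats.text` yields the text content in reading order.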

License

Author: Krzysztof Różalski (Cristo Rabani). Released under Apache Software License 2.0.

Includes John Resig’s and Erik Arvidsson’s HTML Parser, modified to support HTML5 and used as a tokenizer. It is triple-licensed under the Apache Software License 2.0, Mozilla Public License or GNU Public License. http://erik.eae.net/simplehtmlparser/simplehtmlparser.js

Written based on the wonderful:

and partly:

Additional external materials from users

Example how to purify to whitelist

This package is part of Universe, a framework based on the Meteor platform and maintained by Vazco.

It works as a standalone Meteor package, but you get many more features when using the whole system.
