Commit

initial
icambron committed Sep 2, 2015
0 parents commit 8c732df
Showing 13 changed files with 425 additions and 0 deletions.
11 changes: 11 additions & 0 deletions .gitignore
@@ -0,0 +1,11 @@
/target
/classes
/checkouts
pom.xml
pom.xml.asc
*.jar
*.class
/.lein-*
/.nrepl-port
.hgignore
.hg/
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2015 Zensight

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
58 changes: 58 additions & 0 deletions README.md
@@ -0,0 +1,58 @@
# header-utils

[![Clojars][clojars-img]][clojars-url]
[![Build Status][travis-image]][travis-url]
[![MIT License][license-image]][license]
![Phasers to stun][phasers-image]

A Clojure library for handling gross things in HTTP headers. Specifically, it can encode and parse Content-Disposition headers with special characters, and it exposes clean APIs for working with RFC 5987 parameters.

## Usage

### Content-Disposition

```clj
user=> (use 'header-utils.content-disposition)
nil
user=> (def s (encode "attachment" "Y͢o҉u f̴ee̡l̡ ̶fée͝bl̢e.͡.pdf"))
#'user/s
user=> s
"attachment;filename*=UTF-8''Y%CD%A2o%D2%89u%20f%CC%B4ee%CC%A1l%CC%A1%20%CC%B6f%C3%A9e%CD%9Dbl%CC%A2e.%CD%A1.pdf"
user=> (parse-type s)
"attachment"
user=> (parse-filename s)
"Y͢o҉u f̴ee̡l̡ ̶fée͝bl̢e.͡.pdf"
```

`encode` also accepts a language tag (e.g. `"en"`) and a map of additional parameters as its third and fourth arguments.
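
A rough sketch of the four-argument form follows. The `cd` alias is only a local choice, and the expected values in the comments are taken from this commit's test suite rather than from a fresh REPL session.

```clj
(require '[header-utils.content-disposition :as cd])

;; Arguments: type, filename, language, map of extra parameters.
;; Supplying a language forces the extended (filename*=) form.
(cd/encode "inline" "need-language" "en" {})
;; test suite expects: "inline;filename*=UTF-8'en'need-language"

;; Extra parameters go in the map, keyed by keyword.
(cd/encode "inline" "more-params" nil {:foo "bar"})

;; Arbitrary parameters can be read back by name.
(cd/parse-parameter "inline;filename=dude; random=goats" "random")
;; test suite expects: "goats"
```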

### Other tools

My goal in writing this library was to handle Content-Disposition, but I took some pains to make the proximate tools as useful as possible. Specifically:

* `header-utils.parameters` - encode and parse RFC-5987 parameters
* `header-utils.encoding` - common tools for reading/writing header values
* `header-utils.parser` - internal tool useful in extending the library (e.g. adding direct support for additional headers)
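
A minimal sketch of the first two namespaces used directly, assuming the aliases `parm` and `enc` (illustrative names, not something the library defines). The values in the comments follow from the source and tests in this commit.

```clj
(require '[header-utils.parameters :as parm]
         '[header-utils.encoding :as enc])

;; ASCII values are emitted as-is or quoted; anything else switches to the
;; extended name*= form.
(parm/encode "filename" "foo bar.txt") ; "filename=\"foo bar.txt\""
(parm/encode "filename" "special҉")    ; "filename*=UTF-8''special%D2%89"

;; find-parameter prefers the starred (extended) entry when both are present.
(parm/find-parameter {"filename" "EURO rates" "filename*" "€ rates"} "filename")
;; "€ rates"

;; Percent-encoding round trip.
(enc/percent-encode "€ rates" "UTF-8")           ; "%E2%82%AC%20rates"
(enc/percent-decode "%E2%82%AC%20rates" "UTF-8") ; "€ rates"
```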

## Todo

This library contains all the utilities required for adding explicit support for other headers, and actually doing so should be relatively easy. I'll add that support if/when I need it or you send me a PR.

## License

Copyright © 2015 Zensight

Distributed under the MIT License. See [LICENSE][] for more info.

[documentation-url]: http://icambron.github.io/twix.js/docs.html

[license-image]: http://img.shields.io/badge/license-MIT-blue.svg?style=flat-square
[license]: LICENSE

[clojars-url]: https://clojars.org/co.zensight/header-utils
[clojars-img]: https://img.shields.io/clojars/v/co.zensight/header-utils.svg?style=flat-square

[travis-url]: http://travis-ci.org/zensight/header-utils
[travis-image]: http://img.shields.io/travis/zensight/header-utils/develop.svg?style=flat-square

[phasers-image]: https://img.shields.io/badge/phasers-stun-green.svg?style=flat-square
9 changes: 9 additions & 0 deletions project.clj
@@ -0,0 +1,9 @@
(defproject co.zensight/header-utils "0.1.0-SNAPSHOT"
  :description "Tools for working with HTTP headers"
  :url "http://github.com/zensight/header-utils"
  :license {:name "The MIT License (MIT)"
            :url "http://opensource.org/licenses/mit-license.html"}
  :scm {:name "git"
        :url "https://github.com/zensight/header-utils"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [instaparse "1.4.1"]])
31 changes: 31 additions & 0 deletions resources/grammar.txt
@@ -0,0 +1,31 @@
;rfc-6266, see content-disposition.clj
content-disposition = <LWSP> content-disposition-type *( <LWSP> <";"> <LWSP> content-disposition-param)
content-disposition-type = token
content-disposition-param = parameter

;rfc-5987, see parameters.clj
parameter = reg-parameter / ext-parameter
reg-parameter = parmname LWSP "=" LWSP reg-value
ext-parameter = parmname "*" LWSP "=" LWSP ext-value
parmname = 1*attr-char

ext-value = charset <"'"> [ language ] <"'"> ext-value-chars
charset = "UTF-8" / "ISO-8859-1" / mime-charset
mime-charset = 1*mime-charsetc
mime-charsetc = ALPHA / DIGIT / #'[!#$%&+^_`{}~-]'
language = *( ALPHA / DIGIT / "-" )

ext-value-chars = *( pct-encoded / attr-char )

pct-encoded = "%" HEXDIG HEXDIG
attr-char = #'[^()<>@,;:\\"/\[\]?={} \t*\'%]'

;rfc-2616, see parser.clj
reg-value = token / quoted-string

token = 1*token-char
token-char = #'[^()<>@,;:\\"/\[\]?={} \t]'

quoted-string = ( <DQUOTE> *( qdtext / quoted-pair ) <DQUOTE> )
qdtext = #'[^"]'
quoted-pair = "\" DQUOTE
58 changes: 58 additions & 0 deletions src/header_utils/content_disposition.clj
@@ -0,0 +1,58 @@
(ns header-utils.content-disposition
  "Tools for encoding and parsing Content-Disposition headers according to RFC 6266."
  (:require [clojure.string :as str]
            [header-utils.parameters :as parm]
            [header-utils.parser :as p]))

(def header-name "Content-Disposition")

(def ^:private xform-6266
  {:content-disposition (fn [& children]
                          (reduce (fn [result [name & children]]
                                    (condp = name
                                      :content-disposition-type (assoc result :type (str/lower-case (first children)))
                                      :content-disposition-param (update-in result [:parameters] merge (first children))
                                      result))
                                  {} children))})

(defn- parse [value]
  (when (seq value)
    (->>
     (p/parse value :content-disposition (merge xform-6266 parm/xform-5987))
     ;; default to an empty parameter map when the header carries no parameters
     (merge {:parameters {}}))))

(defn- disposition-type [parsed]
  (when parsed
    (:type parsed)))

(defn- parameter [parsed name]
  (when parsed
    (parm/find-parameter (:parameters parsed) name)))

(defn- filename [parsed]
  (parameter parsed "filename"))

(def parse-type
  "Retrieve the disposition type from the Content-Disposition value. Typically \"inline\" or \"attachment\"."
  (comp disposition-type parse))

(def parse-filename
  "Retrieve the (decoded) filename from the Content-Disposition value. Prefers extended values to regular ones if both are present. May be nil."
  (comp filename parse))

(defn parse-parameter
  "Retrieve an arbitrary parameter from the Content-Disposition value."
  [value param-name]
  (-> value parse (parameter param-name)))

(defn encode
  "Write the value of the Content-Disposition header (i.e. just the right-hand side) for a type, filename, an optional language (e.g. \"en\"), and an optional map of other parameters. Filename may be nil."
  ([type filename] (encode type filename nil {}))
  ([type filename language more]
   (let [parameters (if filename (merge more {:filename filename}) more)]
     (->>
      (conj (map
             (fn [[k v]] (parm/encode (name k) v language)) parameters)
            type)
      (str/join ";")))))
52 changes: 52 additions & 0 deletions src/header_utils/encoding.clj
@@ -0,0 +1,52 @@
(ns header-utils.encoding
  "Tools for encoding and decoding strings in http headers."
  (:require [clojure.string :as s]
            [clojure.set :as se])
  (:import [java.net URLDecoder URLEncoder]))

;;Some of this is redundant with the grammar in the parser. That's a bummer, but it's hard to fix without inlining
;;the grammar itself, which is a pain because Clojure lacks heredocs.
(def separator-chars #{\( \) \< \> \@ \, \; \: \\ \" \/ \[ \] \? \= \{ \} \space \tab})
(def non-attr-chars (se/union #{\* \' \%} separator-chars))

(defn- normalize-charset [charset]
  (s/upper-case charset))

(defn- ascii? [c]
  (< 31 (int c) 127))

(defn- attr-char? [c]
  (and (ascii? c)
       (not (non-attr-chars c))))

(defn quote-str
  "Quote if needed, otherwise leave as-is."
  [value]
  (if (some separator-chars value)
    (as->
     value
     $
     (s/replace $ #"\\" "\\\\\\\\") ;;yes, 8 backslashes: a 4-character string, which the regex replacement reads as 2 literal backslashes
     (s/replace $ #"\"" "\\\\\"")
     (str "\"" $ "\""))
    value))

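(defn percent-decode
  "Decode %HEX HEX sequences in `value` using the given charset."
  [value encoding]
  ;; URLDecoder turns a literal + into a space, but + is an ordinary attr-char
  ;; in RFC 5987, so protect it before decoding.
  (URLDecoder/decode (s/replace value "+" "%2B") (normalize-charset encoding)))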

(defn percent-encode
  "Encode with %HEX HEX for values outside of allowed attribute values."
  [value encoding]
  (as->
   (for [c value]
     (if (and (attr-char? c) (not= \+ c)) ;;we're cheating here so that we can use URLEncoder, which replaces spaces with +
       c
       (URLEncoder/encode (str c) (normalize-charset encoding))))
   $
   (apply str $)
   (s/replace $ #"\+" "%20")
   ;; URLEncoder leaves * untouched, but * is not an attr-char, so encode it by hand
   (s/replace $ "*" "%2A")))

(defn all-ascii? [value]
  (every? ascii? value))
47 changes: 47 additions & 0 deletions src/header_utils/parameters.clj
@@ -0,0 +1,47 @@
(ns header-utils.parameters
  "Tools for encoding and parsing http header parameters according to RFC 5987."
  (:require [clojure.string :as str]
            [header-utils.encoding :as e]
            [header-utils.parser :as p]))

(def xform-5987
  {:parameter identity
   :reg-parameter (fn [name & others] {name (p/value-in-tag :reg-value others)})
   :ext-parameter (fn [name & others] {(str name "*") (p/value-in-tag :ext-value others)})
   :ext-value (fn [& items]
                (let [charset (p/value-in-tag :charset items)
                      value-chars (p/value-in-tag :ext-value-chars items)]
                  [:ext-value (e/percent-decode value-chars (str/upper-case charset))]))
   :ext-value-chars (fn [& s] [:ext-value-chars (apply str s)])
   :parmname str
   :mime-charsetc str
   :mime-charset str
   :attr-char str
   :pct-encoded str
   :HEXDIG str
   :ALPHA str
   :DIGIT str})

(defn encode
  "Encodes the name, value, and optionally language as an RFC 5987 header parameter, handling all encoding for you. Returns a string that looks like
  name=value or name=quoted-value or name*=encoded-value, depending on the contents of the string and the options provided."
  ([name value] (encode name value nil))
  ([name value language]
   (let [simple-name (.replace name "*" "")]
     ;;I'm *super* unclear on whether ISO-8859-1 chars outside of US-ASCII are allowed in tokens or quoted strings.
     ;;2616 implies they aren't but 5987 implies they are. We're going to assume they're not.
     (if (or language (not (e/all-ascii? value)))
       (str simple-name "*=" (str "UTF-8'" language "'" (e/percent-encode value "UTF-8")))
       (str simple-name "=" (e/quote-str value))))))

(defn parse
  "Parse a parameter, e.g. `(parse \"name=value\")`. Handles encodings transparently. Currently discards language."
  [string]
  (p/parse string :parameter xform-5987))

(defn find-parameter
  "Find a value by `param-name` in a `param-map` (such as the one produced by calling `parse` and merging the results). Prefers extended versions of the parameters. Useful for eliding the difference between [param]* and [param]."
  [param-map param-name]
  (if-let [starred (get param-map (str param-name "*"))]
    starred
    (get param-map param-name)))
37 changes: 37 additions & 0 deletions src/header_utils/parser.clj
@@ -0,0 +1,37 @@
(ns header-utils.parser
  "Parsing utilities for header-utils. Probably only useful in extending the library."
  (:require [clojure.java.io :as io]
            [instaparse.core :as insta]))

(def ^:private parser (insta/parser (io/resource "grammar.txt") :input-format :abnf))

;;utilities for xforming the parsing results

(defn tagged
  "Returns a function that takes an instaparse node and returns true if the node has name `tag`."
  [tag]
  #(#{tag} (first %)))

(defn value-in-tag
  "Shortcut for finding the first value in the first node with a given tag."
  [tag list] (-> tag tagged (filter list) first second))

(def ^:private xform-2616
  {:token str
   :token-char str
   :quoted-pair (constantly "\"")
   :qdtext str
   :quoted-string str})

(defn parse*
  "Given a string and a starting rule from the grammar, parse the string and return the raw instaparse tree. Useful for debugging."
  [value start]
  (binding [instaparse.abnf/*case-insensitive* true]
    (parser value :start start)))

(defn parse
  "Given a string, a starting rule from the grammar, and a transformation map, return a parsed structure. Useful for extending this library."
  [value start xform]
  (->>
   (parse* value start)
   (insta/transform (merge xform xform-2616))))
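
;; A hedged REPL sketch of the two entry points above, kept inside a `comment`
;; form so it is never evaluated when the namespace loads. The input string and
;; the choice of the :parameter start rule are illustrative assumptions.
(comment
  (require '[header-utils.parser :as p]
           '[header-utils.parameters :as parm])

  ;; Raw instaparse tree for a single parameter: handy when working out
  ;; which tags a transformation map needs to cover.
  (p/parse* "filename=example.html" :parameter)

  ;; The same input with a transformation map applied.
  (p/parse "filename=example.html" :parameter parm/xform-5987))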

41 changes: 41 additions & 0 deletions test/header_utils/content_disposition_test.clj
@@ -0,0 +1,41 @@
(ns header-utils.content-disposition-test
  (:require [header-utils.content-disposition :as cd]
            [clojure.test :refer :all]))

(deftest filename-test
  (are [x y] (= (cd/parse-filename x) y)
    "Attachment; filename=example.html" "example.html"
    "attachment;\r\n filename*= UTF-8''%e2%82%ac%20rates" "€ rates"
    "attachment; filename=\"EURO rates\"; filename*=utf-8''%e2%82%ac%20rates" "€ rates"
    "attachment;filename=\"this has spaces.pdf\"" "this has spaces.pdf"
    "inline" nil
    "asdfas:e҉rasdfasfe;a*^*&F" nil
    "attachment:filename=improperly spaced" nil
    "" nil
    nil nil))

(deftest disposition-type-test
  (are [x y] (= (cd/parse-type x) y)
    "Attachment; filename=example.html" "attachment"
    "inline" "inline"
    "INLINE" "inline"
    "" nil
    nil nil))

(deftest parameter-test
  (are [x y] (= (cd/parse-parameter x "random") y)
    "Attachment;random=cheese" "cheese"
    "inline;filename=dude; random=goats" "goats"
    "inline;random*=UTF-8''special%D2%89" "special҉"))

(deftest encode-test
  (are [x y] (= (apply cd/encode x) y)
    ["inline" nil] "inline"
    ["inline" "foo.txt"] "inline;filename=foo.txt"
    ["inline" "foo bar.txt"] "inline;filename=\"foo bar.txt\""
    ["inline" "foo \" bar.txt"] "inline;filename=\"foo \\\" bar.txt\""
    ["inline" "special҉"] "inline;filename*=UTF-8''special%D2%89"
    ["inline" "need-language" "en" {}] "inline;filename*=UTF-8'en'need-language"
    ["inline" "more-params" nil {:foo "bar"}] "inline;filename=more-params;foo=bar"
    ["inline" "more-params" nil {:foo "special҉"}] "inline;filename=more-params;foo*=UTF-8''special%D2%89"))
