Worker Bee: 🐝 Cloudflare Worker Composer ☁️

Toolkit for composing Cloudflare Workers, focused on the use case of having an upstream server, and wanting to conditionally manipulate requests and responses.

Example Uses

  • All requests to /landing-page/ should strip that subdirectory and proxy from Netlify instead of your normal server.
  • Requests from the googleweblight user agent should have Cache-Control: no-transform set on the response.
  • Cookies should be stripped for requests to the /shop/ section of your site.
  • UTM parameters and Facebook click IDs should be removed from requests to your server to increase cacheability.
  • WordPress users should not be logged in on the front of the site unless they’re previewing a post.
  • Make your entire site HTTPS except for one section.
  • Make all images use browser-native lazy loading.

If you'd like, jump straight to the examples.

Concepts

Worker Bee is based around three main concepts:

  • Handlers: Functions that run when a request is received, when a response comes back from the server or cache, or both. They can change the request/response, deliver a new request/response altogether, or conditionally add other handlers.
  • Routes: Host and request path patterns whose handlers are added only for requests that match the pattern.
  • Conditions: Functions that determine whether a handler should be applied.

Usage

  1. Bootstrap your Cloudflare Worker, using Wrangler. Make sure you’re using Webpack.
  2. npm i workerbee from your Worker directory.
  3. In your Worker, import handleFetch and provide an array of request/response handlers, and/or route-limited request/response handlers.

Example:

import handleFetch from 'workerbee'

handleFetch({
	request: requestHandlers, // Run on every request.
	response: responseHandlers, // Run on every response.
	routes: (router) => {
		router.get('/test', {
			request: requestHandlers, // Run on matching requests.
			response: responseHandlers, // Run on responses from matching requests.
		})

		router.get('/posts/:id', {
			request: requestHandlers, // Run on matching requests.
			response: responseHandlers, // Run on responses from matching requests.
		})
	},
})

Top level request and response handlers will be run on every route, before any route-specific handlers.

For all places where you specify handlers, you can provide one handler, an array of handlers, or no handlers (null, or empty array). Routes can also accept variadic handlers, which will be assumed to be request handlers.
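
As a rough illustration of the "one handler, an array, or nothing" rule, the sketch below normalizes each accepted form into a flat array. `toHandlerArray` is a hypothetical helper written for this README, not a workerbee export, and the library may handle this differently internally.

```javascript
// Hypothetical helper (not part of workerbee): accept a single handler,
// an array of handlers, or nothing, and always return an array.
function toHandlerArray(handlers) {
	if (handlers == null) return [] // null or undefined means "no handlers"
	return Array.isArray(handlers) ? handlers : [handlers]
}

const logRequest = ({ request }) => console.log(request.url)

console.log(toHandlerArray(logRequest).length) // 1
console.log(toHandlerArray([logRequest, logRequest]).length) // 2
console.log(toHandlerArray(null).length) // 0
```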

Lifecycle

It goes like this:

  1. Request is received.
  2. The Request loops through all request handlers (global, and then route).
  3. If early Response wasn’t received, the resulting Request object is fetched (from the cache or the server).
  4. The resulting Response object is passed through the response handlers (global, and then route).
  5. The response is returned to the client.

  ┌──────────────────┐
  │ Incoming Request │
  │  to your Worker  │
  └──────────────────┘
            │
            ▼
    .───────────────.
   (  Matches route? )───Yes─┐
    `───────────────'        │
            │                ▼
            │    ┌───────────────────────┐
           No    │ Append route handlers │
            │    │  to global handlers   │
            │    └───────────────────────┘
            │                │
            └───────┬────────┘
                    │
                    ▼
           ┌─────────────────┐
           │    Run next     │
 ┌────────▶│ request handler │
 │         └─────────────────┘
 │                  │
 │                  ▼
 │   .─────────────────────────────.
 │  ( Handler returned a Response?  )───┐
 │   `─────────────────────────────'    │
 │                  │                  Yes
 │                 No                   │
Yes                 │                   │
 │                  ▼                   │
 │          .───────────────.           │
 └─────────(  More handlers? )          │
            `───────────────'           │
                    │                   │
                   No                   │
                    │                   │
                    ▼                   │
         .─────────────────────.        │
  ┌─────(  Request in CF cache? )────┐  │
  │      `─────────────────────'     │  │
 Yes                                No  │
  │  ┌────────────┐  ┌────────────┐  │  │
  │  │ Fetch from │  │ Fetch from │  │  │
  └─▶│   cache    │  │   server   │◀─┘  │
     └────────────┘  └────────────┘     │
            │               │           │
            └───────┬───────┘           │
                    │                   │
                    ▼                   │
              ┌──────────┐              │
              │ Response │◀─────────────┘
              └──────────┘
                    │
                    ▼
          ┌──────────────────┐
          │     Run next     │
      ┌──▶│ response handler │
      │   └──────────────────┘
      │             │
     Yes            ▼
      │     .───────────────.
      └────(  More handlers? )
            `───────────────'
                    │
                   No
                    │
                    ▼
           ┌────────────────┐
           │ Final Response │
           └────────────────┘
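
The lifecycle can be modeled in a few lines. The sketch below is a simplified, synchronous approximation, not workerbee's implementation: plain objects tagged with `kind` stand in for `Request` and `Response`, `fetchUpstream` is a placeholder for the cache/server fetch, and `runLifecycle` is a hypothetical name. Real handlers are usually async.

```javascript
// Simplified, synchronous model of the lifecycle described above.
// None of these names come from workerbee.
function runLifecycle(request, requestHandlers, responseHandlers, fetchUpstream) {
	let response = null

	for (const handler of requestHandlers) {
		const result = handler({ request, phase: 'request' })
		if (result && result.kind === 'response') {
			response = result // Early response: skip the remaining request handlers.
			break
		}
		if (result && result.kind === 'request') request = result
	}

	// Only fetch when no request handler produced a response.
	if (!response) response = fetchUpstream(request)

	for (const handler of responseHandlers) {
		const result = handler({ request, response, phase: 'response' })
		if (result && result.kind === 'response') response = result
	}

	return response
}

// Request handler that short-circuits with a 403 for /admin paths.
const blockAdmin = ({ request }) =>
	request.path.startsWith('/admin')
		? { kind: 'response', status: 403, headers: {} }
		: undefined

// Response handler that returns a new response with an extra header.
const addHeader = ({ response }) => ({
	...response,
	headers: { ...response.headers, 'x-handled': 'yes' },
})

const upstream = () => ({ kind: 'response', status: 200, headers: {} })

const ok = runLifecycle({ kind: 'request', path: '/' }, [blockAdmin], [addHeader], upstream)
const denied = runLifecycle({ kind: 'request', path: '/admin' }, [blockAdmin], [addHeader], upstream)
console.log(ok.status, denied.status) // 200 403
```

Note that the early 403 still passes through the response handlers, matching step 4 of the list above.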

Routing

The router has functions for all HTTP methods, plus router.all() which matches any method. e.g. router.get(path, handlers), router.post(path, handlers).

The path argument uses the path-to-regexp library, which has good support for positional path parameters. Here’s what various routes would yield for a given request:

| Pattern | 🆗 | URL | Params |
|---|---|---|---|
| /posts/:id | ✅ | /posts/123 | {id: "123"} |
| | ✅ | /posts/hello | {id: "hello"} |
| | ❌ | /posts | |
| /posts/:id? | ✅ | /posts/123 | {id: "123"} |
| | ✅ | /posts/hello | {id: "hello"} |
| | ✅ | /posts | {} |
| | ❌ | /posts/hello/another | |
| /posts/:id(\\d+)/:action | ✅ | /posts/123/edit | {id: "123", action: "edit"} |
| | ❌ | /posts/hello/edit | |
| /posts/:id+ | ✅ | /posts/123 | {id: ["123"]} |
| | ✅ | /posts/123/hello/there | {id: ["123", "hello", "there"]} |
| /posts/:id* | ✅ | /posts | {} |
| | ✅ | /posts/123 | {id: ["123"]} |
| | ✅ | /posts/123/hello | {id: ["123", "hello"]} |
| /bread/:meat+/bread | ✅ | /bread/turkey/bread | {meat: ["turkey"]} |
| | ✅ | /bread/peanut-butter/jelly/bread | {meat: ["peanut-butter", "jelly"]} |
| | ❌ | /bread/bread | |
| /mother{-:type}? | ✅ | /mother | {} |
| | ✅ | /mother-in-law | {type: "in-law"} |
| | ❌ | /mothers | |

If you want to match a path prefix and everything after it, just use a wildcard matcher like /prefix/:any* (and then just ignore what gets matched by :any*).

Note that a trailing slash will match, so /posts/ will match /posts.

Read the path-to-regexp documentation for more information.
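
To make the idea of positional path parameters concrete, here is a deliberately tiny stand-in for path-to-regexp. It only understands literal segments and required named segments such as `/posts/:id`; modifiers (`?`, `+`, `*`) and custom patterns are out of scope. `matchSimplePath` is a hypothetical name, and this is not how the router is implemented.

```javascript
// Simplified stand-in for path-to-regexp: literal segments and required
// named segments only. Returns a params object on a match, or null.
function matchSimplePath(pattern, path) {
	// Splitting and dropping empty pieces also tolerates a trailing slash.
	const split = (value) => value.split('/').filter(Boolean)
	const patternParts = split(pattern)
	const pathParts = split(path)
	if (patternParts.length !== pathParts.length) return null

	const params = {}
	for (let i = 0; i < patternParts.length; i++) {
		const part = patternParts[i]
		if (part.startsWith(':')) {
			params[part.slice(1)] = pathParts[i] // Named segment: capture the value.
		} else if (part !== pathParts[i]) {
			return null // Literal segment mismatch.
		}
	}
	return params
}

console.log(matchSimplePath('/posts/:id', '/posts/123')) // { id: '123' }
console.log(matchSimplePath('/posts/:id', '/posts')) // null
```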

You can also limit your routes to a specific host, like so:

import handleFetch, { forbidden, setRequestHeaders } from 'workerbee'

handleFetch({
	routes: (router) => {
		router.host('example.com', (router) => {
			router.get('/', setRequestHeaders({ 'x-foo': 'bar' }))
		})
		router.host('*.blogs.example.com', (router) => {
			router.all('/xmlrpc.php', forbidden())
		})
	},
})

This makes it trivial to set up a Worker that services multiple subdomains and routes, instead of having to maintain a bunch of independent Workers.
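
For intuition about wildcard hosts, the sketch below converts a host pattern into a regular expression. It rests on an assumption this README does not confirm: that `*` matches any run of characters, dots included, so nested subdomains also match. `hostMatches` is a hypothetical helper, not a workerbee export.

```javascript
// Hypothetical wildcard host matcher. Assumption: `*` matches any run of
// characters (including dots). Everything else is matched literally.
function hostMatches(pattern, host) {
	const escaped = pattern
		.split('*')
		.map((piece) => piece.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
		.join('.*')
	// Host names are case-insensitive, hence the `i` flag.
	return new RegExp(`^${escaped}$`, 'i').test(host)
}

console.log(hostMatches('*.blogs.example.com', 'mark.blogs.example.com')) // true
console.log(hostMatches('*.blogs.example.com', 'blogs.example.com')) // false
console.log(hostMatches('example.com', 'example.com')) // true
```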

Handlers

Handlers are functions (preferably async functions). They are passed an object that contains:

{
  addRequestHandler(),
  addResponseHandler(),
  addCfPropertiesHandler(),
  setRedirectMode(),
  originalRequest,
  request,
  response,
  current,
  params,
  phase,
}

  • addRequestHandler(handler, options): dynamically adds another request handler (pass {immediate: true} to add it as the first or next handler).
  • addResponseHandler(handler, options): dynamically adds another response handler (pass {immediate: true} to add it as the first or next handler).
  • addCfPropertiesHandler(handler): adds a callback that receives and returns new properties to pass to fetch() on the cf key (see Cloudflare documentation).
  • setRedirectMode(mode): sets the redirect mode for the main fetch. Default is 'manual', but you can set 'follow' or 'error'.
  • request: A Request object representing the current state of the request.
  • originalRequest: The original Request object (might be different if other handlers returned a new request).
  • response: A Response object with the current state of the response.
  • current: During the request phase, this will equal request. During the response phase, this will equal response. This is mostly used for conditions. For instance, the header condition works on either requests or responses, as both have headers. Thus it looks at { current: { headers } }.
  • params: An object containing any param matches from the route.
  • phase: One of "request" or "response".

Request handlers can return three things:

  1. Nothing: the current request will be passed on to the rest of the request handlers.
  2. A new Request object: this will get passed on to the rest of the request handlers.
  3. A Response object: this will skip the rest of the request handlers and get passed through the response handlers.

Response handlers can return two things:

  1. Nothing: the current response will be passed on to the rest of the response handlers.
  2. A new Response object: this will get passed on to the rest of the response handlers.
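
As an example of these return values, here is a sketch of a custom request handler for the "remove UTM parameters and Facebook click IDs" use case. The names `stripTrackingParams` and `removeTrackingParams` are made up for this example, and the list of parameters treated as tracking noise (`utm_*` and `fbclid`) is an assumption. The handler itself relies on the `Request` constructor available in the Workers runtime.

```javascript
// Pure helper: drop UTM parameters and Facebook click IDs from a URL string.
function stripTrackingParams(urlString) {
	const url = new URL(urlString)
	// Copy the keys first so that deleting does not disturb iteration.
	for (const key of [...url.searchParams.keys()]) {
		if (key.startsWith('utm_') || key === 'fbclid') {
			url.searchParams.delete(key)
		}
	}
	return url.toString()
}

// Request handler: returns a new Request when the URL changed, and
// returns nothing when it is declining to act.
async function removeTrackingParams({ request }) {
	const cleaned = stripTrackingParams(request.url)
	if (cleaned !== request.url) {
		return new Request(cleaned, request)
	}
}

console.log(stripTrackingParams('https://example.com/shop/?utm_source=x&id=7&fbclid=abc'))
// https://example.com/shop/?id=7
```

Keeping the URL logic in a pure function makes it easy to test outside a Worker; the handler could then be passed as `request: removeTrackingParams` to `handleFetch`.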

Bundled Handlers

The following handlers are included:

  • setUrl(url: string)
  • setHost(host: string)
  • setPath(path: string)
  • setProtocol(protocol: string)
  • setHttps()
  • setHttp()
  • forbidden()
  • setRequestHeaders([header: string, value: string][] | {[header: string]: string})
  • appendRequestHeaders([header: string, value: string][] | {[header: string]: string})
  • removeRequestHeaders(headers: string[])
  • setResponseHeaders([header: string, value: string][] | {[header: string]: string})
  • appendResponseHeaders([header: string, value: string][] | {[header: string]: string})
  • removeResponseHeaders(headers: string[])
  • copyResponseHeader(from: string, to: string)
  • lazyLoadImages()
  • prependPath(pathPrefix: string)
  • removePathPrefix(pathPrefix: string)
  • redirect(status: number)
  • redirectHttps()
  • redirectHttp()
  • requireCookieOrParam(param: string, forbiddenMessage: string)

Logic

Instead of bundling logic into custom handlers, you can also use addHandlerIf(condition, ...handlers) together with the any(), all() and none() gates to specify the logic outside of the handler. Here’s an example:

import {
	handleFetch,
	addHandlerIf,
	any,
	contains,
	header,
	forbidden,
} from 'workerbee'

handleFetch({
	request: [
		addHandlerIf(
			any(
				header('user-agent', contains('Googlebot')),
				header('user-agent', contains('Yahoo! Slurp')),
			),
			forbidden(),
			someCustomHandler(),
		),
	],
})

addHandlerIf() takes a single condition as its first argument, but you can nest any(), all() and none() as much as you like to compose a more complex condition.
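
To show how nested gates compose into a single condition, the sketch below implements stand-ins for the three gates as plain predicate combinators. They are named `anyOf`, `allOf` and `noneOf` to make clear they are illustrations rather than the library's `any()`, `all()` and `none()`, and they operate on a bare string instead of a handler context.

```javascript
// Hypothetical stand-ins for the gates, written as predicate combinators.
const anyOf = (...conditions) => (value) => conditions.some((c) => c(value))
const allOf = (...conditions) => (value) => conditions.every((c) => c(value))
const noneOf = (...conditions) => (value) => !conditions.some((c) => c(value))

const isBot = (ua) => ua.includes('Googlebot')
const isWordPress = (ua) => ua.startsWith('WordPress')
const isEmpty = (ua) => ua === ''

// "A bot or WordPress, but never an empty user agent."
const shouldBlock = allOf(anyOf(isBot, isWordPress), noneOf(isEmpty))

console.log(shouldBlock('WordPress/5.8')) // true
console.log(shouldBlock('Mozilla/5.0')) // false
```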

Conditions

As hinted above, there are several built-in conditions for you to use:

  • header(headerName: string, matcher: ValueMatcher)
  • contentType(matcher: ValueMatcher)
  • isHtml()
  • hasParam(paramName: string)
  • hasRouteParam(paramName: string)
  • param(paramName: string, matcher: ValueMatcher)
  • routeParam(paramName: string, matcher: ValueMatcher)
  • isHttps()
  • isHttp()

The ones that take a string (or nothing) are straightforward, but what’s up with ValueMatcher?

A ValueMatcher is flexible. It can be:

  • string: will match if the string === the value.
  • string[]: will match if any of the strings === the value.
  • ValueMatchingFunction: a function that takes the value and returns a boolean that decides the match.
  • ValueMatchingFunction[]: an array of functions that take the value, any of which can return true and decide the match.
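
The four shapes above can be resolved with one small function. This is a sketch of the rules as described, under the assumption that arrays use "any of" semantics for both strings and functions; `matchesValue` is a hypothetical name and not part of workerbee.

```javascript
// Hypothetical resolver for the four ValueMatcher shapes:
// string, string[], function, function[].
function matchesValue(matcher, value) {
	const matchers = Array.isArray(matcher) ? matcher : [matcher]
	// Strings match by strict equality; functions decide for themselves.
	return matchers.some((m) => (typeof m === 'function' ? m(value) : m === value))
}

const endsWithPdf = (value) => value.endsWith('.pdf')

console.log(matchesValue('text/html', 'text/html')) // true
console.log(matchesValue(['a', 'b'], 'c')) // false
console.log(matchesValue([endsWithPdf], '/files/report.pdf')) // true
```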

The following ValueMatchingFunctions are available:

  • contains(value: string | NegatedString | CaseInsensitiveString | NegatedCaseInsensitiveString)
  • startsWith(value: string | NegatedString | CaseInsensitiveString | NegatedCaseInsensitiveString)
  • endsWith(value: string | NegatedString | CaseInsensitiveString | NegatedCaseInsensitiveString)

These functions can also accept case-insensitive strings and negated strings, built with the text('yourtext').i and text('yourtext').not helpers.

addHandlerIf(
	header('User-Agent', startsWith(text('WordPress').not.i)),
	forbidden(),
)

Note that you can use logic functions to compose value matchers! So the example from the Logic section could be rewritten like this:

import {
	handleFetch,
	addHandlerIf,
	any,
	contains,
	header,
	forbidden,
} from 'workerbee'

handleFetch({
	request: [
		addHandlerIf(
			header(
				'user-agent',
				any(contains('Googlebot'), contains('Yahoo! Slurp')),
			),
			forbidden(),
			someCustomHandler(),
		),
	],
})

Two more points:

  1. The built-in conditionals support partial application. So you can do this:
const userAgent = header('user-agent')

Now, userAgent is a function that accepts a ValueMatcher.

You could take this further and do:

const isGoogle = userAgent(startsWith('Googlebot'))

Now you could just add a handler like:

handleFetch({
	request: [addHandlerIf(isGoogle, forbidden())],
})
  2. The built-in conditionals automatically apply to current. So if you run them as a request handler, header inspection will look at the request. As a response handler, it’ll look at response. But you can also use the raw conditionals while creating your own handlers. For instance, in a response handler you might want to look at the request that went to the server, or the originalRequest that came to Cloudflare.
import { forbidden } from 'workerbee'
import { hasParam } from 'workerbee/conditions'

export default async function forbiddenIfFooParam({ request }) {
	if (hasParam('foo', request)) {
		return forbidden()
	}
}

In most cases you will not be reaching into the request from the response. A better way to handle this is to have a request handler that conditionally adds a response handler. But if you want to, you can, and you can use those "raw" conditions to help. Note that the raw conditions will not be curried, and you'll have to pass a request or response to them as their last argument.
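
The partial application described in point 1 can be pictured as a chain of functions that each wait for the next argument. The sketch below uses strict currying with hypothetical names (`headerCondition`, `startsWithText`); the built-in conditionals may implement this differently. A `Map` stands in for a `Headers` object, since both expose `get()`, though unlike `Headers` a `Map` is case-sensitive.

```javascript
// Hypothetical curried condition: name first, then matcher, then context.
const headerCondition = (name) => (matcher) => ({ current }) =>
	matcher(current.headers.get(name) || '')

const startsWithText = (prefix) => (value) => value.startsWith(prefix)

// Each step fixes one more argument and yields a reusable function.
const userAgent = headerCondition('user-agent')
const isGoogle = userAgent(startsWithText('Googlebot'))

// `current` is the request during the request phase (see Handlers).
const context = { current: { headers: new Map([['user-agent', 'Googlebot/2.1']]) } }
console.log(isGoogle(context)) // true
```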

Best Practices

  1. Always return a new Request or Response object if you want to change things.
  2. Don’t return anything if your handler is declining to act.
  3. If you have a response handler that is only needed based on what a request handler does, conditionally add that response handler on the fly in the request handler.
  4. Use partial application of built-in conditionals to make your code easier to read.
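
Practices 1 to 3 can be combined in one pair of handlers. In this sketch a request handler registers a response handler only for preview requests, using the `addResponseHandler` function passed to every handler. The handler names and the `preview` query parameter are assumptions made for the example, and the stub context at the end exists only to show the call outside a Worker.

```javascript
// Response handler: returns a new Response instead of mutating the old one.
const preventCaching = ({ response }) => {
	const headers = new Headers(response.headers)
	headers.set('cache-control', 'no-store')
	return new Response(response.body, {
		status: response.status,
		statusText: response.statusText,
		headers,
	})
}

// Request handler: registers the response handler only for preview requests
// and returns nothing, because it does not change the request itself.
const handlePreviews = ({ request, addResponseHandler }) => {
	if (new URL(request.url).searchParams.has('preview')) {
		addResponseHandler(preventCaching)
	}
}

// Stub context for illustration; workerbee supplies the real one.
const added = []
handlePreviews({
	request: { url: 'https://example.com/post?preview=true' },
	addResponseHandler: (handler) => added.push(handler),
})
console.log(added.length) // 1
```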

License

MIT License

Copyright © 2020–2021 Mark Jaquith

This software incorporates work covered by the following copyright and permission notices:

tsndr/cloudflare-worker-router
Copyright © 2021 Tobias Schneider
(MIT License)

pillarjs/path-to-regexp
Copyright © 2014 Blake Embrey
(MIT License)