Architecture: extras

Edsko de Vries edited this page Aug 22, 2013 · 3 revisions

May be out of date

The following are useful extra things provided by hackage-server.

Resources

See Architecture: Resource: serving a URI with resources.

Resource URI generation

Resources can be used for more than just routing: they're a sort of recipe for generating URIs. There are two primary ways of generating String URIs. The first is by taking a list of strings and filling in a Resource's URI in the order of its dynamic components.

The other is by providing a DynamicPath, which may be less terse.

Both of these stop short if the list ends early or the DynamicPath is missing a required String, respectively.
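As a rough illustration of the two approaches and their stop-short behaviour, here is a minimal sketch. The Component type, the two render functions and examplePath are hypothetical stand-ins written for this page, not the actual hackage-server definitions; only the name DynamicPath and the behaviour described above come from the text.

```haskell
-- Hypothetical, simplified model of a resource path; not the real types.
data Component
    = Static String   -- a fixed path segment, e.g. "package"
    | Dynamic String  -- a named hole, e.g. ":package"

-- Assumed shape: a lookup table from dynamic component name to value.
type DynamicPath = [(String, String)]

-- Fill the dynamic components, in order, from a plain list of strings.
-- Rendering stops short as soon as the list runs out.
renderFromList :: [Component] -> [String] -> String
renderFromList (Static s : rest) vals = "/" ++ s ++ renderFromList rest vals
renderFromList (Dynamic _ : rest) (v : vals) = "/" ++ v ++ renderFromList rest vals
renderFromList _ _ = ""

-- Fill the dynamic components by name from a DynamicPath.
-- Rendering stops short at the first name that is missing.
renderFromPath :: [Component] -> DynamicPath -> String
renderFromPath [] _ = ""
renderFromPath (Static s : rest) dpath = "/" ++ s ++ renderFromPath rest dpath
renderFromPath (Dynamic k : rest) dpath = case lookup k dpath of
    Just v  -> "/" ++ v ++ renderFromPath rest dpath
    Nothing -> ""

-- Example route: /package/:package/doc
examplePath :: [Component]
examplePath = [Static "package", Dynamic "package", Static "doc"]
```

With this model, renderFromList examplePath ["foo-1.0"] gives "/package/foo-1.0/doc", while renderFromList examplePath [] stops at "/package". The DynamicPath version is wordier to call but does not depend on argument order.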

Blob storage

This is how files are stored. The Config type contains a pointer to the server-wide blob store, which by default is at state/blobs/. Access functions are provided in Distribution.Server.Util.BlobStorage, best imported qualified. Blobs are stored using this function:

add :: BlobStorage -> ByteString -> IO BlobId

Given a BlobStorage (from Config.serverStore) and the contents of a file, return an id for the blob. BlobId is just a newtype for MD5Digest, so a file might be stored in e.g. /state/blobs/9f10bccb6fd4b761f6d1a848cd50308f, where the location is the MD5 of the entire file. This makes blob storage idempotent: putting a file there once has the same effect as putting it there again.

There is also support for conditionally adding a file to the blob storage. It does this by placing the file in /state/blobs/incoming and moving it only if the check function says its ByteString is okay. Its type is:

addWith :: BlobStorage -> ByteString
        -> (ByteString -> IO (Either error result))
        -> IO (Either error (result, BlobId))

There are two functions to access an already-stored blob using its BlobId: fetch :: BlobStorage -> BlobId -> IO ByteString and filepath :: BlobStorage -> BlobId -> FilePath, returning the contents and filename respectively.
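To make the content-addressing idea concrete, here is a small in-memory model. It is only a sketch under loud assumptions: Store is an association list rather than a directory on disk, contents are Strings rather than ByteStrings, and toyDigest is a toy checksum standing in for MD5. None of this is the real BlobStorage implementation; it only mirrors the shapes of add, addWith and fetch described above.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- Hypothetical in-memory stand-ins; the real store writes files to disk.
newtype BlobId = BlobId String deriving (Eq, Show)
type Store = [(BlobId, String)]

-- Toy digest standing in for MD5. It is NOT cryptographic.
toyDigest :: String -> BlobId
toyDigest = BlobId . flip showHex "" . foldl step (5381 :: Int)
  where step h c = (h * 33 + ord c) `mod` 1000000007

-- Adding is idempotent: identical contents always map to the same id,
-- so a second add of the same contents leaves the store unchanged.
add :: Store -> String -> (Store, BlobId)
add store contents = case lookup bid store of
    Just _  -> (store, bid)
    Nothing -> ((bid, contents) : store, bid)
  where bid = toyDigest contents

-- Conditional add: run the check first and only store on success.
-- (The real function stages the file in an incoming directory instead.)
addWith :: Store -> String -> (String -> Either e r)
        -> Either e (Store, (r, BlobId))
addWith store contents check = do
    result <- check contents
    let (store', bid) = add store contents
    return (store', (result, bid))

-- Look up previously stored contents by id.
fetch :: Store -> BlobId -> Maybe String
fetch store bid = lookup bid store
```

In this model, calling add twice with the same contents returns the same BlobId both times and leaves a single entry in the store, which is the idempotence property mentioned above.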

Data combinators

This isn't so much a feature as a common-sense practice: write combinator functions. This is a great way to parse DynamicPaths into typed data. For example, if you have a BlogPostId (newtype of Int) that you want to parse from a page "/blog/post/:id", write a function:

withPostId :: DynamicPath -> (BlogPostId -> ServerPart a) -> ServerPart a
withPostId dpath func = case simpleParse =<< lookup "id" dpath of
    Nothing  -> mzero
    Just pid -> func pid

Or use combinators for data lookup:

withBlogPost :: BlogPostId -> (String -> ServerPart a) -> ServerPart a
withBlogPost pid func = do
    mcontents <- query $ LookupPost pid
    case mcontents of
        Nothing -> notFound . toResponse $ "Post #" ++ show pid ++ " not found"
        Just contents -> func contents

And, while you're writing convenience functions, combine path parsing and lookup:

withBlogPostPath :: DynamicPath -> (BlogPostId -> String -> ServerPart a) -> ServerPart a
withBlogPostPath dpath func = withPostId dpath $ \pid -> withBlogPost pid $ \txt -> func pid txt

Note that mzero is used in the case of failing to parse a URI, so that perhaps another resource can get a chance. However, in the case the data is not found after an in-memory lookup, explicit 404s are used. If an object can't be found in a collection of objects, it can be helpful to indicate this and provide a link to the collection itself. This can be done with MServerPart, described in the next section.

The possibility of failure

In the case of an error, Happstack provides a simple way to report it: return a Response through an appropriate HTTP response code filter. This gets a bit messy if you're using more than one format: either you case all of the time on Eithers or Maybes, or you rewrite the same error message every time. Hackage provides format-generic failing in Distribution.Server.Error, intended to be used with combinators that fail but want to provide a specific message.

data ErrorResponse = ErrorResponse {
    errorCode  :: Int,
    errorTitle :: String,
    errorMessage :: [Message]
}

-- A message with hypertext. (MLink str href) will be taken
-- as a pointer to a relevant page, and (MText str) merely
-- as text.
data Message = MLink String String | MText String

type MServerPart a = ServerPart (Either ErrorResponse a)

returnOk :: Monad m => a -> m (Either ErrorResponse a)
returnError :: Int -> String -> [Message] -> MServerPart a
responseWith :: MServerPart a -> (a -> MServerPart b) -> MServerPart b

There are two parts to using it. The first is producing an MServerPart Response using combinators, possibly chained. The second part is going from MServerPart Response -> ServerPart Response using a specific function that may, in the case of failure, render an error in the desired format (see e.g. htmlResponse in the HTML feature). MServerPart isn't actually a monad: ServerPart is the monad, and Either ErrorResponse a is the type of its result. Still, there are two unit functions provided for it (returnOk and returnError) and a bind function. The primary benefit of this setup is that MServerPart can be used in any function where a ServerPart is expected.
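The following sketch shows one way the unit and bind functions could be written, and what a format-specific rendering step might look like. It assumes IO as a stand-in for Happstack's ServerPart so that it runs without the server, and textResponse is a hypothetical plain-text counterpart to htmlResponse; the real definitions in Distribution.Server.Error may differ.

```haskell
-- Types repeated from the listing above so the sketch stands alone.
data Message = MLink String String | MText String

data ErrorResponse = ErrorResponse
    { errorCode    :: Int
    , errorTitle   :: String
    , errorMessage :: [Message]
    }

-- Assumption: IO replaces ServerPart for the purposes of this sketch.
type MServerPart a = IO (Either ErrorResponse a)

-- Unit for the success case.
returnOk :: Monad m => a -> m (Either ErrorResponse a)
returnOk = return . Right

-- Unit for the failure case.
returnError :: Int -> String -> [Message] -> MServerPart a
returnError code title msg = return (Left (ErrorResponse code title msg))

-- Bind: run the continuation on success, pass the error through otherwise.
responseWith :: MServerPart a -> (a -> MServerPart b) -> MServerPart b
responseWith part func = part >>= either (return . Left) func

-- Hypothetical renderer: turns a failure into plain text.
textResponse :: MServerPart String -> IO String
textResponse = fmap (either render id)
  where
    render (ErrorResponse code title msg) =
        show code ++ " " ++ title ++ ": " ++ concatMap showMessage msg
    showMessage (MText str)      = str
    showMessage (MLink str href) = str ++ " <" ++ href ++ ">"
```

Because the error value is carried as data until the final rendering step, the same chain of combinators can be handed to an HTML, plain-text or other renderer without rewriting the error message.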

Rewriting the withBlogPost function to use MServerPart, you might get:

withBlogPost :: BlogPostId -> (String -> MServerPart a) -> MServerPart a
withBlogPost pid func = do
    mcontents <- query $ LookupPost pid
    case mcontents of
        Nothing -> returnError 404 "Post not found"
          [ MText "Post #"
          , MLink (show pid) ("/blog/post/" ++ show pid)
          , MText " not found."]
        Just contents -> func contents

Authentication, below, takes a similar approach, not only returning an error message but also setting the WWW-Authenticate header.

Caches

These are just non-persistent values in memory, updateable asynchronously and atomically. Beware, they're not updated until the new value is fully evaluated. There should probably be more fine-grained control over their operation.
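A minimal sketch of that behaviour, using only base, might look like the following. The Cache type and its functions are hypothetical names for illustration, not the hackage-server cache API, and the caller-supplied forcing function is an assumption made to avoid extra dependencies.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, readIORef, atomicWriteIORef)

-- Hypothetical cache: a non-persistent value held in memory.
newtype Cache a = Cache (IORef a)

newCache :: a -> IO (Cache a)
newCache = fmap Cache . newIORef

readCache :: Cache a -> IO a
readCache (Cache ref) = readIORef ref

-- Update asynchronously: a background thread fully evaluates the new
-- value (using the supplied forcing function) and only then swaps it in
-- atomically. Readers keep seeing the old value until that point.
-- The returned MVar is filled once the swap has happened.
updateCacheAsync :: (a -> ()) -> Cache a -> a -> IO (MVar ())
updateCacheAsync force (Cache ref) new = do
    done <- newEmptyMVar
    _ <- forkIO $ do
        _ <- evaluate (force new)
        atomicWriteIORef ref new
        putMVar done ()
    return done

-- Forcing function for lists: evaluates every element to WHNF.
forceList :: [a] -> ()
forceList = foldr seq ()
```

A caller that needs to observe the new value can takeMVar the returned MVar before reading. In this sketch, if evaluation throws, the old value simply stays in place and the MVar is never filled.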

(Side note: there is currently no server-side or client-side cache middleware, which need a more systematic approach than this. Last-modified would be simple if each feature just stored more timestamps, but ETags are quite complicated where multiple content-types and PUT are involved.)

Hooks and filters

See also: Architecture: Hooks

Hooks are generally called after an update happens, and they can take any number of arguments and run an IO action. They may call other Hooks in turn, but they shouldn't take too long. They are processed in sequence, in the reverse of the order in which they were added.

Filters are generally called before an update happens, with the ability to stop the event, or inject some value into it. They can also take any number of arguments, and return a typed IO result. They use the same internal representation as Hooks, but have more specific utility functions.
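The shared representation can be pictured as a mutable list of callbacks. The sketch below is a hypothetical model (the names Hook, newHook, registerHook, runHook and runFilter are chosen for illustration and may not match the real module); it shows why the most recently added callback runs first and how a callback of any arity can be supported.

```haskell
import Data.IORef

-- Hypothetical model: a hook is a mutable list of callbacks of type a,
-- where a can be any function type ending in an IO action.
newtype Hook a = Hook (IORef [a])

newHook :: IO (Hook a)
newHook = fmap Hook (newIORef [])

-- Registering prepends, so the most recently added callback runs first.
registerHook :: Hook a -> a -> IO ()
registerHook (Hook ref) callback = modifyIORef ref (callback :)

-- Run every callback in sequence. The caller supplies the arguments by
-- passing a function that applies each callback to them.
runHook :: Hook a -> (a -> IO ()) -> IO ()
runHook (Hook ref) apply = readIORef ref >>= mapM_ apply

-- Filters use the same representation but collect typed results, which
-- the caller can inspect to decide whether the update should go ahead.
runFilter :: Hook a -> (a -> IO b) -> IO [b]
runFilter (Hook ref) apply = readIORef ref >>= mapM apply
```

For a hook of type Hook (String -> Int -> IO ()), a call such as runHook hook (\f -> f "pkg" 1) applies every registered callback to both arguments, which is how a single Hook type can cover callbacks with any number of parameters.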

User lists

User lists are just collections of UserIds, and the standard type for them is UserList, with operations on it in Distribution.Server.Users.Group. You might store a user list as part of a feature's persistent data, or maybe a collection of them (e.g. package maintainers are a Map PackageName UserList). These are useful happstack-state functions to define for user groups:

  • Get (query): return the UserList. If it's a collection of user lists, have one function return them all, and another to query for a specific collection item, returning an empty list if none exists.
  • Exists (query): if it's a collection of user lists, ask whether or not a list exists for a given collection item.
  • Add (update): add a single UserId to the UserList
  • Remove (update): remove a single UserId from the UserList
  • Replace (update): change the entire user list structure to the argument. Useful for backup.
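As a sketch of what these five operations could look like for a collection of user lists, here they are as pure functions over a map. The type definitions are simplified assumptions (UserId as Int, PackageName as String, UserList as a wrapper around IntSet), and in a real feature each function would be wrapped as a happstack-state query or update rather than called directly.

```haskell
import qualified Data.IntSet as IntSet
import qualified Data.Map as Map

-- Hypothetical, simplified stand-ins for the real types.
type UserId = Int
type PackageName = String
newtype UserList = UserList IntSet.IntSet deriving (Eq, Show)
type Maintainers = Map.Map PackageName UserList

emptyList :: UserList
emptyList = UserList IntSet.empty

-- Get (query): the list for one item, or an empty list if none exists.
getList :: PackageName -> Maintainers -> UserList
getList = Map.findWithDefault emptyList

-- Exists (query): is there a list for this item at all?
listExists :: PackageName -> Maintainers -> Bool
listExists = Map.member

-- Add (update): add a single UserId, creating the list if needed.
addUser :: PackageName -> UserId -> Maintainers -> Maintainers
addUser pkg uid = Map.insertWith merge pkg (UserList (IntSet.singleton uid))
  where merge (UserList a) (UserList b) = UserList (IntSet.union a b)

-- Remove (update): remove a single UserId; no-op if the list is absent.
removeUser :: PackageName -> UserId -> Maintainers -> Maintainers
removeUser pkg uid = Map.adjust (\(UserList s) -> UserList (IntSet.delete uid s)) pkg

-- Replace (update): swap in an entire structure, e.g. when restoring a backup.
replaceAll :: Maintainers -> Maintainers -> Maintainers
replaceAll new _old = new
```

Note that getList and listExists deliberately differ: a package with no entry and a package whose list has been emptied both yield an empty list from getList, but only the latter reports True from listExists.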

User groups

This is the standardized interface for editing and querying UserLists in the public Hackage interface. It's subject to change, because the flexibility of it is questionable (it's a bit of an impure setup). The type of a UserGroup is in Distribution.Server.Users.Group, and to use it internally, you should pass it through either groupResourceAt or groupResourcesAt in Distribution.Server.Features.Users. This means that any operations on the group will also update the user-to-group index.

Authentication

Hackage implements basic and digest authentication. This authenticates a user against the user database using stateless HTTP, optionally checking them against an access control list. The primary function for this, using MServerPart, is:

withHackageAuth :: Users -> Maybe UserList -> Maybe AuthType ->
                   (UserId -> UserInfo -> MServerPart a) -> MServerPart a

The Users argument is the user database to authenticate against, which can be retrieved with GetUserDb.

The Maybe UserList argument, if Just ulist, can return a forbidden message if the authenticated user is not in the given user list. If Nothing, it only requires that the user is logged in, not that they belong to a user group. (Note: disabled users can't authenticate in either case.)

The Maybe AuthType argument can force an authentication type, either BasicAuth or DigestAuth. If Nothing, the server default will be used, currently basic authentication. If Just BasicAuth or Just DigestAuth, that scheme will be used. If digest authentication is used and the user's data is stored in BasicAuth format, this will produce a special error message.

The final argument, the function, will be called in the case of successful authentication. Otherwise, the resulting MServerPart will contain an authentication error with a corresponding WWW-Authenticate header.
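The access decision described above can be summarised in a small pure function. This is a hypothetical model only: the types are simplified, the credential check itself is assumed to have already happened, and treating a disabled account as an authentication failure rather than a forbidden response is an assumption based on the note above, not a statement about the real implementation.

```haskell
-- Hypothetical, simplified stand-ins for the real types.
type UserId = Int
newtype UserInfo = UserInfo { userEnabled :: Bool }

data AuthError
    = Unauthorized  -- would be sent with a WWW-Authenticate header
    | Forbidden     -- authenticated, but not in the required user list
    deriving (Eq, Show)

-- First argument: the optional access control list (Nothing = any user).
-- Second argument: the user whose credentials checked out, if any.
checkAccess :: Maybe [UserId] -> Maybe (UserId, UserInfo) -> Either AuthError UserId
checkAccess _ Nothing = Left Unauthorized
checkAccess mlist (Just (uid, info))
    | not (userEnabled info)          = Left Unauthorized  -- assumption, see lead-in
    | maybe False (notElem uid) mlist = Left Forbidden
    | otherwise                       = Right uid
```

In the real function the Right case corresponds to calling the final argument with the UserId and UserInfo, and the Left cases correspond to the ErrorResponse carried by the resulting MServerPart.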

Backup

See Architecture: Backup for implementing import and export for features.