Architecture: backup
hackage-server has the ability to perform periodic backups (snapshots) of the server state at a given time. Data is exported and imported per feature, so a file "spammers.csv" from a spam feature would end up at "export/spam/spammers.csv" if the backup tarball were unzipped. Backups are distinct from happstack-state's persistent storage: the data in them only represents the server state at the time the backup was created, and instead of serializing the data in binary, it is stored in simple text formats, like CSV, which humans can view and edit. This ensures the data can be recovered easily, a harder task with binary data.
Backup functions are used in all server modes. backup mode creates a backup tarball from the enabled features and then shuts down. restore mode takes a backup tarball and reads it entry by entry, writing the parsed data structures to persistent storage. new mode behaves like restore mode, except it finalizes the restore process immediately without loading any tar entries. convert mode creates an export tarball in an ad-hoc manner from legacy files. run mode can also perform a backup while the server is running.
The type synonym Hackage uses for backup is BackupEntry, which is short for ([FilePath], ByteString). The first part of the pair is the path split on slashes. So the core feature exporting a file at "package/reify-0.1.1/reify.cabal" should provide ["package", "reify-0.1.1", "reify.cabal"], along with the bytestring for the cabal file, to store it at "export/core/package/reify-0.1.1/reify.cabal" in the backup tarball. Feature names act as a primitive namespace for backup files, so a feature has full control over naming its own backup files, so long as they are valid filenames and are less than 155 characters (a limitation of the tar format). On import, this same file would be received as ["package", "reify-0.1.1", "reify.cabal"] along with the ByteString contents.
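For instance, the entry for that cabal file might be built along these lines (a sketch, assuming the file's contents are already in memory as a ByteString under whatever BS alias the feature imports):
cabalEntry :: BS.ByteString -> BackupEntry
cabalEntry cabalContents =
    -- the path is supplied as split components; the export machinery files the
    -- entry under the feature's namespace in the tarball
    (["package", "reify-0.1.1", "reify.cabal"], cabalContents)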
It's recommended that features use as many files as they need, but try to avoid clutter if possible. If the feature is a single map from PackageId to Maybe String, one CSV file would suffice, with lines like:
base-4.2.0.0,nothing
containers-0.3.0.0,just,specialstring
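A sketch of producing those rows, assuming a hypothetical feature state of type Map PackageId (Maybe String), the usual qualified Data.Map import, and display from Distribution.Text:
featureToCSV :: Map PackageId (Maybe String) -> CSV
featureToCSV = map toRow . Map.toList
  where
    -- "nothing" and "just" tag the Maybe constructor, as in the lines above
    toRow (pkgid, Nothing)  = [display pkgid, "nothing"]
    toRow (pkgid, Just str) = [display pkgid, "just", str]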
The conventions for export are pretty simple. If a feature doesn't need to export anything, it can put Nothing in its dumpBackup field (of type Maybe (BlobStorage -> IO [BackupEntry])). A feature whose data is redundant with other features' data should not store backup entries, even if it has persistent state, so long as this data can be reconstructed on restore (in the last stage of import; see below). This makes it easier to adjust the backup tarball, if necessary.
Otherwise, there are two approaches. If the blob storage doesn't need to be used, then query state all at once, create a list of BackupEntry, and return them. For example:
dumpBackup = Just $ \_ -> do
    posts <- fmap (Map.toList . blogPosts) $ query BlogState
    let authorBackup :: [[String]]
        authorBackup = map (\(pid, post) -> [show pid, show $ postAuthor post]) posts
        postBackup :: [BackupEntry]
        postBackup = map (\(pid, post) -> (["post", show pid], BS.pack $ postText post)) posts
    return $ [csvToBackup ["authors.csv"] authorBackup] ++ postBackup
csvToBackup :: [String] -> CSV -> BackupEntry is a utility function provided by Distribution.Server.Backup.Export to make a backup entry from a CSV value, where type CSV = [[String]] is from the csv package. It automatically escapes the fields to produce a proper, unambiguous CSV file.
If storage needs to be used, try creating a list of ExportEntry instead of BackupEntry. It's a very similar type: type ExportEntry = ([FilePath], Either ByteString BlobId). What's special about ExportEntry is that it postpones loading the blob into memory until the ByteString itself is needed, using unsafeInterleaveIO magic in readExportBlobs. You might find these functions from Distribution.Server.Backup.Export useful:
csvToExport :: [String] -> CSV -> ExportEntry
blobToExport :: [String] -> BlobId -> ExportEntry
readExportBlobs :: BlobStorage -> [ExportEntry] -> IO [BackupEntry]
An example of its usage:
dumpBackup = Just $ \store -> do
    doc <- query GetDocumentation
    let exportFunc (pkgid, (blob, _)) = blobToExport [display pkgid, "documentation.tar"] blob
    readExportBlobs store . map exportFunc . Map.toList $ documentation doc
Note that exportFunc throws away some data. This is fine, because it can be reconstructed on import from the blob itself. Whatever your export requirements, try to keep as much of the backup-file-creating code as pure as possible, so happstack-state isn't required to format complicated entries.
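For instance, the blog-post example above could be rearranged so that only the state query lives in IO, with a pure function building the entries (a sketch reusing that example's hypothetical types):
formatPosts :: [(BlogPostId, BlogPost)] -> [BackupEntry]
formatPosts posts =
      csvToBackup ["authors.csv"] (map authorRow posts)
    : map postEntry posts
  where
    -- pure formatting, easy to test without happstack-state
    authorRow (pid, post) = [show pid, show (postAuthor post)]
    postEntry (pid, post) = (["post", show pid], BS.pack (postText post))

dumpBackup = Just $ \_ -> fmap (formatPosts . Map.toList . blogPosts) (query BlogState)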
Import is a bit more complicated. There's no guaranteed order for import entries, so hackage-server takes a three-stage approach:
1. Import entry-by-entry, routing each BackupEntry to the proper feature.
2. Organize any partial results from the first stage.
3. Store the results:
   a. Write any data to happstack-state.
   b. Read happstack-state data for features this feature depends on, if necessary.
Stages 1 and 2 are allowed to fail. Stage 1 can complain about improperly formatted entries, and stage 2 can fail if the result is inconsistent or there are missing bits; it should not fail if no entries were imported. Stage 3, which modifies the persistent state, should not fail, since that could leave the server data in an inconsistent state.
This is accomplished with a RestoreBackup type which emulates OOP, in that all of the feature-specific data is hidden, but all RestoreBackup objects must implement certain functions.
data RestoreBackup = RestoreBackup {
    restoreEntry    :: BackupEntry -> IO (Either String RestoreBackup),
    restoreFinalize :: IO (Either String RestoreBackup),
    restoreComplete :: IO ()
}
For examples of using this object, see the PackagesBackup and UserBackup modules. If the feature stores a data component FooBar, then backup would look something like:
fooBarBackup :: RestoreBackup
fooBarBackup = doRestoreFooBar emptyFooBar

doRestoreFooBar :: FooBar -> RestoreBackup
doRestoreFooBar foo = fix $ \r -> RestoreBackup
    { restoreEntry = \entry -> do
        res <- importFooBackup foo entry
        case res of
            Left str   -> return $ Left str
            Right foo' -> return $ Right (doRestoreFooBar foo')
    , restoreFinalize = return (Right r) -- no special finalization
    , restoreComplete = update $ ReplaceFooBar foo -- write the final state to happstack-state (ReplaceFooBar being the feature's hypothetical update event)
    }

importFooBackup :: FooBar -> BackupEntry -> IO (Either String FooBar)
importFooBackup foo (["foo.csv"], bs) = ...
...
importFooBackup foo _ = return $ Right foo -- ignore unknown entries
RestoreBackup is an instance of Monoid, so different RestoreBackups for different data structures can be combined into one.
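So a feature with several independent pieces of state can write one restorer per piece and merge them, for example (with bazBackup as a second, hypothetical restorer alongside the fooBarBackup sketch above):
featureRestore :: RestoreBackup
featureRestore = fooBarBackup `mappend` bazBackup
-- each entry is offered to both restorers, and both are finalized and completed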
The Import monad is helpful for incrementally modifying the state of a feature and possibly indicating failure: fail str produces a Left str. It's also an instance of MonadIO and MonadState. Use these functions with it:
-- Run the Import monad.
runImport :: s -> Import s a -> IO (Either String s)
-- Try to read a string (second argument). In the case of failure, the label
-- (first argument) is used to indicate what sort of result was expected
-- (e.g. "user id", "package name").
parseRead :: Read a => String -> String -> Import s a
-- Try to parse a string using its Distribution.Text instance. The first argument is the label.
parseText :: Text a => String -> String -> Import s a
-- Read a time in the standard export format (in Distribution.Server.Backup.Utils)
parseTime :: String -> Import s UTCTime
-- A combinator to read a CSV file. The first argument is the file name; the
-- second is the CSV file itself, and the third is a function to be called if
-- the file was parsed correctly.
importCSV :: String -> ByteString -> (CSV -> Import s a) -> Import s a
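Putting these helpers together, the importFooBackup sketched earlier might parse its CSV entry roughly as follows (an illustrative sketch only, using forM_ from Control.Monad and modify from the MonadState instance; insertFooBar is a hypothetical pure update on FooBar):
importFooBackup :: FooBar -> BackupEntry -> IO (Either String FooBar)
importFooBackup foo (["foo.csv"], bs) =
    runImport foo $ importCSV "foo.csv" bs $ \csv ->
        forM_ csv $ \row -> case row of
            [pkgStr, value] -> do
                pkgid <- parseText "package id" pkgStr
                modify (insertFooBar pkgid value)  -- hypothetical pure update
            _ -> fail $ "unexpected row in foo.csv: " ++ show row
importFooBackup foo _ = return (Right foo)  -- ignore unknown entries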
As for re-importing blobs, you can use add :: BlobStorage -> ByteString -> IO BlobId from Distribution.Server.Util.BlobStorage.
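For example, the documentation entries exported earlier could be re-imported along these lines (a sketch: DocState and insertDocBlob are hypothetical, and simpleParse comes from Distribution.Text):
restoreDocEntry :: BlobStorage -> DocState -> BackupEntry -> IO (Either String DocState)
restoreDocEntry store docs ([pkgStr, "documentation.tar"], bs) =
    case simpleParse pkgStr of
        Nothing    -> return $ Left ("could not parse package id: " ++ pkgStr)
        Just pkgid -> do
            blob <- add store bs  -- write the tar back into blob storage
            return $ Right (insertDocBlob pkgid blob docs)
restoreDocEntry _ docs _ = return (Right docs)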
This structure of the import process makes it possible, in theory, to import features individually rather than in bulk. The main issue is making sure states are consistent: a feature can't have a user id which doesn't exist in the user database.
One last twist on the import scheme is that it can be used to store data redundant with other features' data, so long as that data is reconstructed in restoreComplete and doing so can't fail. The order of completion is the same order the features are initialized in, so if feature A depends on feature B's data, feature B's import will complete before feature A's. This is used for reverse dependencies:
restoreBackup = Just $ \_ -> fix $ \r -> RestoreBackup
    { restoreEntry    = \_ -> return $ Right r
    , restoreFinalize = return $ Right r
    , restoreComplete = do
        putStrLn "Calculating reverse dependencies"
        index <- fmap packageList $ query GetPackagesState
        let revs = constructReverseIndex index
        update $ ReplaceReverseIndex revs
    }