Skip to content

howerj/cdb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

% cdb(1) | Constant Database

NAME

CDB - An interface to the Constant Database Library

SYNOPSES

cdb -h

cdb -[cdkstVG] file.cdb

cdb -q file.cdb key [record#]

cdb -g -M minimum -M maximum -R records -S seed

cdb -H

DESCRIPTION

Author:     Richard James Howe
License:    Unlicense
Repository: <https://github.com/howerj/cdb>
Email:      [email protected]

A clone of the CDB database, a simple, read-only (once created) database. The database library is designed so it can be embedded into a microcontroller if needed. This program can be used for creating and querying CDB databases, which consist of key-value pairs of binary data.

This program also includes several options that help in testing out the database, one for hashing input keys and printing the hash for the default hash function and another one for generating a database with (Pseudo-)random keys and values of a given length.

This library can create 16, 32 and 64 bit versions of the CDB file format removing one of the major limitations of the 32-bit version.

OPTIONS

-h : print out this help message and exit successfully

-b : set the size of the CDB database to use (default is 32, can be 16 or 64)

-v: increase verbosity level

-t file.cdb : run internal tests, exit with zero on a pass

-c file.cdb : run in create mode

-d file.cdb : dump the database

-k file.cdb : dump the keys in the database

-s file.cdb : print statistics about the database

-T temp.cdb : name of temporary file to use

-V file.cdb : validate database

-q file.cdb key record-number : query the database for a key, with an optional record

-o number : specify offset into file where database begins

-H : hash keys and output their hash

-g : spit out an example database to standard out

-m number : set minimum length of generated record

-M number : set maximum length of generated record

-R number : set number of generated records

-S number : set seed for record generation

EXAMPLES

Creating a database, called 'example.cdb':

$ ./cdb -c example.cdb
+0,1:->X
+1,0:Y->
+1,1:a->b
+1,1:a->b
+1,2:a->ba
+5,5:hello->world

Note that zero length keys and values are valid, and that duplicate keys are allowed, even keys with the same value. A key with the specified value is created for each duplicate, just like a non-duplicate key.

Looking up values in the created database:

./cdb -q example.cdb ""
./cdb -q example.cdb Y
./cdb -q example.cdb a
./cdb -q example.cdb a 0
./cdb -q example.cdb a 1
./cdb -q example.cdb a 2
./cdb -q example.cdb hello

Dumping a database:

$ ./cdb -d example.cdb

A database dump can be read straight back in to create another database:

$ ./cdb -d example.cdb | ./cdb -c should_have_just_used_copy.cdb

Which is not useful in itself, but assuming your data (both keys and values) is ASCII text with no new lines and NUL characters then you could filter out, modify or add in values with the standard Unix command line tools.

RETURN VALUE

cdb returns zero on success/key found, and a non zero value on failure. Two is returned if a key is not found, any other value indicates a more serious failure.

LIMITATIONS

Three different versions of the library can be built; a 16, a 32 and a 64 bit version. The 32 bit version is the default version. For all versions there is a limit on the maximum file size in the format used of 2^N, where N is the size. Keys and Values have the same limit (although they can never reach that size as some of the overhead is taken up as part of the file format). Any other arbitrary limitation is a bug in the implementation.

The minimum size of a CDB file is 256 * 2 * (N/8) bytes.

It should be noted that if you build a N bit (where N is 16, 32 or 64) version of this library you are limited to creating databases that are the size of N and less, e.g. If cdb_word_t is set to uint32_t, and therefore the 32-bit version of this library is being built, then you can create 32-bit and 16-bit versions of the CDB database format, but you cannot make 64-bit versions. You can set cdb_word_t to uint64_t (which enables the library to create all three mutually incompatible versions of the library) on a 32-bit system, naturally.

INPUT/DUMP FORMAT

The input and dump format follow the same pattern, some ASCII text specifying the beginning of a record and then some binary data with some separators, and a newline terminating the record, the format is:

+key-length,value-length:KEY->VALUE
+key-length,value-length:KEY->VALUE
...
+key-length,value-length:KEY->VALUE

Despite the presence of textual data, the input key and value can contain binary data, including the ASCII NUL character.

An example, encoding the key value pair "abc" to "def" and "G" to "hello":

+3,3:abc->def
+1,5:G->hello

The following awk script can be used to pre-process a series of key-value pairs in the format "key value", with one record per line and optional comment lines:

#!/bin/sh
LC_ALL='C' awk '
  /^[^#]/ {
    print "+" length($1) "," length($2) ":" $1 "->" $2
  }
  END {
    print ""
  }
' | cdb -c "$@"

Which was available in the original original cdb program as 'cdbmake-12'.

FILE FORMAT

The file format is incredibly simple, it is designed so that only the header and the hash table pointer need to be stored in memory during generation of the table - the keys and values can be streamed on to the disk. The header consists of 256 2-word values forming an initial hash table that point to the hash tables at the end of the file, the key-value records, and then up to 256 hash tables pointing to the key-value pairs.

A word consists of a 4-byte/32-bit value (although this may be changed via compile time options, creating an incompatible format). All word values are stored in little-endian format.

The initial hash table contains an array of 256 2-word values. The words are; a position of a hash table in the file and the number of buckets in that hash table, stored in that order. To lookup a key the key is first hashed, the lowest eight bits of the hash are used to index into the initial table and if there are values in this hash the search then proceeds to the second hash table at the end of the file.

The hash tables at the end of the file contains an array of two word records, containing the full hash and a file position of the key-value pair. To search for a key in this table the hash of the key is taken and the lowest eight bits are discarded by shifting right eight places, the hash is then taken modulo the number of elements in the hash table, the resulting value is used as an initial index into the hash table. Searching continues until the key is found, or an empty record is found, or the number of records in the table have been searched through with no match. A key is compared by looking at the hash table records, if the hash of the key matches the stored hash in the hash table records then a possible match is found, the file position is then used to look up the key-value pair and the key is compared.

The number of buckets in the hash table is chosen as twice the number of populated entries in the hash table.

A key-value pair is stored as two words containing the key length and the value length in that order, then the key, and finally the value.

The hashing algorithm used is similar to djb2, but with a minor modification that an exclusive-or replaces an addition. The algorithm calculates hashes of the size of a word, the initial hash value is the special number '5381'. The hash is calculated as the current hash value multiplied by 33, to which the new byte to be hashes and the result of multiplication under go an exclusive-or operation. This repeats until all bytes to be hashed are processed. All arithmetic operations are unsigned and performed modulo 2 raised to the power of 32.

The pseudo code for this is:

set HASH to 5381
for each OCTET in INPUT:
	set HASH to: ((HASH * 33) % pow(2, 32)) xor OCTET
return HASH

Note that there is nothing in the file format that disallows duplicate keys in the database, in fact the API allows duplicate keys to be retrieved. Both key and data values can also be zero bytes long. There are also no special alignment requirements on the data.

The best documentation on the file format is a small pure python script that implements a set of functions for manipulating a CDB database, a description is available here http://www.unixuser.org/~euske/doc/cdbinternals/ and the script itself is available at the bottom of that page http://www.unixuser.org/~euske/doc/cdbinternals/pycdb.py.

A visualization of the overall file structure:

         Constant Database Sections
.-------------------------------------------.
|   256 Bucket Initial Hash Table (2KiB)    |
.-------------------------------------------.
|            Key Value Pairs                |
.-------------------------------------------.
|       0-256 Secondary Hash Tables         |
.-------------------------------------------.

The initial hash table at the start of the file:

    256 Bucket Initial Hash Table (2KiB)
.-------------------------------------------.
| { P, L } | { P, L } | { P, L } |   ...    |
.----------+----------+----------+----------.
|   ...    | { P, L } | { P, L } | { P, L } |
.-------------------------------------------.
P = Position of secondary hash table
L = Number of buckets in secondary hash table

The key-value pairs:

.-------------------------------------------.
| { KL, VL } | KEY ...      | VALUE ...     |
.-------------------------------------------.
KL    = Key Length
VL    = Value Length
KEY   = Varible length binary data key
VALUE = Variable length binary value

Of the variable number of hash tables (which each are of a variable length) at the end of the file:

 0-256 Variable Length Secondary Hash Tables
.---------------------.
| { H, P } | { H, P } |
.----------+----------+---------------------.
| { H, P } |   ...    |   ...    | { H, P } |
.----------+----------+----------+----------.
| { H, P } |   ...    | { H, P } |
.--------------------------------.
H = Hash
P = Position of Key-Value Pair

And that is all for the file format description.

While the keys-value pairs can be streamed to disk and the second level hash table written after those keys, anything that creates a database will have to seek to the beginning of the file to rewrite the header, this could have been avoided by storing the 256 initial hash table results at the end of the file allowing a database to be constructed in a Unix filter, but alas, this is not possible. Also of note, by passing in a custom hash algorithm to the C API you have much more control over where each of the key-value pairs get stored, specifically, which bucket they will end up in by controlling the lowest 8-bits (for example you could set the lowest 8-bits to the first byte in the key in a custom hash).

Note that there is nothing stopping you storing the key-value pairs in some kind of order, you could do this by adding the keys in lexicographic order for a database sorted by key. Retrieving keys using the C function "cdb_foreach" would allow you retrieve keys in order. The hash table itself would remain unaware of this order. Dumping the key-value pairs would maintain this order as well. There is no guarantee other tools will preserve this order however (they may dump key-value pairs backwards, or by going through the hash table).

CDB C API OVERVIEW

There are a few goals that the API has:

  • Simplicity, there should be few functions and data structures.
  • The API is easy to use.
  • There should be minimal dependencies on the C standard library. The library itself should be small and not be a huge, non-portable, "optimized", mess.
  • The user should decide when, where and how allocations are performed. The working set that is allocated should be small.
  • The database driver should catch corrupt files if possible.

Some of these goals are in conflict, being able to control allocations and having minimal dependencies allow the library to be used in an embedded system, however it means that in order to do very basic things the user has to provide a series of callbacks. The callbacks are simple to implement on a hosted system, examples are provided in main.c and host.c in the project repository, but this means the library is not just read to use.

There are two sets of operations that most users will want to perform; creating a database and reading keys. After the callbacks have been provided, to create a database requires opening up a new database in create mode:

/* error handling omitted for brevity */
cdb_t *cdb = NULL;
cdb_options_t ops = { /* Your file callbacks/options go here */ };
cdb_open(&cdb, &ops, 1, "example.cdb");
cdb_buffer_t key   = { .length = 5, .buffer = "hello", };
cdb_buffer_t value = { .length = 5, .buffer = "world", };
cdb_add(cdb, &key, &value);
cdb_close(cdb);

If you are dealing with mostly NUL terminated ASCII/UTF-8 strings it is worth creating a function to deal with them:

int cdb_add_string(cdb_t *cdb, const char *key, const char *value) {
	assert(cdb);
	assert(key);
	assert(value);
	const cdb_buffer_t k = { .length = strlen(key),   .buffer = (char*)key,   };
	const cdb_buffer_t v = { .length = strlen(value), .buffer = (char*)value, };
	return cdb_add(cdb, &k, &v);
}

Note that you cannot query for a key from a database opened up in create mode and you cannot add a key-value pair to a database opened up in read mode. The operations are mutually exclusive.

To search for a key within the database, you open up a database connection in read mode (create = 0):

/* error handling omitted for brevity */
cdb_t *cdb = NULL;
cdb_options_t ops = { /* Your file callbacks/options go here */ };
cdb_open(&cdb, &ops, 1, "example.cdb");
cdb_buffer_t key = { .length = 5, .buffer = "hello" };
cdb_file_pos_t value = { 0, 0, };
cdb_get(cdb, &key, &value);
/* use cdb_seek, then cdb_read, to use returned value */
cdb_close(cdb);

Upon retrieval of a key the database does not allocate a value for you, instead it provides an object consisting of a file position and a length of the value. This can be read from wherever the database is stored with the function 'cdb_read'. Before issuing a read, 'cdb_seek' must be called as the file handle may be pointing to a different area in the database.

If a read or a seek is issued that goes outside of the bounds of the database then all subsequent database operations on that handle will fail, not just reads or seeks. The only valid things to do on a database that has returned a negative number is to call 'cdb_status' and then 'cdb_close' and never use the handle again. 'cdb_status' must not be used on a closed handle.

As there are potentially duplicate keys, the function 'cdb_count' can be used to query for duplicates. It sets the parameter count to the number of records found for that key (and it sets count to zero, and returns zero, if no keys are found, it returns one if one or more keys were found).

The function 'cdb_status' can be used to query what error has occurred, if any. On an error a negative value is returned, the meaning of this value is deliberately not included in the header as the errors recorded and the meaning of their values may change. Use the source for the library to determine what error occurred.

The function 'cdb_version' returns the version number in an out parameter and information about the compile time options selected when the library was built. A Semantic Version Number is used, which takes the form "MAJOR.MINOR.PATCH". The PATCH number is stored in the Least Significant Byte, the MINOR number the next byte up, and the MAJOR in the third byte. The fourth byte contains the compile time options.

There are several things that could be done to speed up the database but this would complicate the implementation and the API.

C API FUNCTIONS

The C API contains 13 functions and some callbacks, more than is desired, but they all have their uses. Ideally a library would contain far fewer functions and require less of a cognitive burden on the user to get right, however making a generic enough C library and using C in general requires more complexity than is usual, but not more than is necessary.

There is regularity in these functions, they all return negative on failure (the only exception being the allocator callback that returns a pointer), most of the functions accept a "cdb_t" structure as well, which is an opaque pointer (opaque pointers are not an unalloyed good, they imply that an allocator must be used, which can be a problem in embedded systems).

int cdb_open(cdb_t **cdb, const cdb_options_t *ops, int create, const char *file);
int cdb_close(cdb_t *cdb);
int cdb_read(cdb_t *cdb, void *buf, cdb_word_t length);
int cdb_add(cdb_t *cdb, const cdb_buffer_t *key, const cdb_buffer_t *value);
int cdb_seek(cdb_t *cdb, cdb_word_t position);
int cdb_foreach(cdb_t *cdb, cdb_callback cb, void *param);
int cdb_read_word_pair(cdb_t *cdb, cdb_word_t *w1, cdb_word_t *w2);
int cdb_get(cdb_t *cdb, const cdb_buffer_t *key, cdb_file_pos_t *value);
int cdb_lookup(cdb_t *cdb, const cdb_buffer_t *key, cdb_file_pos_t *value, long record);
int cdb_count(cdb_t *cdb, const cdb_buffer_t *key, long *count);
int cdb_status(cdb_t *cdb);
int cdb_version(unsigned long *version);
int cdb_tests(const cdb_options_t *ops, const char *test_file);

typedef int (*cdb_callback)(cdb_t *cdb, const cdb_file_pos_t *key, const cdb_file_pos_t *value, void *param);
  • cdb_open

The most complex function that contains the most parameters, "cdb_open" is used to open a connection to a database. A pointer to a handle is passed to the first parameter, using the supplied allocation callback (passed-in in the "ops" parameter) the function will allocate enough space for "cdb_t" structure, this out-parameter is the database handle. It will be set to NULL on failure, which will also be indicated with a negative return value on the "cdb_open" function. Once "cdb_close" is called on this handle the handle should not be used again, and "cdb_close" should only be called on the returned handle once.

A single database can be opened by as many readers as you like, however reading a database and writing to a database are mutually exclusive operations.

When writing to a database there should not be any readers active on that database. This is a fundamental limitation of the database design.

Writing to a CDB file that is being read by another CDB instance can cause corruption of data and general nasty things! Do not do it!

As such, a database can only be opened up in read only, or write only mode.

The "file" parameter is passed to the "open" callback, which is present in the "ops" parameter.

	void *(*open)(const char *name, int mode);

The callback should return an opaque pointer on success and NULL on failure. It is used to open up a handle to the database via whatever method the library user would like (for example, a simple file present in your file system, or a section of flash in an embedded computer). The open callback is used by "cdb_open" and should not be called directly.

The "mode" parameter to the "open" callback will be set to "CDB_RW_MODE" if "create" is non-zero, and will be set to "CDB_RO_MODE" if it is zero.

CDB_RW_MODE is an enumeration that has the value "1", whilst CDB_RW_MODE has the value "0".

"cdb_open" does quite a lot, when opening a CDB file for reading the file is partially verified, when opening for writing a blank first level hash table is written to disk. If either of this fails, then opening the database will fail.

The function also needs the callbacks to perform a seek to be present, along with the callback for reading. The write callback only needs to present when the database is opened up in write mode.

  • cdb_close

This closes the CDB database handle, the handle may be NULL, if so, nothing will be done. The same handle should not be passed in twice to "cdb_close" as this can cause double-free errors. This function will release any memory and handles (by calling the "close" callback) associated with the handle.

When writing a database this function has one more task to do, and that is finalizing the database, it writes out the hash-table at the end of the file. If "cbd_close" is not called after the last entry has been added then the database will be in an invalid state and will not work.

This function may return negative on error, for example if the finalization fails.

After calling "cdb_close" the handle must not be used again.

  • cdb_read

To be used on a database opened up in read-mode only. This can be used to read values, and sometimes keys, from the database. This function does not call "cdb_seek", the caller must call "cdb_seek" before calling this function to move the file pointer to the desired location before reading. The file pointer will be updated to point to after the location that has been read (or more accurately, the read callback must do this). This function does not return the number of bytes read, instead it returns zero for no error and negative if an error condition occurs (a partial read is treated as an error).

  • cdb_add

To be used on a database opened up in write, or creation, mode only.

This function adds a key-value pair to the database, which can be looked up only after finalizing the database (by calling "cdb_close") and reopening the database in read-only mode, which should be done after the final "cdb_add" has been added.

It is unfortunate that both the key and value must reside within memory, but doing anything else would complicate the API too much.

One the key and value have been added they can be freed or discarded however.

Adding key-value pairs consumes disk space and some extra memory which is needed to store the second level hash table, however the keys and values are not kept around in memory by the CDB library.

Note that this function will add duplicate keys without complaining, and can add zero length keys and values, likewise without complaining.

It is entirely up to the caller to prevent duplicates from being added. This is one improvement that could be added to the library (as you cannot check or query a partially written database at the moment).

  • cdb_seek

This function changes the position that the next read or write will occur from. You should not seek before or after the database, doing so will result in an error. Seeking is always relative to the start of the file, the optional offset specified in the CDB options structure being added to the current position. Relative to current position or file-end seeks cannot be done.

This function must be called before each call to "cdb_read" or "cdb_read_word_pair", otherwise you may read garbage.

Calling "cdb_seek" multiple times on the same location has no effect (the "fseek" C standard library function may discard buffers if called multiple times on the same location even though the file position has not changed).

  • cdb_foreach

The "cdb_foreach" function calls a callback for each value within the CDB database. The callback is passed an optional "param". If the callback returns negative or a non-zero number then the for-each loop is terminated early (a positive number is returned, a negative number results in -1 being returned). If the callback returns zero then the next value, if any, is processed with the callback being called again.

The callback is passed a structure which contains the location within the CDB database that contains the key and value. The keys and values are not presented in any specific order and the order should not be expected to stay the same between calls.

To read either a key or a value you must call "cdb_seek" before calling "cdb_read" yourself.

Passing in NULL is allowed and is not a No-Operation, it can be used to effectively check the integrity of the database.

  • cdb_read_word_pair

To be used on a database opened up in read-mode only. This function is a helper function that strictly does not need to exist, it is used for reading two "cdb_word_t" values from the database. This can be useful for the library user for more detailed analysis of the database than would normally be possible, many values within the database are stored as two "cdb_word_t" values. Looking inside this read-only database is not discouraged and the file format is well documented.

This function does not call "cdb_seek", that must be called before hand to seek to the desired file location. The file position will be updated to point after the two read values.

  • cdb_get

This function populates the "value" structure if the "key" is found within the CDB database. The members of "value" will be set to zero if a key is not found, if it is found the position will be non-zero, although the length may be zero.

Note that this function does not actually retrieve the key and put it into a buffer, there is a very good reason for that. It would be easy enough to make such a function given the functions present in this API, however in order to make such a function it would have to do the following; allocate enough space to store the value, read the value off of disk and then return the result. This has massive performance implications. Imagine if a large value is stored in the database, say a 1GiB value, this would mean at least 1GiB of memory would need to be allocated, it would also mean all of the file buffers would have been flushed and refilled, and all of that data would need to be copied from disk to memory. This might be desired, it might also be very wasteful, especially if only a fraction of the value is actually needed (say the first few hundred bytes). Whether this is wasteful depends entirely on your workload and use-cases for the database.

It is better to give the user tools to do what they need than insisting it be done one, limiting, although "easy", way.

This does mean that to actually retrieve the value the user must perform their own "cdb_seek" and "cdb_read" operations. This means that the entire value does not need to read into memory be the consumer, and potentially be processed block by block by the "read" callback if needed.

  • cdb_lookup

"cdb_lookup" is similar to "cdb_get" except it accepts an optional record number. Everything that applies to the get-function applies to the lookup-function, the only difference is the record number argument (internally "cdb_get" is implemented with "cdb_lookup").

If there are two or more keys that are identical then the question of how to select a specific key arises. This is done with an arbitrary number that will most likely, but is not guaranteed, to be the order in which the key was added into the database, with the first value being zero and the index being incremented from there on out.

If the key is found but the index is out of bounds it is treated as if the key does not exist. Use "cdb_count" to calculate the maximum number records per key if needed, it is far more expensive to repeatedly call "cdb_lookup" on a key until it returns "key not found" to determine the number of duplicate keys than it is to call "cdb_count".

The index argument perhaps should be a "cdb_word_t", but there is always debate around these topics (personally if I were to design a C-like programming language everything integers would default to 64-bits and all pointers would fit within that, other types for indexing and the like would also be 64-bit, that's not a criticism of C, the madness around integer types was born out of necessity).

  • cdb_count

The "cdb_count" function counts the number of entries that have the same key value. This function requires potentially multiple seeks and reads to compute, so the returned value should be cached if you plan on using it again as the value is expensive to calculate.

If the key is not found, a value indicating that will be returned and the count argument will be zeroed. If found, the count will be put in the count argument.

  • cdb_status

This function returns the status of the CDB library handle. All errors are sticky in this library, if an error occurs when handling a CDB database then there is no way to clear that error short of reopening the database with a new handle. The only valid operation to do after getting an error from any of the functions that operate on a "cdb_t" handle is to call "cdb_status" to query the error value that is stored internally.

"cdb_status" should return a zero on no error and a negative value on failure. It should not return a positive non-zero value.

  • cdb_version

"cdb_version" returns the version number of the library. It stores the value in an unsigned long. This may return an error value and a zero value if the version has not been set correctly at compile time.

The value is stored in "MAJOR.MINOR.PATH" format, with "PATH" stored in the Least Significant Byte. This is a semantic version number. If the "MAJOR" number has changed then there are potentially breaking changes in the API or ABI of this library that have been introduced, no matter how trivial.

  • cdb_tests

And the callback for "cdb_foreach":

  • "cdb_callback"

This callback is called for each value within the CDB database when used with "cdb_foreach". If a negative value is returned from this callback then the foreach loop will end early and an error value will be returned. If the value returned is greater than zero then the foreach loop will terminate potentially early. If zero the foreach loop will continue to the next key-value pair if available.

Each time this callback is called by "cdb_foreach" it will be passed in a key-value pair in the form of two length/file-location structures. You will need to seek to those locations and call read the key-values yourself. There is no guarantee the file position is in the correct location (ie. Pointing to the location of the key), so call "cdb_seek" before calling "cdb_read".

There is no guarantee that the key-value pairs will be presented in the same order each time the function is called and should not be counted on. There is no attempt to preserve order.

See "cdb_foreach" for more information.

C API STRUCTURES

The C API has two simple structures and one complex one, the latter being more of a container for callbacks (or, some might say, a way of doing object oriented programming in C). The complex structure, "cdb_options_t", is an unfortunate necessity.

The other two structures, "cdb_buffer_t" and "cdb_file_pos_t", are simple enough and need very little explanation, although they will be.

Let us look at the "cdb_options_t" structure:

typedef struct {
	void *(*allocator)(void *arena, void *ptr, size_t oldsz, size_t newsz);
	cdb_word_t (*hash)(const uint8_t *data, size_t length);
	int (*compare)(const void *a, const void *b, size_t length);
	cdb_word_t (*read)(void *file, void *buf, size_t length);
	cdb_word_t (*write)(void *file, void *buf, size_t length);
	int (*seek)(void *file, uint64_t offset);
	void *(*open)(const char *name, int mode);
	int (*close)(void *file);
	int (*flush)(void *file);

	void *arena;
	cdb_word_t offset;
	unsigned size;
} cdb_options_t;

Each member of the structure will need an explanation.

STRUCTURE CALLBACKS

  • allocator

This function is based off of the allocator callback mechanism present in Lua, see https://www.lua.org/manual/5.1/manual.html#lua_setallocf for more information on that allocator. This function can handle freeing memory, allocating memory, and reallocating memory, all in one function. This allows the user of this library to specify where objects are allocated and how.

The arguments to the callback mean:

  1. arena

This may be NULL, it is an optional argument that can be used to store memory allocation statistics or as part of an arena allocator.

  1. ptr

This should be NULL if allocating new memory, of be a pointer to some previously allocated memory if freeing memory or reallocating it.

  1. oldsz

The old size of the pointer if known, if unknown, use zero. This is used to prevent unnecessary allocations.

  1. newz

The new size of the desired pointer, this should be non-zero if reallocating or allocating memory. To free memory set this to zero, along with providing a pointer to free. If this is zero and the "ptr" is NULL then nothing will happen.

  1. The return value

This will be NULL on failure if allocating memory or reallocating memory and that operation failed. It will be non-NULL on success, containing usable memory. If freeing memory this should return NULL.

An example allocator using the built in allocation routines is:

void *allocator_cb(void *arena, void *ptr, size_t oldsz, size_t newsz) {
	UNUSED(arena);
	if (newsz == 0) {
		free(ptr);
		return NULL;
	}
	if (newsz > oldsz)
		return realloc(ptr, newsz);
	return ptr;
}

This callback is both simple and flexible, and more importantly puts the control of allocating back to the user (I know I have repeated this many times throughout this document, but it is worth repeating!).

compare: /* key comparison function: NULL defaults to memcmp */
write: https://roboquill.io/
flush: /* (optional) called at end of successful creation */

arena:   /* used for 'arena' argument for the allocator, can be NULL if allocator allows it */
offset: /* starting offset for CDB file if not at beginning of file */
size:  /* Either 0 (same as 32), 16, 32 or 64, but cannot be bigger than 'sizeof(cdb_word_t)*8' */
  • hash (optional)

The "hash" callback can be set to NULL, if that is the case then the default hash, based off of djb2 and present in the original CDB library, will be used. If you do provide your own hash function you will effectively make this database incompatible with the standard CDB format but there are valid reasons for you do do this, you might need a stronger hash that is more resistant to denial of service attacks, or perhaps you want similar keys to collide more to group them together.

The hash function returns "cdb_word_t" so the number of bits this function returns is dependent on big that type is (determined at compile time).

  • compare (optional)

This function compares keys for a match, the function should behave like memcmp, returning the same values on a match and a failure. You may want to change this function if you want to compare keys partially, however you will also need to change the hash function to ensure keys are sorted into the right 256 buckets for your comparison (for example, with the default hash function two keys with the same prefix could be stored in two separate buckets).

FILE CALLBACKS

The following callbacks act in a similar way to the file functions present in stdio.h. The only function missing is an ftell equivalent.

  • read

This function is used to read data out of the database, wherever that data is stored. Unlike fread a status code is returned instead of the length of the data read, negative indicating failure. A partial read should result in a failure. The only thing lacking from this callback is a way to signal to perform non-blocking Input and Output, that would complicate the internals however. The "read" callback should always be present.

The first parameter, "file", is a handle to an object returned by the "open" callback.

The callback should return 0 indicating no error if "length" bytes have been read into "buf".

Reading should continue from the previous file pointer position, that is if you open a file handle, read X bytes, the next time you read Y bytes they should be read from the end of the X bytes and not the beginning of the file (hence why read does not take a file position).

If implementing read callbacks in an embedded system you might have to also implement that behavior.

  • write (conditionally optional, needed for database creation only)

Similar to the "read" callback, but instead writes data into wherever the database is stored.

  • seek

This callback sets the file position that subsequent reads and writes occur from.

  • open

This callback should open the resource specified by the "name" string (which will usually be a file name). There are two modes a read/write mode (used to create the database) and a read-only mode. This callback much like the "close" callback will only be called once internally by the CDB library.

  • close

This callback should close the file handle returned by "open", freeing any resources associated with that handle.

  • flush (optional)

An optional callback used for flushing writes to mass-storage. If NULL then the function will not be called.

STRUCTURE VARIABLES

  • arena (optional, can be NULL, depends on your allocator)

This value is passed into the allocator as the "arena" argument whenever the allocator is called. It can be NULL, which will usually be the case if you are just using "malloc", "realloc" and "free" to implement the allocator, but if you are implementing your own arena based allocator you might want to set it to point to your arena (hence the name).

  • offset

This offset can be used for CDB databases embedded within a file. If the CDB database does not begin at the start of the file (or flash, or wherever) then you can set this offset to skip over that many number of bytes in the file.

  • size

The size variable, which can be left at zero, is used to select the word size of the database, this has an interaction with "cdb_word_t".

Missing perhaps is a unsigned field that could contain options in each bit position in that field.

BUFFER STRUCTURE

typedef struct {
	cdb_word_t length; /* length of data */
	char *buffer;      /* pointer to arbitrary data */
} cdb_buffer_t; /* used to represent a key or value in memory */

FILE POSITION STRUCTURE

typedef struct {
	cdb_word_t position; /* position in file, for use with cdb_read/cdb_seek */
	cdb_word_t length;   /* length of data on disk, for use with cdb_read */
} cdb_file_pos_t; /* used to represent a value on disk that can be accessed via 'cdb_options_t' */

EMBEDDED SUITABILITY

There are many libraries written in C, for better or worse, as it is the lingua franca for software development at the moment. Few of those libraries are directly suitable for use in Embedded systems and are much less flexible than they could be in general. Embedded systems pose some interesting constraints (eschewing allocation via "malloc", lack of a file-system, and more). By designing the library for an embedded system we can make a library more useful not only for those systems but for hosted systems as well (eg. By providing callbacks for the FILE functions we can redirect them to wherever we like, the CDB file could be stored remotely and accessed via TCP, or it could be stored locally using a normal file, or it could be stored in memory).

There are two sets of functions that should be abstracted out in nearly every library, memory allocation (or even better, the caller can pass in fixed length structures if possible) and Input/Output functions (including logging!). This library does both.

There is one area in which the library is lacking, the I/O functions do not yield if there is nothing to read yet, or a write operation is taking too long. This does impose constraints on the caller and how the library is used (all calls to the library could block for an arbitrary length of time). The callbacks could return a status indicating the caller should yield, but yielding and restoring state to enable partially completed I/O to finish would greatly complicate the library (this would be trivial to implement if C had portable coroutines built into the language).

More libraries should be written with this information in mind.

TEST SUITE

There is a special note that should be mentioned about how the test suite is handled as it is important.

It is difficult to make a good API that is easy to use, consistent, and difficult to misuse. Bad APIs abound in common and critical software (names will not be named) and can make an already difficult to use language like C even more difficult to use.

One mistake that is often seen is API functionality that is conditional upon an macro. This complicates the build system along with every piece of software that is dependent on those optional calls. The most common function to be optionally compiled in are test suite related functions if they are present. For good reason these test suites might need to be removed from builds (as they might take up large amounts of space for code even if they are not needed, which is at a premium in embedded systems with limited flash memory).

The header often contains code like this:

#ifdef LIBRARY_UNIT_TESTS
int library_unit_tests(void);
#endif

And the code like this, in C like pseudo-code:

#ifdef LIBRARY_UNIT_TESTS
int test_function_1(void) {
	/* might call malloc directly, making this unsuitable
	to be included in an embedded system */
	return result;
}

int library_unit_tests(void) {
	/* tests go here */
	if (test_function_1() != OK)
		return FAIL;
	return PASS;
}
#endif

In order to call this code you need to be aware of the "LIBRARY_UNIT_TESTS" macro each time the function "library_unit_tests" is called, and worse, whether or not your library was compiled with that macro enabled resulting in link-time errors. Another common mistake is not passing in the functions for I/O and allocation to the unit test framework, making it unsuitable for embedded use (but that is a common criticism for many C libraries and not just unit tests).

Compare this to this libraries way of handling unit tests:

In the header:

int cdb_tests(const cdb_options_t *ops, const char *test_file);

And the relevant bits of code/pseudo-code:

static uint64_t xorshift128(uint64_t s[2]) {
	assert(s);
	/* XORSHIFT-128 algorithm */
	return NEXT_PRNG;
}


int cdb_tests(const cdb_options_t *ops, const char *test_file) {
	assert(ops);
	assert(test_file);
	BUILD_BUG_ON(sizeof (cdb_word_t) < 2);

	if (CDB_TESTS_ON == 0)
		return CDB_OK_E;

	/* LOTS OF TEST CODE NOT SHOWN, some of which
	uses "xorshift128". */

	return STATUS;
}

There is no "ifdef" surrounding any of the code (using "ifdef" anywhere to conditionally execute code is usually a mistake, is only used within the project to set default macro values if the macro is not previously defined, an acceptable usage).

Two things are important here, the first, all of the Input and Output and memory related functions are passed in via the "ops" structure, as mentioned. This means that the test code is easy to port and run on a microcontroller which might not have a file system (for testing and development purposes you might want to run the tests on a microcontroller but not keep them in in the final product).

The main difference is the lack of "ifdef" guards, instead if the macro "CDB_TESTS_ON" is false the function "cdb_tests" returns "CDB_OK_E" (there is some debate if the return code should be this, or something to indicate the tests are not present, but that is a separate issue, the important bit is the return depending on whether the tests are present).

This "if" statement is a far superior way of handling optional code in general. The caller does not have to worry if the function is present or not, as the function will always be present in the library. Not only that, but if the tests are not run because the compile time macro "CDB_TESTS_ON" is false then the compiler will optimize out those tests even on the lowest optimization settings (on any decent compiler).

This also has the advantage that the code that is not run still goes through the compilation step meaning the code is less likely to be wrong when refactoring code. Not only that, but because "xorshift128" which "cdb_tests" depends on, is declared to be static, if "CDB_TESTS_ON" is false it to will be eliminated from the compiled object file so long as no other function calls it. In actual fact, the code has changed since this has been written and "cdb_prng" is exposed in the header as it is useful in main.c, which is equivalent to "xorshift128".

BUILD REQUIREMENTS

If you are building the program from the repository at https://github.com/howerj/cdb you will need GNU Make and a C Compiler. The library is written in pure C99 and should be fairly simple to port to another platform. Other Make implementations may work, however they have not been tested. git is also used as part of the build system.

First clone the repository and change directory to the newly clone repository:

git clone https://github.com/howerj/cdb cdb
cd cdb

Type 'make' to build the cdb executable and library.

Type 'make test' to build and run the cdb internal tests. The script called 't', written in sh, does more testing, and tests that the user interface is working correctly. 'make dist' is used to create a compressed tar file for distribution. 'make install' can be used to install the binaries, however the default installation directory (which can be set with the 'DESTDIR' makefile variable) installs to a directory called 'install' within the repository - it will not actually install anything. Changing 'DESTDIR' to '/usr' should install everything properly. pandoc is required to build the manual page for installation, which is generated from this markdown file.

Look at the source file cdb.c to see what compile time options can be passed to the compiler to enable and disable features (if code size is a concern then the ability to create databases can be removed, for example).

RENAME

CDB databases are meant to be read-only, in order to add entries to a database that database should be dumped and new values added in along with the old ones. That is, to add in a new value to the database the entire database has to be rebuilt. This is not a problem for some work loads, for some work loads the database could be rebuilt every X hours.

If this does present a problem, then you should not use this database.

However, when a database does have to be rebuilt how do you make sure that users of it point to the new database and not the old one?

If you access the database via the command line applications then the "rename" function, which is atomic on POSIX systems, will do what is needed. This is, a mechanism to swap out the old database with a new one without affecting any of the current readers.

A rename can be done in C like so:

rename("new.cdb", "current.cdb"); /* Atomic rename */

If a reader opens "current.cdb" before the rename then it will continue to read the old database until it closes the handle and opens up "current.cdb" after the rename. The files data persists even if there is no file name that points to it so long as there are active users of that file (ie. If a file handle to that file is still open). This will mean that there could be processes that use old data, but not inconsistent data. If a reader opens up the data after the rename, it will get the new data.

This also means that the writer should never write to a file that is currently in use by other readers or writers, it should write to a new file that will be renamed to the file in use, and it also means that a large amount of disk storage space will be in use until all users of the old databases switch to the new databases allowing the disk space to be reclaimed by the operating system.

POSSIBLE DIRECTIONS

There are many additions that could be made to a project, however the code is quite compact and neat, anything else that is needed could be built on top of this library. Some ideas for improvement include; adding a header along with a CRC, adding (unsafe) functions for rewriting key-values, adding (de)compression (with the shrink library) and decryption, integrating the project in an embedded system in conjunction with littlefs as an example, allowing the user to supply their own comparison and hash functions, adding types and schemas to the database, and more. The project could also be used as the primary database library for the pickle interpreter, or for serving static content in the eweb web-server.

All of these would add complexity, and more code - making it more useful to some and less to others. As such, apart from bugs, the library and test driver programs should be considered complete.

The lack of a header might be solved in creative ways as:

  • The integrity of most of the file can be checked by making sure all pointers are within bounds, that key-value pairs are stored one after another and that each key is in the right bucket for that hash. The only things not checked would be the values (they would still have to be of the right length).
  • If a file successfully passes a verification it can be identified as a valid CDB file of that size, this means we would not need to store header information about the file type and structure. This has been verified experimentally (the empty and randomly generated databases of a different size do not pass verification when the incorrect size is specified with the "-b" option).
  • We could place the header within the key-value section of the database, or even at the end of the file.

Things that should and could be done, but have not:

  • Fuzzing with American Fuzzy Lop to iron out the most egregious bugs, security relevant or otherwise. This has been used on the pickle library to great effect and it finds bugs that would not be caught be unit testing alone. The library is currently undergoing fuzzing, nothing bad found so far.
  • The current library implements a system for looking up data stored to disk, a system could be created that does so much more. Amongst the things that could be done are:
    • Using the CDB file format only as a serialization format for an in memory database which would allow key deletion/replacing. This Key-Value store would essentially just be an in memory hash table with a fancy name, backed by this library. The project could be done as part of this library or as a separate project.
    • Implementing the memcached protocol to allow remote querying of data.
    • Alternatively make a custom protocol that accept commands over UDP. There are a few implementation strategies for doing this.
  • Alternatively, just a simple Key-Value store that uses this database as a back-end without anything else fancy.
  • Changing the library interface so it is a header only C library.
  • Making a set of callbacks to allow an in memory CDB database, useful for embedding the database within binaries.
  • Designing a suite of benchmarks for similar databases and implementations of CDB, much like https://docs.huihoo.com/qdbm/benchmark.pdf.

Porting this to Rust and making a crate for it would be nice, although implementations already exists. Just making bindings for this library would be a good initial step, along with other languages.

For more things that are possible to do:

  • The API supplies a for-each loop mechanism where the user supplies a callback, an iterator based solution would be more flexible (but slightly more error prone to use).
  • The user can specify their own hash algorithm, using one with perhaps better characteristics for their purposes (and breaking compatibility with the original format). One interesting possibility is using a hashing algorithm that maximizes collisions of similar keys, so similar keys are grouped together which may be useful when iterating over the database. Unfortunately the initial 256 wide bucket system interferes with this, which could be remedied by returning zero for lowest eight bits, degrading performance. It is not really viable to do this with this system, but hashing algorithms that maximize collisions, such as SOUNDEX, are interesting and deserve a mention. This could be paired with a user supplied comparison function for comparing the keys themselves.
  • The callbacks for the file access words ("open", "read", ...) deserve their own structure so it can be reused, as the allocator can, although it may require some changes to how those functions work (such as different return values, passing in a handle to arbitrary user supplied data, and more).
  • Options for making the file checking more lax, as information could be stored between the different key/value pairs making the file format semi-compatible between implementations. This could be information usually stored in the header, or information about the key/values themselves (such as type information). Some implementations, including this one, are more strict in what they accept.
  • Some of the functions in main.c could be moved into cdb.c so users do not have to reimplement them.
  • A poor performance Bloom Filter like algorithm can be made using the first level hash table. A function to return whether an item may be in the set or is definitely not can be made by checking whether there are any items in the first 256 bucket that key hashes to. The 256 bucket is small enough to fit in memory, as are the second level hash tables which could be used to improve performance even more.
  • If the user presorts the keys when adding the data then the keys can be retrieved in order using the "foreach" API call. The user could sort on the data instead if they like.
  • The way version information is communicated within the API is not perhaps the best way of doing it. A simple macro would suffice.
  • The file format really could use a redesign. One improvement apart from adding a header would be to move the 256 bucket initial hash table to the end of the file so the entire file format could be streamed to disk.

BUGS

For any bugs, email the author. It comes with a 'works on my machine guarantee'. The code has been written with the intention of being portable, and should work on 32-bit and 64-bit machines. It is tested more frequently on a 64-bit Linux machine, and less frequently on Windows. Please give a detailed bug report (including but not limited to what machine/OS you are running on, compiler, compiler version, a failing example test case, your blood type and star sign, etcetera).

PYTHON IMPLEMENTATION

Available from here https://www.unixuser.org/~euske/doc/cdbinternals/index.html. It probably is the most succinct description and understandable by someone not versed in python.

#!/usr/bin/env python

# Python implementation of cdb

# calc hash value with a given key
def calc_hash(s):
  return reduce(lambda h,c: (((h << 5) + h) ^ ord(c)) & 0xffffffffL, s, 5381)

# cdbget(fp, basepos, key)
def cdbget(fp, pos_header, k):
  from struct import unpack

  r = []
  h = calc_hash(k)

  fp.seek(pos_header + (h % 256)*(4+4))
  (pos_bucket, ncells) = unpack('<LL', fp.read(4+4))
  if ncells == 0: raise KeyError

  start = (h >> 8) % ncells
  for i in range(ncells):
    fp.seek(pos_bucket + ((start+i) % ncells)*(4+4))
    (h1, p1) = unpack('<LL', fp.read(4+4))
    if p1 == 0: raise KeyError
    if h1 == h:
      fp.seek(p1)
      (klen, vlen) = unpack('<LL', fp.read(4+4))
      k1 = fp.read(klen)
      v1 = fp.read(vlen)
      if k1 == k:
	r.append(v1)
	break
  else:
    raise KeyError

  return r


# cdbmake(filename, hash)
def cdbmake(f, a):
  from struct import pack

  # write cdb
  def write_cdb(fp):
    pos_header = fp.tell()

    # skip header
    p = pos_header+(4+4)*256  # sizeof((h,p))*256
    fp.seek(p)

    bucket = [ [] for i in range(256) ]
    # write data & make hash
    for (k,v) in a.iteritems():
      fp.write(pack('<LL',len(k), len(v)))
      fp.write(k)
      fp.write(v)
      h = calc_hash(k)
      bucket[h % 256].append((h,p))
      # sizeof(keylen)+sizeof(datalen)+sizeof(key)+sizeof(data)
      p += 4+4+len(k)+len(v)

    pos_hash = p
    # write hashes
    for b1 in bucket:
      if b1:
	ncells = len(b1)*2
	cell = [ (0,0) for i in range(ncells) ]
	for (h,p) in b1:
	  i = (h >> 8) % ncells
	  while cell[i][1]:  # is call[i] already occupied?
	    i = (i+1) % ncells
	  cell[i] = (h,p)
	for (h,p) in cell:
	  fp.write(pack('<LL', h, p))

    # write header
    fp.seek(pos_header)
    for b1 in bucket:
      fp.write(pack('<LL', pos_hash, len(b1)*2))
      pos_hash += (len(b1)*2)*(4+4)
    return

  # main
  fp=file(f, "wb")
  write_cdb(fp)
  fp.close()
  return


# cdbmake by python-cdb
def cdbmake_true(f, a):
  import cdb
  c = cdb.cdbmake(f, f+".tmp")
  for (k,v) in a.iteritems():
    c.add(k,v)
  c.finish()
  return


# test suite
def test(n):
  import os
  from random import randint
  a = {}
  def randstr():
    return "".join([ chr(randint(32,126)) for i in xrange(randint(1,1000)) ])
  for i in xrange(n):
    a[randstr()] = randstr()
  #a = {"a":"1", "bcd":"234", "def":"567"}
  #a = {"a":"1"}
  cdbmake("my.cdb", a)
  cdbmake_true("true.cdb", a)
  # check the correctness
  os.system("cmp my.cdb true.cdb")

  fp = file("my.cdb")
  # check if all values are correctly obtained
  for (k,v) in a.iteritems():
    (v1,) = cdbget(fp, 0, k)
    assert v1 == v, "diff: "+repr(k)
  # check if nonexistent keys get error
  for i in xrange(n*2):
    k = randstr()
    try:
      v = a[k]
    except KeyError:
      try:
	cdbget(fp, 0, k)
	assert 0, "found: "+k
      except KeyError:
	pass
  fp.close()
  return

if __name__ == "__main__":
  test(1000)

This tests the python version implemented here against another python implementation. It only implements the original 32-bit version.

COPYRIGHT

The libraries, documentation, and the test driver program are licensed under the Unlicense. Do what thou wilt.