diff --git a/adaptors/collections.md b/adaptors/collections.md
new file mode 100644
index 000000000000..36c6caaf60bd
--- /dev/null
+++ b/adaptors/collections.md
@@ -0,0 +1,328 @@
+---
+title: Collections Adaptor
+---
+
+## Collections Overview
+
+The Collections API provides access to a secure key/value store on the OpenFn
+Platform. It is designed for high performance over a large volume of data.
+
+Collections are secure, private datastores which are visible only to Workflows
+within a particular OpenFn Project. They can be created, managed and destroyed
+from the OpenFn Admin page.
+
+When running in the CLI, a Personal Access Token can be used to get access to
+the collection (generated from the app at /profile/tokens).
+
+See the [Collections](documentation/build/collections) Platform Docs to learn
+more about Collections.
+
+:::caution
+
+Collections must be created in the platform Admin page before they can be used.
+
+Refer to the [Collections Docs](documentation/build/collections) for details.
+
+:::
+
+## The Collections Adaptor
+
+The Collections API is inserted into all Steps through a special kind of
+adaptor.
+
+Uniquely, the Collections adaptor is designed to be run _alongside_ other
+adaptors, not by itself. It is injected into the runtime environment for you
+by OpenFn. This makes the Collections API available to every Step in a
+Workflow, regardless of which adaptor it is using.
+
+If using the CLI to run a workflow with Collections, refer to the
+[CLI Usage](#cli-usage) guide below.
+
+## Usage Guide
+
+All values in a Collection are stored under a string key. Values are stored as
+Strings, but the Collections API will automatically serialize and de-serialize
+JSON objects to strings for you (so, in effect, you can treat keys as strings
+and values as objects).
+
+Collections can be manipulated using a single key or a pattern, where a pattern is
+a string with a wildcard. So the key-pattern `mr-benn` will only match a single
+value under the key `mr-benn`, but the pattern `2024*` will match all keys which
+start with `2024`, followed by any other characters. The pattern
+`2024*mr-benn*` will match keys starting with `2024`, then any characters, then
+the string `mr-benn`, then any other sequence of characters (in other words,
+fetch all keys which relate to Mr Benn in 2024).
+
+The Collections API gives you four functions to read, write and remove data from
+a collection.
+
+- Use [`collections.get()`](adaptors/packages/collections-docs#collections_get)
+  to fetch a single value, or batch-download a range of values.
+- Use
+  [`collections.each()`](adaptors/packages/collections-docs#collections_each) to
+  efficiently iterate over a range of items in a collection. Recommended for
+  large data sets.
+- Use [`collections.set()`](adaptors/packages/collections-docs#collections_set)
+  to upload one or more values to a collection. `set()` is always an "upsert":
+  if a key already exists, its value will be replaced by the new value.
+- Use
+  [`collections.remove()`](adaptors/packages/collections-docs#collections_remove)
+  to remove one or more values.
+
+Detailed usage examples are provided below.
+
+### Set some data in a Collection
+
+The Collections API allows you to set a JSON object (or any primitive JS value)
+under a given key:
+
+```js
+collections.set('my-collection', 'commcare-fhir-value-mappings', {
+  current_smoker: {
+    system: 'http://snomed.info/sct',
+    code: '77176002',
+    display: 'Smoker',
+  },
+  /* ... */
+});
+```
+
+You can also pass an array of items for a batch-set. When setting multiple
+values, you need to set a key generator function to calculate a key for each
+item, like this:
+
+```js
+collections.set('my-favourite-footballer', value => value.id, [
+  {
+    id: 'player01',
+    name: 'Lamine Yamal',
+    /* other player details */
+  },
+  {
+    id: 'player02',
+    name: 'Aitana Bonmati',
+    /* other player details */
+  },
+  /* More players {}, {} */
+]);
+```
+
+The key generator is a function which receives each of the values in the
+supplied values array in turn (so, in the example above, it gets called with
+the `player01` object, then the `player02` object, and so on). For each value,
+it should return a string key, under which it will be saved in the collection.
+
+You can use JavaScript template literals to easily generate key values which
+include a mixture of static and dynamic values:
+
+```js
+collections.set(
+  'my-favourite-footballer',
+  value => `${value.createdDate}-${value.region}-${value.name}`,
+  $.data
+);
+```
+
+In this example, the `createdDate`, `region` and `name` properties will be read
+from each value and assembled into a key-string, separated by dashes. This
+technique creates keys that are easily sorted by date.
+
+### Getting data from a Collection
+
+To retrieve multiple items from a Collection, we generally recommend using the
+`each()` function.
+
+`each()` will stream each value individually, greatly reducing the memory
+overhead of downloading a large amount of data to the client.
+
+```js
+collections.each('my-collection', '2024*', (state, value, key) => {
+  console.log(value);
+  // No need to return state here
+});
+```
+
+The second argument to `each` is a query string or object. Pass a key with a
+pattern, or an object including different query strings. Check the API reference
+for a full listing.
+
+```js
+collections.each(
+  'my-collection',
+  { key: '2024*', createdAfter: '20240601' },
+  (state, value, key) => {
+    console.log(value);
+  }
+);
+```
+
+You can limit the amount of data you want to download with the `limit` key. If
+more values remain on the server, a `cursor` key will be written to
+`state.data`.
+
+```js
+collections
+  .each('my-collection', { key: '2024*', limit: 1000 }, (state, value, key) => {
+    console.log(value);
+  })
+  .then(state => {
+    state.nextCursor = state.data.cursor;
+    // state.data.cursor now contains the cursor position
+    return state;
+  });
+```
+
+You can fetch items individually with `get()`; the value will be written to
+`state.data`:
+
+```js
+collections.get('my-collection', 'commcare-fhir-value-mappings').then(state => {
+  state.mappings = state.data;
+  return state;
+});
+each($.inputs, state => {
+  const mappedString = state.mappings[state.data.diagnosis];
+  state.resources ??= {};
+  state.resources[state.data.id] = mappedString;
+  return state;
+});
+```
+
+You can also fetch multiple items with `get()`, which supports the same query
+options as `each()`.
+
+Bear in mind that all the items will be loaded into memory at once. For large
+datasets and structures, this may cause problems.
+
+When bulk-loading with `get()`, `state.data` will be an array of items, and
+`state.data.cursor` will contain the cursor position from the server.
+
+```js
+collections.get('my-collection', '2024*').then(state => {
+  state.allRecords = state.data;
+  return state;
+});
+```
+
+### Remove data from a Collection
+
+You can remove an individual value by key:
+
+```js
+collections.remove('my-collection', 'commcare-fhir-value-mappings');
+```
+
+You can also use patterns to delete multiple values at a time:
+
+```js
+collections.remove('my-collection', '2024*');
+```
+
+## Filters, Limits & Cursors
+
+As well as filtering keys with patterns, you can filter by created date:
+
+```js
+collections.each(
+  'my-collection',
+  { key: '2024*', createdAfter: '20240601' },
+  (state, value, key) => {
+    console.log(value);
+  }
+);
+```
+
+You can use `createdBefore` and `createdAfter` dates, which must be ISO 8601
+formatted strings. The `createdBefore` timestamp will match all dates less than
+or equal to (<=) the _start_ of the provided date. Conversely, `createdAfter`
+will match dates greater than or equal to the _end_ of the provided date.
+
+By default, all matching values will be returned to you, but you can limit how
+many items are returned in a single call with `limit`.
+
+If a limit is set and there are more values waiting on the server, a `cursor`
+will be written to `state.data`. You can pass this cursor back to the server in
+the next query to resume from that position.
+
+```js
+// request 10k values from the cursor position
+collections.get('my-collection', { key: '*', limit: 10e3, cursor: $.cursor });
+fn(state => {
+  // Write the cursor (if any) back to state for next time
+  state.cursor = state.data.cursor;
+  return state;
+});
+```
+
+## CLI usage
+
+:::info
+
+Improved Collections support is coming to the CLI soon.
+
+:::
+
+Collections are designed for close integration with the platform app, but can be
+used from the CLI too.
+
+You will need to:
+
+- Set the job to use two adaptors
+- Pass a Personal Access Token
+- Set the Collections endpoint
+
+You can get a Personal Access Token from any v2 deployment.
+
+Remember that a Collection must be created from the Admin page before it can be
+used!
+
+### For a single job
+
+You can pass multiple adaptors from the CLI:
+
+```bash
+openfn job.js -a collections -a http -s state.json
+```
+
+You'll need to set the configuration in `state.json`:
+
+```json
+{
+  "configuration": {
+    "collections_endpoint": "http://localhost:4000/collections",
+    "collections_token": "...paste the token from the app..."
+  }
+}
+```
+
+### For a workflow
+
+If you're using `workflow.json`, set the token and endpoint on
+`workflow.credentials`:
+
+```json
+{
+  "workflow": {
+    "steps": [ ... ],
+    "credentials": {
+      "collections_endpoint": "http://localhost:4000/collections",
+      "collections_token": "...paste the token from the app..."
+    }
+  }
+}
+```
+
+And make sure that any steps which use collections have multiple adaptors set:
+
+```json
+{
+  "workflow": {
+    "steps": [
+      {
+        "expression": "...",
+        "adaptors": ["@openfn/language-http", "@openfn/language-collections"]
+      }
+    ]
+  }
+}
+```
diff --git a/docs/build/collections.md b/docs/build/collections.md
new file mode 100644
index 000000000000..55c23331deaa
--- /dev/null
+++ b/docs/build/collections.md
@@ -0,0 +1,150 @@
+---
+title: Collections
+sidebar_label: Collections
+---
+
+Collections provides a high-volume, high-performance storage solution built into
+OpenFn.
+
+Collections is suitable for buffering, caching and aggregating data from
+Webhooks, storing large mapping files, and sharing state between workflows.
+
+Collections can be used to store a very large number of items (in the order of
+millions).
+
+:::caution Collections Stability
+
+Collections is a new feature to OpenFn, in beta release since November 2024.
+
+We'd love to hear your feedback on
+[community.openfn.org](https://community.openfn.org/) or via email at
+[support@openfn.org](mailto:support@openfn.org).
+
+:::
+
+## Use Cases
+
+### Buffering Data
+
+Many OpenFn integrations are triggered through a webhook, called from another
+system based on some event. For example, every time a patient is registered, a
+webhook calls into OpenFn to trigger a workflow and propagate the registration
+event to other systems.
+
+Collections can be used as a buffer for these incoming events, saving the event
+data on OpenFn and then processing a batch of events at the end of the day. This
+is particularly useful for high-volume events, or when limits are imposed on
+upstream systems.
+
+With Collections, you can save each incoming event onto OpenFn, then run a
+Workflow on a Cron trigger to process a batch of events in one go, and send on
+aggregated, filtered or transformed results to the next system.
+
+### Mapping Structures
+
+A typical use-case in data integrations is to store large mapping objects. These
+objects themselves are key-value pairs which map strings from one system into
+matching strings from another system. For example, mapping medical codes into
+SNOMED, or mapping city codes into human-readable strings, or mapping some
+input string to a DHIS2 attribute code.
+
+These objects are often very large and hard to maintain, and can bloat job code.
+
+Instead, the mappings can be saved to a GitHub repository as a JSON object, and
+uploaded to a collection using the CLI.
+
+## Collections Basics
+
+Data is stored in Collections as key-value pairs, where the key is a unique
+identifier for some data (like a UUID, or timestamp). The value is always a
+string - although JSON objects will be automatically serialized to a string
+using the Collections API.
+
+Keys can be fetched in bulk and filtered by _pattern_. For example, the pattern
+`2024*` will match all keys which start with `2024`. Designing keys to have an
+efficient sort order is critical for high-volume Collections applications.
+
+The example below fetches values from the `openfn-patient-registrations`
+collection and saves them onto state for further processing:
+
+```js
+collections.get('openfn-patient-registrations', '2024*').then(state => {
+  state.registrationsThisYear = state.data;
+  return state;
+});
+```
+
+Every key permanently saves its creation date, so as well as fetching by
+key-pattern, you can also filter keys by date. This example fetches all keys
+created before 30th September 2024:
+
+```js
+collections
+  .get('openfn-patient-registrations', '*', { createdBefore: '2024-09-30' })
+  .then(state => {
+    state.registrationsThisYear = state.data;
+    return state;
+  });
+```
+
+`collections.get` will download all matching values into memory. For large
+values or high-volume value sets, it is more efficient to use
+`collections.each`, which will stream each value into memory individually and
+then discard it.
+
+```js
+collections.each(
+  'my-collection',
+  { key: '2024*', createdAfter: '20240601' },
+  (state, value, key) => {
+    console.log(value);
+  }
+);
+```
+
+New values are uploaded to a collection through `collections.set`:
+
+```js
+collections.set('openfn-demo', 'commcare-fhir-value-mappings', {
+  current_smoker: {
+    system: 'http://snomed.info/sct',
+    code: '77176002',
+    display: 'Smoker',
+  },
+  /* ... */
+});
+```
+
+## Managing Collections
+
+Collections can be created, destroyed or renamed from the Admin menu.
+
+![Collections Admin Page](/img/collections_admin.png)
+
+Before it can be used, a collection must be created. Collection names must be
+unique to the deployment, so we recommend using your organisation (and maybe
+project) as a prefix, e.g. `openfn-demo`.
+
+## Using Collections
+
+Collections are available to all Workflows via a simple high-level interface.
+
+:::caution
+
+A Collection must be created in the admin interface before it can be used.
+
+:::
+
+The Collections API provides four basic verbs:
+
+- [`collections.get()`](adaptors/packages/collections-docs#collections_get)
+  downloads values matching a key or key pattern.
+- [`collections.each()`](adaptors/packages/collections-docs#collections_each)
+  efficiently iterates over a range of items in a collection.
+- [`collections.set()`](adaptors/packages/collections-docs#collections_set)
+  uploads values to a collection.
+- [`collections.remove()`](adaptors/packages/collections-docs#collections_remove)
+  removes values by key or key pattern.
+
+The Collections API is backed by a special adaptor: see the
+[Collections Adaptor API](adaptors/collections) for more details.
diff --git a/sidebars-main.js b/sidebars-main.js
index c9f9530a81a5..73552380fb76 100644
--- a/sidebars-main.js
+++ b/sidebars-main.js
@@ -64,6 +64,7 @@ module.exports = {
         'build/ai-assistant',
         'build/paths',
         'build/credentials',
+        'build/collections',
         'build/limits',
         'build/editing-locally',
         'build/working-with-branches',
diff --git a/static/img/collections_admin.png b/static/img/collections_admin.png
new file mode 100644
index 000000000000..d86a0dd63527
Binary files /dev/null and b/static/img/collections_admin.png differ