Collections Docs #599

Merged
merged 5 commits into from
Nov 12, 2024
200 changes: 200 additions & 0 deletions adaptors/collections.md
@@ -0,0 +1,200 @@
---
title: Collections Adaptor
Member

Wondering if we can somehow tag collections and common as special OpenFn adaptors?

Contributor Author

Maybe. Collections is definitely special. Common is only sort of special 🤔

So I guess to answer this I'd have to ask what we mean by "special" and what exactly we want to flag

---

## Collections Overview

The Collections API is a key/value storage solution. It is designed for high
Member

We need to make clear that this is a data store in OpenFn... not something external (like all other adaptors).

performance over a large volume of data.

Use-cases include:

- Storing mapping objects for use in workflows
- Buffering and aggregating high volumes of incoming data
- Caching and sharing state between workflows

A Collection is bound to a project. Collections can only be accessed with a
token associated with that project. When running on the app, a workflow is
automatically granted access to all collections on the same project. When
running in the CLI, a Personal Access Token can be used (generated from the app
at /profile/tokens).

## The Collections Adaptor

The Collections adaptor is a special adaptor. Uniquely, the Collections adaptor
is designed to be run _alongside_ other adaptors, and is injected for you by the
platform.

This makes the Collections API available to every step in a workflow, regardless
of which adaptor it is using.

## Usage Guide

### Set some data in a collection

The Collections API allows you to set a JSON object (or any primitive JS value)
under a given key.

You can also pass an array of items for a batch-set.
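
As a rough sketch of both forms: the call shape below (`set(name, key, value)`, with a key function for batch-sets) is an assumption made by analogy with the `get()` and `remove()` calls elsewhere in this document, so check it against the API reference. The in-memory `collections` object is a hypothetical stand-in that only exists so the snippet runs outside the platform; in a real job, `collections` is injected for you.

```js
// ASSUMPTION: hypothetical in-memory stand-in for the injected `collections`
// object, so this sketch can run outside OpenFn.
const store = new Map();
const collections = {
  // Assumed signature: set(collectionName, keyOrKeyFunction, valueOrValues)
  set(name, key, value) {
    const items = store.get(name) ?? new Map();
    if (typeof key === 'function') {
      // Batch-set: derive a key for each item in the array
      for (const item of value) items.set(key(item), item);
    } else {
      // Single set: store one value under one key
      items.set(key, value);
    }
    store.set(name, items);
  },
};

// Set a single JSON object under a key
collections.set('my-collection', 'commcare-fhir-value-mappings', {
  current_smoker: 'Smoker',
});

// Batch-set an array of items, deriving each key from the item itself
collections.set('my-collection', item => item.id, [
  { id: '2024-01', total: 3 },
  { id: '2024-02', total: 5 },
]);
```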

### Get data from a collection

To retrieve multiple items from a Collection, we recommend using the `each()`
function.

`each()` will stream each value individually, greatly reducing the memory
overhead of downloading a large amount of data to the client.

```js
each('my-collection', '2024*', (state, value, key) => {
  console.log(value);
  // No need to return state here
});
```

The second argument to `each` is a query string or object. Pass a key pattern as
a string, or an object that combines several query options. Check the API
reference for a full listing.

```js
each(
  'my-collection',
  { key: '2024*', created_after: '20240601' },
  (state, value, key) => {
    console.log(value);
  }
);
```
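
To build intuition for key patterns such as `'2024*'`, here is a small local sketch. It assumes that `*` matches any run of characters, which is an assumption about how the server interprets patterns rather than something this document specifies. `keyMatches` is a hypothetical helper and is not part of the adaptor.

```js
// Hypothetical helper, not part of the adaptor.
// ASSUMPTION: `*` in a key pattern matches any run of characters.
function keyMatches(pattern, key) {
  // Escape regex metacharacters, leaving `*` alone
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  // Turn each `*` into "match anything" and anchor both ends
  const regex = new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
  return regex.test(key);
}

const keys = ['20240601-a', '20240715-b', '20231231-c'];
const matched = keys.filter(key => keyMatches('2024*', key));
console.log(matched);
```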

You can limit the amount of data you want to download with the `limit` key. If
more matching values remain on the server, a `cursor` key will be written to
`state.data`.

```js
each('my-collection', { key: '2024*', limit: 1000 }, (state, value, key) => {
  console.log(value);
}).then(state => {
  state.nextCursor = state.data.cursor;
  // state.data.cursor now contains the cursor position
  return state;
});
```

You can fetch an individual item with `get()`. The result is written to
`state.data`.

```js
// Fetch the mapping object once and keep it on state
collections.get('my-collection', 'commcare-fhir-value-mappings').then(state => {
  state.mappings = state.data;
  return state;
});
// Look up each input's diagnosis in the mappings
each($.inputs, state => {
  const mappedString = state.mappings[state.data.diagnosis];
  state.resources ??= {};
  state.resources[state.data.id] = mappedString;
  return state;
});
```

You can also fetch multiple items with `get()`, which supports the same query
options as `each()`.

Bear in mind that all the items will be loaded into memory at once. For large
Member

Hmm... will ping you next week to help me understand tradeoffs of using get() and each() for different use cases so I can better understand when we might start bumping into limits, and to know which is best for different scenarios... we can then beef up these docs with some examples.

Contributor Author

Don't ping me - let me fix it here!

datasets and structures, this may cause problems.

When bulk-loading with `get()`, `state.data` will be an array of items, and
`state.data.cursor` will contain the cursor position from the server.

```js
collections.get('my-collection', '2024*').then(state => {
  state.allRecords = state.data;
  return state;
});
```
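
To make the memory trade-off between `each()` and bulk `get()` more concrete, here is a plain JavaScript analogy. It does not call the Collections API: `streamItems` is a hypothetical generator standing in for a collection, and the two functions only illustrate the difference between aggregating items as they arrive and materialising the whole result set first.

```js
// Hypothetical stand-in for a collection: yields [key, value] pairs one at a
// time, similar to how each() hands values to its callback.
function* streamItems(count) {
  for (let i = 0; i < count; i++) {
    yield [`2024-${i}`, { amount: i }];
  }
}

// each()-style: aggregate as items arrive; only the running total is kept
function sumStreaming(items) {
  let total = 0;
  for (const [, value] of items) total += value.amount;
  return total;
}

// get()-style: hold every item in memory first, then aggregate
function sumBulk(items) {
  const all = Array.from(items); // the whole result set lives in memory here
  return all.reduce((total, [, value]) => total + value.amount, 0);
}

const streamed = sumStreaming(streamItems(1000));
const bulk = sumBulk(streamItems(1000));
console.log(streamed === bulk);
```

Both approaches produce the same answer. The streaming form suits aggregation over large key ranges, while the bulk form is convenient when the result set is small and you need random access to it, such as a single mapping object.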

### Remove data from a collection

You can remove an individual item by key:

```js
collections.remove('my-collection', 'commcare-fhir-value-mappings');
```

You can also use the same query options as `get()` and `each()` to bulk delete:

```js
collections.remove('my-collection', { createdBefore: '20240601' });
```

## Collection Administration

Collections must be created in the platform Admin page before they can be used.

Collections can be removed from the Admin page.

## CLI usage

Collections are designed for close integration with the platform app, but can be
used from the CLI too.

You will need to:

- Set the job to use two adaptors
- Pass a Personal Access Token
- Set the Collections endpoint

You can get a Personal Access Token from any v2 deployment.

Remember that a Collection must be created from the Admin page before it can be
used!

### For a single job

You can pass multiple adaptors from the CLI:

```bash
openfn job.js -a collections -a http -s state.json
```

You'll need to set the configuration in `state.json`:

```json
{
  "configuration": {
    "collections_endpoint": "http://localhost:4000/collections",
    "collections_token": "...paste the token from the app..."
  }
}
```
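
If you would rather not paste the token into a file by hand, a small Node script can build the same structure. This is only a sketch: `buildState` is a hypothetical helper, `COLLECTIONS_TOKEN` is a hypothetical environment variable name, and the fallback token exists only so the snippet runs as written.

```js
// Hypothetical helper: builds the state object shown above.
function buildState(token, endpoint = 'http://localhost:4000/collections') {
  if (!token) throw new Error('A Personal Access Token is required');
  return {
    configuration: {
      collections_endpoint: endpoint,
      collections_token: token,
    },
  };
}

// ASSUMPTION: COLLECTIONS_TOKEN is an illustrative variable name, not one the
// CLI reads itself. The fallback value is a placeholder, not a real token.
const state = buildState(process.env.COLLECTIONS_TOKEN ?? 'example-token');
const stateJson = JSON.stringify(state, null, 2);
console.log(stateJson);
```

Redirect the output to `state.json` (or write it with `fs.writeFileSync`) before running the job.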

### For a workflow

If you're using `workflow.json`, set the token and endpoint on
`workflow.credentials`:

```json
{
  "workflow": {
    "steps": [ ... ],
    "credentials": {
      "collections_endpoint": "http://localhost:4000/collections",
      "collections_token": "...paste the token from the app..."
    }
  }
}
```

And make sure that any steps which use collections have multiple adaptors set:

```json
{
  "workflow": {
    "steps": [
      {
        "expression": "...",
        "adaptors": ["@openfn/language-http", "@openfn/language-collections"]
      }
    ]
  }
}
```