CLI: Support collections #819

Closed
josephjclark opened this issue Nov 10, 2024 · 3 comments · Fixed by #839
Comments

@josephjclark
Collaborator

It would surely be useful for the CLI to support the new Collections feature directly.

A particularly strong use-case would be to upload mapping data into a collection. I have 1000 mappings in my GH repo and want to upload them to Lightning for use in my workflow. I don't really want to have to create a workflow step and copy and paste the values in for this. I want to call the collections API directly.

openfn collections get/count/keys/set/delete <name> <pattern> --input  --output --token --limit --url
  • get would get values from a collection and return a JSON object of key/value pairs. Uses --output to write the result to disk (if unset, returns to stdout). Pass --json to parse values as JSON.
  • count would return the number of values which match a particular pattern. Returns to stdout only.
  • keys would return a list of matching keys without downloading any data (we don't even support this!)
  • set would accept data through a file, CSV or JSON. If JSON, it must expect an object of key/value pairs, and it should probably stringify values
  • delete would delete keys according to the pattern and return the count of deleted keys.
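The pattern handling shared by get, count, keys and delete could look something like the sketch below. This is purely illustrative (function names like matchKeys are not part of any real CLI), assuming `*` is the only wildcard supported in key patterns.

```javascript
// Escape regex metacharacters in the literal parts of a pattern
const escapeLiteral = (s) => s.replace(/[.+?^${}()|[\]\\]/g, '\\$&');

// Compile a glob-style pattern ('item-*') into an anchored RegExp
const patternToRegExp = (pattern) =>
  new RegExp('^' + pattern.split('*').map(escapeLiteral).join('.*') + '$');

// Filter a list of collection keys down to those matching the pattern
const matchKeys = (keys, pattern) => {
  const re = patternToRegExp(pattern);
  return keys.filter((k) => re.test(k));
};
```

With this in place, count is just `matchKeys(keys, pattern).length` and delete operates on the same filtered list.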

Auth

We need to think carefully about auth.

The request needs to include a personal access token.

But we probably don't want to take this as a flat env var because access tokens are scoped per project, so users will have several. Which makes it a bit fiddly to submit to the CLI.

Options:

  • accept an env var name in the command, ie, --token MY_OPENFN_TOKEN (but surely --token $MY_OPENFN_TOKEN also works?)
  • If a token isn't provided, prompt the user to paste it in (annoying)
  • Allow tokens to be saved by the CLI (which must encrypt and write them to the repo somewhere). Now we only need to take a token once and the CLI can remember for that collection name
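The first option above could be sketched as follows. This is a hypothetical helper (resolveToken is not a real CLI function): treat the --token argument as an env var name first, and fall back to using the value verbatim, so both `--token MY_OPENFN_TOKEN` and `--token $MY_OPENFN_TOKEN` work.

```javascript
// Resolve a --token argument: if it names a set environment variable,
// use that variable's value; otherwise assume it IS the token
const resolveToken = (tokenArg, env = process.env) => {
  if (!tokenArg) return undefined;
  if (env[tokenArg]) return env[tokenArg];
  return tokenArg;
};
```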

Implementation details

Should this use the collections adaptor under the hood? Probably not, because it behaves very differently. But we might copy some code across to handle streaming efficiently.

@github-project-automation github-project-automation bot moved this to New Issues in v2 Nov 10, 2024
@josephjclark
Copy link
Collaborator Author

josephjclark commented Dec 5, 2024

A data structure discussion

tl;dr:

  • Should the get and set structures be identical in the CLI? Probably yes, to support mappings
  • Should the structure be { [key]: value } or [{ key, value }] ?
  • Is collections.set confusing if you can set a single value or upload an object?

In the CLI I want to use a simplified data structure. So collections get my-collection * will return an object of key value pairs, like this:

{
  items: {
    'item-1': { id: 'a' },
    'item-2': { id: 'b' },
  },
  count: 2,
}

I find this quite a nice and intuitive data structure. Very Javascripty. I'll maybe add a meta key for created/updated metadata.

So when setting, it makes sense that I should do collections set my-collection path/to/values.json, where values.json looks like this:

{
  'item-1': { id: 'a' },
  'item-2': { id: 'b' },
}

Again it's very intuitive. But note there's no items and no count.

So, first problem: I can't get a bunch of items, save them to disk, manually update the values in-situ, and then set them back, because the shape of the data is wrong. Well, that's unintuitive, but I don't think I ever want to do this anyway?

Not for multiple items no. But for mappings, yes. I absolutely want to fetch, edit and upsert a mappings file. Probably usually the mappings file is checked in to source control but still, this seems like a thing I would want to do.

I could strip the wrapper from the get call, which aligns the data structures - but if I do this, I have no way to handle metadata. Does this matter? I don't know. I feel like closing off the option of metadata is a mistake.

We could change the format and download results to an array of { key, value, ...meta } objects. That aligns get and set and lets us save metadata later, and mirrors the server side data structure. But:

  • It's a very verbose serialisation, ugly and expensive in memory
  • If I download a single item, a mapping, does it still come down as a [{ key, value }] array? That's not nice at all for mappings

One benefit of that structure is that we can detect it, so that if you try and set, say, an array of keys onto a single key, we can raise a warning.
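The trade-off between the two shapes, and the detection benefit, can be sketched like this (illustrative helpers only, not proposed API):

```javascript
// Compact shape -> verbose shape: { [key]: value } to [{ key, value }]
const toArray = (obj) =>
  Object.entries(obj).map(([key, value]) => ({ key, value }));

// Verbose shape -> compact shape, dropping any metadata fields
const toObject = (items) =>
  Object.fromEntries(items.map(({ key, value }) => [key, value]));

// The verbose shape is detectable, so a set of an item array onto a
// single key can raise a warning
const looksLikeItemArray = (data) =>
  Array.isArray(data) &&
  data.every((d) => d && typeof d === 'object' && 'key' in d && 'value' in d);
```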

That leads into the second problem. I'm really worried that a CLI like this is ambiguous:

openfn collections set my-collection some-key path/to/values.json

If values.json contains a single value, no problem. We upload that value verbatim to the collection.

If values.json contains many key/value pairs, we'll be uploading a nested structure into the key. Ok, maybe that's stupid, but isn't it super easy to accidentally have a key name in the command, or mistype something? Miss a dash out and an argument like --limit will be interpreted as a key.

So I think this gets flaky and unintuitive.

Solution?

I really want to optimise this for uploading and downloading single items.

That means get and set should upload and download to file verbatim. No metadata (we could log it to stdout, I suppose).

If getting with a pattern, we can write a simplified structure with no wrapper. I think this is fairly intuitive.

That leaves bulk upsert - which is currently impossible in this API. We could add a command like:

openfn collections set my-collection --items path/to/values.json

Where values.json is a verbatim object of key/value pairs (the same object that would be downloaded)

I like this approach because --items is pretty declarative, explicitly plural, and non-default.
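The --items behaviour could be as simple as the sketch below, where upsert is a hypothetical stand-in for the real collections API call:

```javascript
// Expand a verbatim object of key/value pairs (the parsed contents of
// values.json) into one upsert per key
const setItems = (collection, itemsObject, upsert) =>
  Object.entries(itemsObject).forEach(([key, value]) =>
    upsert(collection, key, value)
  );
```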

@josephjclark
Collaborator Author

Should we use a standard env var for the PAT after all?

Although a user might have access to multiple projects, they'll likely only have one PAT right? And usually only use one lightning instance?

So if we defaulted to OPENFN_PAT and loaded that by default, that would actually be quite reasonable. We can log the last 8 characters or whatever (like the Lightning UI does) to give a hint as to which token is being used, just for the avoidance of doubt. Maybe at debug level.
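The hint could be a one-liner (tokenHint is illustrative; the "last 8 characters" convention is from the comment above, not a confirmed spec):

```javascript
// Mask a token for logging, showing only its last 8 characters
const tokenHint = (token, visible = 8) => {
  if (!token) return '(no token)';
  const tail = token.slice(-visible);
  return token.length > visible ? `...${tail}` : tail;
};
```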

@josephjclark
Collaborator Author

What about remove --dry-run, which would basically just GET a list of keys which match the pattern and report what would be removed.

So you can use remove --dry-run to test a key pattern
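A sketch of that behaviour, with fetchKeys and deleteKey as hypothetical stand-ins for the real API calls:

```javascript
// Dry-run-aware removal: always fetch the matching keys, but only
// delete them when dryRun is false; report either way
const removeKeys = (collection, pattern, { dryRun = false, fetchKeys, deleteKey }) => {
  const keys = fetchKeys(collection, pattern);
  if (!dryRun) keys.forEach((key) => deleteKey(collection, key));
  return { keys, deleted: dryRun ? 0 : keys.length };
};
```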

@github-project-automation github-project-automation bot moved this from New Issues to Done in v2 Dec 11, 2024