Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API to create or update records in bulk #2844

Open
Tracked by #3143 ...
seancolsen opened this issue Apr 25, 2023 · 7 comments · May be fixed by #3222
Open
Tracked by #3143 ...

API to create or update records in bulk #2844

seancolsen opened this issue Apr 25, 2023 · 7 comments · May be fixed by #3222
Assignees
Labels
ready Ready for implementation restricted: maintainers Only maintainers can resolve this issue type: enhancement New feature or request work: backend Related to Python, Django, and simple SQL
Milestone

Comments

@seancolsen
Copy link
Contributor

Limitations of the current API

  • Creating and modifying records requires one request per record

Use cases

  • Pasting data into cells requires modifying existing records and also (according to the spec) creating new records too (when the pasted data extends beyond the viewable records).
  • We may eventually want to implement other features like setting multiple cell values to null, or doing something akin to the "fill" feature that spreadsheets offer. This would require modifying records in bulk.

Similar work

  • Add bulk delete endpoint #2641 added a new endpoint to delete records in bulk

  • Add existing_import API #1442 added a a new endpoint to create records in bulk

    In Matrix @silentninja said:

    I don't think you can use the existing_import API for pasting the data into cells because the existing_import API creates a temporary Table from a datafile and then copies the data from the temporary Table to the Table the data has to be imported into, meanwhile the case of pasting data into cells is a much simpler task. So creating a new temporary Table and copying data from that temporary table might be an overkill for the copy-paste feature

A "Paste" example

  1. User copies these eight cells:

  2. User pastes the copied data into the sheet at the following starting position:

  3. The front end uses the API to:

    • Update records 249 and 250, modifying First Name and Last Name.
    • Create two new records by supplying values for First Name and Last Name.
  4. The API responds with the full records for all the records modified and created, allowing the front end to display the new data.

Thoughts on API design

  • I want to leave this ticket somewhat open-ended to start, in case the backend team would like to propose an API design.

  • It would be nice to perform these changes within a single transaction. From a UX perspective, if something goes wrong, no changes should be made. That would mean creating one API to handle both updating records and creating records.

  • How does the API behave when we tell it to update a record which no longer exists?

    Pretend we're following the "Paste" example above but in between steps 1. and 2. another user deletes record 249.

    • One possibility would be to create a new record with id 249.
    • Another possibility would be to fail. I think I prefer this behavior.
  • For the copy-paste use case at least, we can rely on our our data being rectangular. That is, we don't need to update one field in record 249 and a different field in record 250 — we need to update all the same fields in all records. I think it's fine to bake this assumption into the API, especially since I predict it may be necessary for a performant implementation. For example, we'd would want to avoid a naive implementation which makes one UPDATE query per row.

    Assuming we do bake this "rectangular" assumption into the API behavior, it may be worth considering the ways in which the API design convey that behavior. For example, we might be inclined to design the request body like this:

    [
      { "1": 249, "2": null,     "3": "Lloyd"     },
      { "1": 250, "2": "Shelby", "3": "Shepherd"  },
      {           "2": "Robert", "3": "Mccormick" },
      {           "2": "Dawn",   "3": null        }
    ]

    But with that design, an API consumer could submit a rather strange request to update two records like this:

    [
      { "1": 249,               "3": "Lloyd" },
      { "1": 250, "2": "Shelby"              }
    ]

    Because of the potential for requests like the the above, we'd need to clearly communicate the behavior of the API under these circumstances. For record 249, column 2, would the API leave the value untouched? Or would it set the value to NULL? Or maybe the request would fail?

    Just as food for brainstorming, a request schema like this might go further towards communicating this "rectangular" requirement to API consumers. I don't know.

    {
      "columns": [1, 2, 3],
      "data": [
        [ 249, null,     "Lloyd"      ],
        [ 250, "Shelby", "Shepherd"   ],
        [ null, "Robert", "Mccormick" ],
        [ null, "Dawn",   null        ]
      ]
    }
  • Unlike our proper records endpoint, the front end will not have any good way to display cell-level errors for this paste feature, or for the other features I'm imagining we'd build with this API. For that reason, the requirements around the error response are more loose. When the request fails due to some data problem (e.g. constraint violation, etc) the front end will display a toast message. The API should provide enough info in the error message to help the user fix their problem, but it certainly doesn't need to list all the error, or even be super specific about which cell caused the error.

@seancolsen seancolsen added status: draft type: enhancement New feature or request work: backend Related to Python, Django, and simple SQL restricted: maintainers Only maintainers can resolve this issue labels Apr 25, 2023
@seancolsen seancolsen added this to the Backlog milestone Apr 25, 2023
@seancolsen
Copy link
Contributor Author

seancolsen commented Apr 25, 2023

@silentninja I need this API for the "paste" feature that I'm implementing as part of the Usability Improvements project, for which you are a helper. Would you be able to help me out with this? We first need to agree on the API design. Would you be able to propose a design which specifies the endpoint, HTTP verb, request schema, and response schema? I'm not sure who else should be involved in approving the API design, but it might be worth soliciting feedback more broadly once we have a clearer proposal. After we have an approved design, I'll also need a backend engineer to take on the implementation. Might that be you as well?

@silentninja
Copy link
Contributor

silentninja commented Apr 27, 2023

"past" feature

Is this meant to be "paste feature"?

I can work on the API design and the implementation. I will start working on this once I finish up tasks related v0.1.2 milestone.

@seancolsen
Copy link
Contributor Author

Is this meant to be "paste feature"?

Yes. Thank you. Fixed.

I can work on the API design and the implementation.

Excellent. Thanks!

@silentninja
Copy link
Contributor

silentninja commented Jun 1, 2023

Because of the potential for requests like the the above, we'd need to clearly communicate the behavior of the API under these circumstances. For record 249, column 2, would the API leave the value untouched? Or would it set the value to NULL? Or maybe the request would fail?

The request will FAIL. Being strict with the request body is much easier and safer than assuming that the user knows about the API behavior.

How does the API behave when we tell it to update a record which no longer exists?

Pretend we're following the "Paste" example above but in between steps 1. and 2. another user deletes record 249.

One possibility would be to create a new record with id 249.
Another possibility would be to fail. I think I prefer this behavior.

The "Failing Behaviour" is hard to implement in SQL. The "Record Creation" behavior is quite easy to implement. The steps for the simple implementation of the UPSERT operation would look like

  • Insert the set of values along with the id field
  • If a row exists with the correct id value, update the other values for that row
  • If a row does not exist, set the id field value to the next SEQUENCE and create a new record.

This could be accomplished easily using the INSERT ... ON CONFLICT Postgres statement. I prefer to go with this implementation.

In order to accomplish the "Failing Behaviour" we would have to

  • Find the row with the id value and update the values if it exists
  • If the row does not exist and if the id value is given. Throw an error
  • If the row does not exist and if the id value is not given. set the id field value to the next SEQUENCE and Create a new record.

Postgres does not natively support this implementation. So we would have to come up with a very complicated logic that is prone to concurrency issues mentioned in the various threads in the POSTGRES DOCS. I would prefer to avoid going in that direction.

@seancolsen
Copy link
Contributor Author

Your plan looks good to me, @silentninja

@silentninja silentninja added ready Ready for implementation and removed status: draft labels Jun 26, 2023
@dmos62 dmos62 assigned dmos62 and unassigned silentninja Sep 19, 2023
@dmos62 dmos62 added status: started and removed ready Ready for implementation labels Sep 19, 2023
@seancolsen seancolsen linked a pull request Oct 25, 2023 that will close this issue
7 tasks
@seancolsen seancolsen modified the milestones: Backlog, High priority Nov 1, 2023
@seancolsen
Copy link
Contributor Author

seancolsen commented Nov 1, 2023

Moving to "High Priority" milestone since this work is 80% complete but Dom will no longer be working on it. I'd like to finish this up soon so that we can make use of this existing work before accumulating too many merge conflicts.

@seancolsen
Copy link
Contributor Author

@mathemancer I'm assigning this to you so that we can get this finished up soon. It can wait until after the 0.1.4 release.

@seancolsen seancolsen added ready Ready for implementation and removed status: started labels Dec 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready Ready for implementation restricted: maintainers Only maintainers can resolve this issue type: enhancement New feature or request work: backend Related to Python, Django, and simple SQL
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

4 participants