Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Determine protocol for developing/testing database schema changes #340

Open
maxachis opened this issue Jun 26, 2024 · 7 comments
Open

Determine protocol for developing/testing database schema changes #340

maxachis opened this issue Jun 26, 2024 · 7 comments
Assignees
Labels
database documentation Improvements or additions to documentation tests Adds or updates tests

Comments

@maxachis
Copy link
Contributor

maxachis commented Jun 26, 2024

So now that we've set up a development database, we face a few new challenges related to how we set up and test database schema changes.

Currently, the development database is refreshed from production once a day, using the code within the prod-to-dev migration repository. Within that repository is a script called dev_scripts.sql which includes a set of schema changes to be applied to the development database after the prod-to-dev refresh. So the development database is refreshed from prod, and then this script is called making the additional modifications to the schema. Any recent schema changes to the dev DB which are not in this script are effectively wiped during the next refresh.

What should we do if we have multiple people developing and testing out database changes, and making code which runs off of those database changes? Part of the complication is that the automated test scripts we have in the Github repository run against the development database. If developer A makes changes to the dev database schema, and developer B makes a pull request, B's PR will be tested against the modified dev database, and some tests may fail because of schema changes B doesn't know about.

There's a few options, not all of them mutually exclusive:

  1. YOLO option: Give all the developers ability to modify the developer database at will and have them sort it out themselves, even if it causes tests to fail. Chaos reigns 🦊.
  2. All changes to the development database schema must be done through dev_scripts.sql, with a PR made for each.
  3. Rename the current development database to stage, set up a new development/sandbox database that refreshes from development, and give developers privileges to modify that to their hearts content. Tests are still run against a "clean" database environment, but developers can play around with schema changes not yet ready for prime time. Some YOLO persists simply because schema changes of different developers can conflict, but the tests are unaffected.
  4. Have developers set up their own local PostgreSQL servers, and give them a script to copy the dev database (or just the schema) to their local PostgreSQL server at will. Developers can play around in the environment to their heart's content, but they will need to manage their own infrastructure.
  5. Some secret additional option I don't know about yet ✨.

So we need to decide on what is best, and also develop Gitbook documentation detailing how that works.

@maxachis maxachis added documentation Improvements or additions to documentation tests Adds or updates tests database labels Jun 26, 2024
@maxachis
Copy link
Contributor Author

An additional wrinkle is how to handle testing a schema change itself. The database refresh scripts and the web app containing the tests are separate repositories (which I argued for previously), and the dev database, at the moment, only refreshes once a day.

One option is to tie any change to the main repository of the database refresh directory to a rerun of the prod-to-dev logic, and then immediately run the tests after the merge, rolling back the PR if any tests fail. That could even be automated.

@EvilDrPurple
Copy link

Personally I like the ideas of 2 and 4. Seems like a good compromise over 🦊. Devs will be able to test locally to make sure their changes don't implode the whole system

@maxachis
Copy link
Contributor Author

@EvilDrPurple The good news is that in that case, the existing dump_prod_to_dev.sh can be reused with at most minor modifications (if that) to dump dev to a local database.

The tricky part will be setting up the local database so that the shell script will work with it. That'll probably take some trials.

Additionally, the presence of the data in the database is a concern. There is user info, contact information and unencrypted API keys, among other things, which we may not want to have available to every developer working on this level. The workaround to that would be ensuring that only the schema, and not the data, is provided, although that will make things annoying to people testing their changes, as they'll need to provide fake data.

@EvilDrPurple
Copy link

Additionally, the presence of the data in the database is a concern. There is user info, contact information and unencrypted API keys, among other things, which we may not want to have available to every developer working on this level. The workaround to that would be ensuring that only the schema, and not the data, is provided, although that will make things annoying to people testing their changes, as they'll need to provide fake data.

@maxachis I see, is there any way to provide data just from non-sensitive tables or even provide fake filler data instead of the actual thing?

@maxachis
Copy link
Contributor Author

Additionally, the presence of the data in the database is a concern. There is user info, contact information and unencrypted API keys, among other things, which we may not want to have available to every developer working on this level. The workaround to that would be ensuring that only the schema, and not the data, is provided, although that will make things annoying to people testing their changes, as they'll need to provide fake data.

@maxachis I see, is there any way to provide data just from non-sensitive tables or even provide fake filler data instead of the actual thing?

@EvilDrPurple Oh absolutely! It'll just take some time to figure out how to do that and set up the implementation. Ideally, my goal is to have you and others be able to work on developing database-impacting changes right away, but I want to make sure the protocol is sustainable and @josh-chamberlain - approved.

@josh-chamberlain
Copy link
Contributor

josh-chamberlain commented Jun 26, 2024

@maxachis @EvilDrPurple

  1. 🦊 is pretty fun and would work great IRL, but sadly we cannot all just shout at each other
  2. schema source of truth updates happen via PR. sounds great to me!
  3. I like this because it is a great gauge of how well something will deploy to production, and it makes easier for others to test: "Go look at staging!".
    • We should absolutely rename dev to sandbox for clarity, if we still want a sandbox type place to test things, which...do we? We would need a place to shout "who broke sandbox?" which could replicate some of the fun of # 1
  4. is not a bad option to give people, and it seems like the pathway to setting that up is just providing a script. I think we should let them copy production, though, if it's just local anyways.
    • local database: not too many rows of dummy data for most any table should be sufficient. 10? 100? People can add their own data if they need to test specific cases, and maybe it would grow organically. You can always push something you're not positive about to sandbox
    • API key security and recoverability #335
  5. have you not been wearing that tin foil hat I sent? you should have received a message...

in any case, if we all just try to publish our branches and make draft PRs, claim issues (all seem to be happening), there shouldn't be any unrecoverable conflicts.

@maxachis
Copy link
Contributor Author

@josh-chamberlain

  1. We can setup a channel in the discord to discuss database stuff. Renaming dev to sandbox can be done, and in terms of breaking sandbox, we can set up a way to reset the sandbox on demand if need be. I would argue that if we do sandbox, we should include dummy data there just to be safe -- if staging will be kept restricted for more established testing, I'd argue it's fine to keep that data. But since sandbox will be more open to even volunteer developers who we may not have a history with, better to be safe.
  2. As far as copying from production, I'd argue that eventually we'd want to do that when we don't have the v2 bottleneck, but since right now staging will contain all the v2 stuff, it may make more sense to just keep that as is.
  3. I did, but the Tiny Joe Biden that sits on my shoulder told me not to take it. Can't disobey an order from my president 🦅

I will begin work on setting up an additional database and figuring out dummy data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database documentation Improvements or additions to documentation tests Adds or updates tests
Projects
None yet
Development

No branches or pull requests

3 participants