Set up Prod/Stg/Sandbox environments #342

Open · 5 of 9 tasks
maxachis opened this issue Jun 27, 2024 · 5 comments
Assignees: maxachis
Labels: database, devops, documentation, Github Action, v2

Comments


maxachis commented Jun 27, 2024

As discussed previously in #340, here is the implementation!

TODO

  • Create a new Sandbox database in Digital Ocean, with the same specs as pdap-db-dev (which unfortunately cannot be easily renamed).
  • Update the prod-to-dev-migration repository to construct the Stage and Sandbox databases in the same run, with Stage using the data from Production and Sandbox using only the schema from Production (see the sketch after this list).
  • Update automation.pdap.io with the environment variables necessary for the refresh to function properly, modifying its configuration as needed.
  • Rename the Prod-To-Dev-Migration job in automation.pdap.io to reflect its new behavior.
  • Develop dummy data rows to populate the sandbox database, and modify the prod-to-dev-migration repository to automatically add them to the sandbox database on migration. This website might be helpful.
  • Add documentation in Notion describing how everything works.
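
For the second item, here is a minimal Python sketch of the two refresh paths (full copy for Stage, schema-only for Sandbox) built on pg_dump/pg_restore. The connection strings, database names, and function names are placeholders, not the actual prod-to-dev-migration scripts:

```python
"""Sketch of the Stage and Sandbox refresh paths. Connection strings and
database names are placeholders, not the real Digital Ocean values."""
import subprocess

PROD_URL = "postgresql://user:password@prod-host:5432/pdap_prod"          # placeholder
STAGE_URL = "postgresql://user:password@stage-host:5432/pdap_stage"       # placeholder
SANDBOX_URL = "postgresql://user:password@sandbox-host:5432/pdap_sandbox"  # placeholder


def refresh_stage():
    """Stage mirrors Production: full dump (schema + data), restored over Stage."""
    dump = subprocess.run(
        ["pg_dump", "--no-owner", "--format=custom", PROD_URL],
        check=True, capture_output=True,
    )
    subprocess.run(
        ["pg_restore", "--clean", "--if-exists", "--no-owner", "-d", STAGE_URL],
        input=dump.stdout, check=True,
    )


def refresh_sandbox():
    """Sandbox gets only Production's schema; data is added separately (dummy data)."""
    dump = subprocess.run(
        ["pg_dump", "--schema-only", "--no-owner", "--format=custom", PROD_URL],
        check=True, capture_output=True,
    )
    subprocess.run(
        ["pg_restore", "--clean", "--if-exists", "--no-owner", "-d", SANDBOX_URL],
        input=dump.stdout, check=True,
    )


if __name__ == "__main__":
    refresh_stage()
    refresh_sandbox()
```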

Optional/Debatable TODO

  • To maintain clarity, destroy pdap-db-dev and build a new database, pdap-db-stg, redirecting all Stage connections to it.
  • Rename variables in run_pytest.yaml (as well as in the associated Python and Github configuration components) to reflect the renamed environments. Not strictly necessary, but it will save us confusion later on.
  • Create a new issue for providing instructions and resources for building a local version of the database for personal testing, and for populating it with dummy data.
maxachis self-assigned this Jun 27, 2024
maxachis added the documentation, database, v2, Github Action, and devops labels Jun 27, 2024

maxachis commented Jun 28, 2024

automation.pdap.io now has two builds: one for Sandbox and one for Stage. They should run around the same time, but splitting them gives us more flexibility if we need it in the future.

Next up is the question of dummy data.

In some cases we may not need any dummy data and can simply copy data from Production -- tables such as zip_codes and state_names don't contain any sensitive information, so we can use them as-is.

Then there are cases where we simply don't need all of the data -- for example, quick_search_query_logs, as previously discussed, contains far more content than we need, and copying it would slow things down.

We also want to be mindful that our schemas will change: the more dummy data we maintain, the more we'll have to update whenever the schema changes.

Additionally, some newly created tables, such as requests_v2, have no data at all, and adding some dummy data will help us test them before bringing them into production.

For now, I think we can get away with a handful of dummy rows in sensitive tables such as users, access_tokens, session_tokens, and data_requests and in empty tables such as requests_v2, importing the non-sensitive tables en masse and skipping quick_search_query_logs entirely.
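
To make that plan concrete, a rough per-table policy for the Sandbox refresh could look like the sketch below, using the tables named above. The strategy labels ("copy", "dummy", "skip") are illustrative only, not an existing convention in prod-to-dev-migration:

```python
# Per-table policy sketch for the Sandbox refresh. Strategy names are
# illustrative placeholders, not an existing convention in the repository.
SANDBOX_TABLE_POLICY = {
    # Non-sensitive reference tables: copy as-is from Production.
    "zip_codes": "copy",
    "state_names": "copy",
    # Sensitive tables: replace with a handful of dummy rows.
    "users": "dummy",
    "access_tokens": "dummy",
    "session_tokens": "dummy",
    "data_requests": "dummy",
    # New, still-empty tables: dummy rows let us exercise them before production use.
    "requests_v2": "dummy",
    # Oversized logs we don't need in Sandbox.
    "quick_search_query_logs": "skip",
}


def tables_with(strategy: str) -> list[str]:
    """Return the tables assigned a given strategy, e.g. tables_with("copy")."""
    return [table for table, s in SANDBOX_TABLE_POLICY.items() if s == strategy]
```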


maxachis commented Jun 28, 2024

Scripts have been modified so that the sandbox database now receives non-sensitive data from production. Next up is dummy data.

Dummy Data TODO

  • Create a dummy_data folder in the repository, with an empty CSV for each table to be filled with dummy data
  • Modify the sandbox script to load dummy data from the CSVs (see the sketch after this list)
  • Fill the CSVs with at least one row each to confirm functionality
  • Add Python setup info to ensure the Python script functions properly
  • Confirm the functionality works in a separate branch when called from automation.pdap.io
  • Merge into main
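
A minimal sketch of the CSV-loading step, assuming one CSV per table in the dummy_data folder, with the file name matching the table name and a header row in each file; the connection string is a placeholder:

```python
"""Sketch of loading dummy_data/*.csv into the sandbox database.
Connection details are placeholders, not the real values."""
from pathlib import Path

import psycopg2

SANDBOX_URL = "postgresql://user:password@sandbox-host:5432/pdap_sandbox"  # placeholder
DUMMY_DATA_DIR = Path("dummy_data")


def load_dummy_data():
    conn = psycopg2.connect(SANDBOX_URL)
    try:
        with conn, conn.cursor() as cur:
            for csv_path in sorted(DUMMY_DATA_DIR.glob("*.csv")):
                table = csv_path.stem  # e.g. dummy_data/users.csv -> users
                with open(csv_path) as f:
                    # HEADER true skips the first row; column order in the CSV
                    # must match the table definition.
                    cur.copy_expert(
                        f'COPY "{table}" FROM STDIN WITH (FORMAT csv, HEADER true)', f
                    )
    finally:
        conn.close()


if __name__ == "__main__":
    load_dummy_data()
```

In practice the load order may matter because of foreign keys, so the sorted glob is just a stand-in for an explicit ordering.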

maxachis commented:

Additionally, in the course of developing this logic, I came to the conclusion that Python would be preferable to shell scripts, and made an issue accordingly.

maxachis commented:

@josh-chamberlain I'll need to be made a member of the Notion workspace to add a "Testing" page with information on the Stage and Sandbox databases ⛑️

josh-chamberlain commented:

Nice work! I like your optionals, too.
