Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate dockerfiles #462

Open
wants to merge 43 commits into
base: develop
Choose a base branch
from
Open

Conversation

vasconsaurus
Copy link
Contributor

@vasconsaurus vasconsaurus commented Aug 7, 2024

Description

trying to find common ground between dockerfiles

References: CV2-5020, CV2-6202

How has this been tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can verify the changes. Please describe whether or not you implemented automated tests.

Things to pay attention to during code review

Please describe parts of the change that require extra attention during code review, for example:

  • File FFFF, line LL: This refactoring does this and this. Is it consistent with how it’s implemented elsewhere?
  • Etc.

Checklist

  • I have performed a self-review of my own code
  • I have added unit and feature tests, if the PR implements a new feature or otherwise would benefit from additional testing
  • I have added regression tests, if the PR fixes a bug
  • I have added logging, exception reporting, and custom tracing with any additional information required for debugging
  • I considered secure coding practices when writing this code. Any security concerns are noted above.
  • I have commented my code in hard-to-understand areas, if any
  • I have made needed changes to the README
  • My changes generate no new warnings
  • If I added a third party module, I included a rationale for doing so and followed our current guidelines

Sorry, something went wrong.

@vasconsaurus vasconsaurus force-pushed the 4921-consolidate-dockerfiles branch 2 times, most recently from cc37481 to 937e638 Compare August 14, 2024 16:54
@vasconsaurus vasconsaurus force-pushed the 4921-consolidate-dockerfiles branch from 937e638 to 0af6f03 Compare August 21, 2024 12:44
@vasconsaurus vasconsaurus force-pushed the 4921-consolidate-dockerfiles branch 5 times, most recently from ea40476 to 08c5bdf Compare August 30, 2024 18:49
@vasconsaurus vasconsaurus force-pushed the 4921-consolidate-dockerfiles branch from 5f833c2 to 97cfe9d Compare January 30, 2025 18:52
@vasconsaurus vasconsaurus changed the title WIP: Consolidate dockerfiles Consolidate dockerfiles Feb 26, 2025
@vasconsaurus vasconsaurus marked this pull request as ready for review February 26, 2025 19:55
@vasconsaurus vasconsaurus force-pushed the 4921-consolidate-dockerfiles branch from 9530bac to a17535f Compare February 26, 2025 19:57
Copy link
Contributor

@caiosba caiosba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, I've already reviewed this (and I worked on the Code Climate part myself). Please make sure to run integration tests against this Pender branch before merging.

USER ${APP}

EXPOSE 3200
# EXPOSE 8000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this?

@vasconsaurus vasconsaurus force-pushed the 4921-consolidate-dockerfiles branch 2 times, most recently from a3f918a to 43a75a3 Compare March 6, 2025 18:55
trying to find common ground between dockerfiles

Removed/changed – environment variables:
- MIN_INSTANCES=4 - Seems to have been used for passenger.
This is hardcoded in docker-entrypoint.sh (old start.sh) for puma
- MAX_POOL_SIZE=4 – Same as above.
- TERM=xterm – Seems to have been used with phantomjs
- PRODUCT=pender – We are unsure. We are keeping only APP.
- SERVER_PORT=8000 – We are moving this out.
– DEPLOYUSER=pender – We are keeping only APP.
- DEPLOYDIR=/app/pender – Change to ARG DIRPATH (it's used for other environments as well)

Removed/Changed
- RUN chown -R ${DEPLOYUSER}:www-data ${DEPLOYDIR}: Seems to have been used
when we were deploying with nginx.
- RUN echo "gem: --no-rdoc --no-ri" > ~/.gemrc:  Deprecated.
- MAINTAINER sysops@meedan.com: Deprecated.
Talked to Caio:
- He thinks we can hardcode this in config
- We also don't think profile.rb is necessary anymore
Now both docker build + run and docker compose build + compose up work
Ran the tests and only one crowdtangle test was failing.
graphicsmagick:
- we seem to have added `imagemagick` when we were using `chromeshot`:
[here](4f97370).
- and then to fix tests(?) we updated to `graphicsmagick`:
[here](2813f7e)

inotify-tools:
- was removed:
[commit](08746f8)

python:
- looks like it might have been used for youtube-dl:
[commit](bde6bb4)
also renamed files
removed environment variables not used during build time
add needed environment variables to docker-compose
Now we have one entrypoint file:
- pender local and deployed
- pender-background local and deployed
seems like it shouldn't have been in the production folder
and doesn't seem to be used anywhere
dmou and others added 6 commits March 13, 2025 09:09
* temp remove user cmd

* run tests for this branch

* move workdir

* add a chown

* add debug statements to setup-parallel

* move useradd

* test moving bins to opt/

* more debug msg

* absolute paths

* don't add volumes

* fix docker-entrypoint.sh with DEPLOY_ENV set to test

* print env

* explicitly pass deploy env in ci

* rm env

* back to detached mode

* return to local deploy env

* use tmp dir elsewhere

* delete migration creating default directories

* set default log path to use /var/log

* copy cookies.txt in ci-test-pr workflow

* use new log paths config

* different logpath syntax

* change more paths

* update paths

* attempt to copy schema.rb to /opt/db

* forcibly create /opt/db?

* fix local filepath

* fix mkdir

* fix mkdir

* actually do this in the script

* check schema.rb in

* explicit mkdir /opt/db

* maybe just use /opt?

* mkdir as root??

* add schema.rb

* try new docker test

* comment out path mods

* add chmod on schema.rb

* add debug msg

* chmod a+w

* ok the other file

* only use docker-test.yml

* go back to lograge logger

* use deploy_env override

* log debug mode

* try with rake routes

* fix workflow syntax

* fix workflow syntax

* run docker logs

* run docker logs again

* run docker logs again

* run test no parallel

* add docker inspect

* inspect pender-pender-1

* update syntax

* reset puma dirpath?

* change storage path??

* modify chmod of all files

* try again with parallel tests

* mkdir coverage first

* chmod coverage folder

* add empty coverage dir

* rm otel-collector from docker-test.yml

* sort .gitignore

* remove debug messages from ci test

* run workflow with continue-on-error for tests

* write test ssm to separate file

* use pg_isready and update readme

* create a .env.test in the CI environment

* add build:. as backup for now

* do not use an internal docker network

* add chown flags to copy commands in dockerfile

* try to fix codeclimate ci step

* retry container cp

* docker cp with absolute paths

* change relative path of cc-test-reporter

* run go mod init

* actually cd to go module dir

* run cc-test-reporter without go modules

* check go env

* set proper path of resultset

* put flags at end

* Fixing issues with code coverage report generation and upload to Code Climate (#489)

Fixes the steps that generate and upload the code coverage report to Code Climate.

The main issue was that previously, we ran the tests, prepared the coverage report, and uploaded it all within the Docker container. Now, however, the tests run inside the Docker container, while the coverage report is prepared outside of it, leading to a mismatch in file paths. The solution I implemented was to use sed to replace the paths, changing /app/pender to /home/runner/work/pender. I also needed to copy codeclimate.json to the coverage directory and set the $CC_TEST_REPORTER_ID environment variable.

Reference: CV2-5020.

* remove .env before recreating

* remove references to .env.test

* (1) Using latest Code Climate binary instead of pulling random unexistent branch from GitHub & (2) Fixing error `Error: strconv.Atoi: parsing "2025-02-13T09:46:37-05:00": invalid syntax`.

* address PR comments

---------

Co-authored-by: Caio Almeida <117518+caiosba@users.noreply.github.com>
@vasconsaurus vasconsaurus force-pushed the 4921-consolidate-dockerfiles branch from 43a75a3 to 564ccec Compare March 13, 2025 12:09
vasconsaurus and others added 23 commits March 19, 2025 14:20
Context
   - Sometimes ArchiveOrg will send a HTML response:
       - which are usually issues on their side like: “429 Too Many Requests” and “Item Not Avaiable”,
       - but also caused an error on our side, since we always expected a JSON.
   - We have two jobs that run for ArchiveOrg, both make requests, so this happens at different places/time, but need very similar error handling.
   - We also wanted to take this opportunity to try to centralize more our error handling.

How
   - Handling HTML errors
       - I extracted a method json_parse we can use for our both requests. It rescues JSON::ParserError, and re-raises it as a more specific error or as itself.
       - Then handle_archiving_exceptions deals with it.
   - Other errors
       - To make this more cohesive, I updated this to have the same behavior as when we are dealing with HTML errors,
       - so we also re-raise the errors and handle_archiving_exceptions deals with it.
   - On handle_archiving_exceptions:
       - We had been dealing with ArchiveOrg errors directly in its module, but even if they are more ArchiveOrg related, I think it makes sense to centralize everything in handle_archiving_exceptions.
   - I did some work before where we moved error handling inside get_archive_org_status
       - 28941ec
       - I think this might not be needed,
       - but even if it is, I think it might make more sense for us to wrap this job with handle_archiving_exceptions?
       - It would be easier for us to avoid duplication.

References: CV2-4482
PR 533
…ilable" (#534)

This causes issues with similarity matching.

References: CV2-6267
trying to find common ground between dockerfiles

Removed/changed – environment variables:
- MIN_INSTANCES=4 - Seems to have been used for passenger.
This is hardcoded in docker-entrypoint.sh (old start.sh) for puma
- MAX_POOL_SIZE=4 – Same as above.
- TERM=xterm – Seems to have been used with phantomjs
- PRODUCT=pender – We are unsure. We are keeping only APP.
- SERVER_PORT=8000 – We are moving this out.
– DEPLOYUSER=pender – We are keeping only APP.
- DEPLOYDIR=/app/pender – Change to ARG DIRPATH (it's used for other environments as well)

Removed/Changed
- RUN chown -R ${DEPLOYUSER}:www-data ${DEPLOYDIR}: Seems to have been used
when we were deploying with nginx.
- RUN echo "gem: --no-rdoc --no-ri" > ~/.gemrc:  Deprecated.
- MAINTAINER sysops@meedan.com: Deprecated.
Talked to Caio:
- He thinks we can hardcode this in config
- We also don't think profile.rb is necessary anymore
Now both docker build + run and docker compose build + compose up work
Ran the tests and only one crowdtangle test was failing.
graphicsmagick:
- we seem to have added `imagemagick` when we were using `chromeshot`:
[here](4f97370).
- and then to fix tests(?) we updated to `graphicsmagick`:
[here](2813f7e)

inotify-tools:
- was removed:
[commit](08746f8)

python:
- looks like it might have been used for youtube-dl:
[commit](bde6bb4)
also renamed files
removed environment variables not used during build time
add needed environment variables to docker-compose
Now we have one entrypoint file:
- pender local and deployed
- pender-background local and deployed
seems like it shouldn't have been in the production folder
and doesn't seem to be used anywhere
* temp remove user cmd

* run tests for this branch

* move workdir

* add a chown

* add debug statements to setup-parallel

* move useradd

* test moving bins to opt/

* more debug msg

* absolute paths

* don't add volumes

* fix docker-entrypoint.sh with DEPLOY_ENV set to test

* print env

* explicitly pass deploy env in ci

* rm env

* back to detached mode

* return to local deploy env

* use tmp dir elsewhere

* delete migration creating default directories

* set default log path to use /var/log

* copy cookies.txt in ci-test-pr workflow

* use new log paths config

* different logpath syntax

* change more paths

* update paths

* attempt to copy schema.rb to /opt/db

* forcibly create /opt/db?

* fix local filepath

* fix mkdir

* fix mkdir

* actually do this in the script

* check schema.rb in

* explicit mkdir /opt/db

* maybe just use /opt?

* mkdir as root??

* add schema.rb

* try new docker test

* comment out path mods

* add chmod on schema.rb

* add debug msg

* chmod a+w

* ok the other file

* only use docker-test.yml

* go back to lograge logger

* use deploy_env override

* log debug mode

* try with rake routes

* fix workflow syntax

* fix workflow syntax

* run docker logs

* run docker logs again

* run docker logs again

* run test no parallel

* add docker inspect

* inspect pender-pender-1

* update syntax

* reset puma dirpath?

* change storage path??

* modify chmod of all files

* try again with parallel tests

* mkdir coverage first

* chmod coverage folder

* add empty coverage dir

* rm otel-collector from docker-test.yml

* sort .gitignore

* remove debug messages from ci test

* run workflow with continue-on-error for tests

* write test ssm to separate file

* use pg_isready and update readme

* create a .env.test in the CI environment

* add build:. as backup for now

* do not use an internal docker network

* add chown flags to copy commands in dockerfile

* try to fix codeclimate ci step

* retry container cp

* docker cp with absolute paths

* change relative path of cc-test-reporter

* run go mod init

* actually cd to go module dir

* run cc-test-reporter without go modules

* check go env

* set proper path of resultset

* put flags at end

* Fixing issues with code coverage report generation and upload to Code Climate (#489)

Fixes the steps that generate and upload the code coverage report to Code Climate.

The main issue was that previously, we ran the tests, prepared the coverage report, and uploaded it all within the Docker container. Now, however, the tests run inside the Docker container, while the coverage report is prepared outside of it, leading to a mismatch in file paths. The solution I implemented was to use sed to replace the paths, changing /app/pender to /home/runner/work/pender. I also needed to copy codeclimate.json to the coverage directory and set the $CC_TEST_REPORTER_ID environment variable.

Reference: CV2-5020.

* remove .env before recreating

* remove references to .env.test

* (1) Using latest Code Climate binary instead of pulling random unexistent branch from GitHub & (2) Fixing error `Error: strconv.Atoi: parsing "2025-02-13T09:46:37-05:00": invalid syntax`.

* address PR comments

---------

Co-authored-by: Caio Almeida <117518+caiosba@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants