Add workflow to gen data #329

Open
wants to merge 263 commits into
base: feat/pull-v2-api
Changes from 244 commits
5622d8d
try using bash to run script
ChenglimEar Oct 31, 2023
d6a9f31
try to get post-create-command to run in workflow
ChenglimEar Oct 31, 2023
c92051e
try adding dev container to registry
ChenglimEar Oct 31, 2023
1a50f4e
use dev container in registry for workflow
ChenglimEar Oct 31, 2023
3c2da8f
if dev container modified, then we don't do other steps
ChenglimEar Oct 31, 2023
3dcccda
list dir before build
ChenglimEar Oct 31, 2023
16dc26c
for docker build to not use cache
ChenglimEar Oct 31, 2023
a6ce739
don't change context when building Dockerfile in workflow
ChenglimEar Oct 31, 2023
a43b445
remove need for action script
ChenglimEar Nov 1, 2023
2c00b68
look for changes within same branch when deciding to build dev container
ChenglimEar Nov 1, 2023
85c3717
look for changes against last sha
ChenglimEar Nov 1, 2023
5ad4e61
test getting dev container to build
ChenglimEar Nov 1, 2023
1275bcf
try again to pick up recent changes
ChenglimEar Nov 1, 2023
a3a736e
try another action for listing changed files
ChenglimEar Nov 1, 2023
11fdf8e
cp files from latest commit to ckingbailey v2 repo
ckingbailey Nov 1, 2023
d9978eb
check detection of changed files
ChenglimEar Nov 1, 2023
5fd7b06
try new mechanism to check for change to devcontainer
ChenglimEar Nov 1, 2023
d09f171
detect change to devcontainer
ChenglimEar Nov 1, 2023
950273b
change to new method for setting step output in workflow
ChenglimEar Nov 1, 2023
cc9a529
update main workflow to use working mechanism for detecting changed
ChenglimEar Nov 1, 2023
0b5894c
only build dev container in main workflow
ChenglimEar Nov 1, 2023
02bbf2a
add separate job that runs when push event triggered
ChenglimEar Nov 1, 2023
b6e1745
only build devcontainer on push
ChenglimEar Nov 1, 2023
bd80bed
forgot to checkout on push
ChenglimEar Nov 1, 2023
09b0045
log in to container registry when generating website
ChenglimEar Nov 1, 2023
6525032
try running job in container
ChenglimEar Nov 1, 2023
4453591
add credentials for container registry
ChenglimEar Nov 1, 2023
c0626b9
force container rebuild
ChenglimEar Nov 1, 2023
a1ea53c
run main workflow in container
ChenglimEar Nov 1, 2023
664b3ad
remove container action
ChenglimEar Nov 1, 2023
5c6f418
try running existing make code in container
ChenglimEar Nov 1, 2023
1c5ea6a
add postgres to generation job
ChenglimEar Nov 1, 2023
5a13c67
set up access to postgres
ChenglimEar Nov 1, 2023
f6d80ca
config postgres service
ChenglimEar Nov 1, 2023
cde75ab
remove misplaced 'if' in workflow
ChenglimEar Nov 1, 2023
fc2fe1e
set password for connecting to postgres and list db at start to confirm
ChenglimEar Nov 1, 2023
ac64ba0
separate out step to check postgres so that we can easily check the
ChenglimEar Nov 1, 2023
f08c7c1
remove unused script
ChenglimEar Nov 1, 2023
8b55cd9
remove new version of sqlalchemy since it breaks csvsql in `make import`
ChenglimEar Nov 1, 2023
d239c1f
add some conditions for job execution in main workflow
ChenglimEar Nov 1, 2023
9a64554
add check for version of sqlalchemy
ChenglimEar Nov 1, 2023
1ee8282
get container to rebuild when requirements change
ChenglimEar Nov 1, 2023
6ca18db
adjust logic for when jobs run in main workflow
ChenglimEar Nov 1, 2023
aacdec8
try summarizing tables created in main workflow
ChenglimEar Nov 1, 2023
3e11a12
make sure we clean up downloads before we start
ChenglimEar Nov 1, 2023
b883c21
Merge branch 'master' into add-workflow-to-gen-data
ChenglimEar Nov 1, 2023
565aede
try testing csvsql early
ChenglimEar Nov 1, 2023
a80b6e7
some logging to test csvsql
ChenglimEar Nov 1, 2023
c9988aa
turn on verbose output while testing csvsql
ChenglimEar Nov 1, 2023
0dd3c03
try sending csv file through stdin for csvsql
ChenglimEar Nov 1, 2023
9678551
upgrade csvkit to 1.3.0 and upgraded its dependencies where needed
ChenglimEar Nov 2, 2023
a1812ff
remove files no longer needed in download dir
ChenglimEar Nov 2, 2023
062d4b1
set postgres version in dev container and workflow to 9.6 to match
ChenglimEar Nov 2, 2023
82408d9
Merge branches 'feat/pull-v2-api' and 'feat/pull-v2-api' of github.co…
ChenglimEar Nov 2, 2023
cf785f2
Merge branch 'feat/pull-v2-api' into add-workflow-to-gen-data
ChenglimEar Nov 2, 2023
db8c504
update workflow names
ChenglimEar Nov 2, 2023
c176028
make use of sql files to create tables
ChenglimEar Nov 2, 2023
48f6259
Update import-file to display schema of created table
ChenglimEar Nov 2, 2023
ad869ef
Update import-file to log more info about postgresql tables
ChenglimEar Nov 2, 2023
dde12b4
Update import-file to point psql to DATABASE_NAME
ChenglimEar Nov 2, 2023
c33d743
Update import-file to use the right quote around table name in psql
ChenglimEar Nov 2, 2023
f9f2654
Update import-file to remove debug logging
ChenglimEar Nov 2, 2023
23ff55b
Update Makefile to use saved sql for creating tables from spreadsheet…
ChenglimEar Nov 2, 2023
84c3a26
fix Makefile by moving bash into file and saved generated sql for tab…
ChenglimEar Nov 3, 2023
a51ee6c
some fixes to get csvkit 1.3.0 working - not fully working yet...
ChenglimEar Nov 3, 2023
bb4354b
Merge branch 'upgrade-csvkit' of github.com:caciviclab/disclosure-bac…
ChenglimEar Nov 3, 2023
dea94a1
make sure data upload for spreadsheet data does not use inference (ie
ChenglimEar Nov 3, 2023
ac93903
debug version of csvkit installed
ChenglimEar Nov 3, 2023
fb456c7
verify python version at time of install on travis
ChenglimEar Nov 3, 2023
929372a
remove sudo for pip install
ChenglimEar Nov 3, 2023
b6dfe8e
Merge branch 'upgrade-csvkit' into add-workflow-to-gen-data
ChenglimEar Nov 3, 2023
ff5adfe
remove download/main.py dependency on latest version of sqlalchemy
ChenglimEar Nov 3, 2023
a2e99a0
use later postgres
ChenglimEar Nov 3, 2023
8394fa3
update postgres for dev container also
ChenglimEar Nov 3, 2023
809710d
download new netfile csvs before import
ChenglimEar Nov 3, 2023
f7cf802
gracefully handle records missing transaction data
ChenglimEar Nov 3, 2023
2e0387d
add netfile v2 data to database during import
ChenglimEar Nov 3, 2023
b6b668e
make sure dir exists for saving v2 csv files
ChenglimEar Nov 3, 2023
f496e73
forgot to import os
ChenglimEar Nov 3, 2023
2eb24c6
fix param name
ChenglimEar Nov 3, 2023
c2a0f07
make netfile v2 download a part of `make download`
ChenglimEar Nov 3, 2023
715d296
add requirements for netfile v2 code
ChenglimEar Nov 4, 2023
81d7d1b
update python-dateutil
ChenglimEar Nov 4, 2023
9ff8b0e
try to cause failure when pip install fails
ChenglimEar Nov 4, 2023
6c9dcbf
upgrade babel
ChenglimEar Nov 4, 2023
80e07e3
update pytz
ChenglimEar Nov 4, 2023
8f42b3d
Merge branch 'check-using-digest' into upgrade-csvkit
ChenglimEar Nov 9, 2023
a1bfe9a
Merge branch 'upgrade-csvkit' into add-workflow-to-gen-data
ChenglimEar Nov 9, 2023
e297bd6
Merge branch 'check-using-digest' into upgrade-csvkit
ChenglimEar Nov 12, 2023
fb60d32
Merge branch 'check-using-digest' into upgrade-csvkit
ChenglimEar Nov 12, 2023
b2fb864
allow csvkit to pull in the correct agate dependencies and add script to
ChenglimEar Nov 13, 2023
3c68082
remove whitespace for some key columns
ChenglimEar Nov 13, 2023
05cee7c
Merge branch 'check-using-digest' into upgrade-csvkit
ChenglimEar Nov 13, 2023
fb33f7b
Merge branch 'upgrade-csvkit' into add-workflow-to-gen-data
ChenglimEar Nov 13, 2023
8076382
split contributions by type to multiple elections when a candidate was
ChenglimEar Nov 18, 2023
ea0c9ab
removed commented code
ChenglimEar Nov 18, 2023
b6937a3
Merge branch 'fix-contributions-by-type' into upgrade-csvkit
ChenglimEar Nov 18, 2023
b3e95b7
create candidate_summary view to associate "Summary" info with specific
ChenglimEar Nov 19, 2023
7cc4317
add total contributions to digest.json
ChenglimEar Nov 20, 2023
c079642
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Nov 20, 2023
bb6c874
use hash of hash for contributions by type
ChenglimEar Nov 25, 2023
a460ff8
add total contributions by type and source to digests
ChenglimEar Nov 25, 2023
7c707ec
take election into account when calculating total contributions and
ChenglimEar Nov 25, 2023
3c9251e
organize totals calculated from various sources in digests.json
ChenglimEar Nov 25, 2023
248e6bd
update digests.json to include more totals
ChenglimEar Nov 25, 2023
ea1a077
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Nov 25, 2023
dc829b0
calculate contribution totals for all tickets (candidates and referen…
ChenglimEar Nov 25, 2023
ee2e56d
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Nov 25, 2023
1cf25fa
add more totals to digest and separate by contributions vs expenditures
ChenglimEar Nov 28, 2023
c8c2d9a
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Nov 28, 2023
020eb5b
update expenditures to be split on election and other calculations to
ChenglimEar Nov 28, 2023
e32d6e1
Merge branch 'master' into add-totals-to-digests
ChenglimEar Nov 28, 2023
e4a9a67
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Nov 28, 2023
82ededc
revert committee contribution list calculator
ChenglimEar Nov 30, 2023
14f67b4
some comments about the totals calculated for digests.json
ChenglimEar Nov 30, 2023
4ecd440
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Nov 30, 2023
dca1bae
Merge branch 'master' into add-totals-to-digests
ChenglimEar Nov 30, 2023
c0bbc32
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Nov 30, 2023
70c9461
Merge branch 'fix-contributions-by-type' into upgrade-csvkit
ChenglimEar Dec 1, 2023
ea0229e
update digests to only show totals that we want to compare
ChenglimEar Dec 1, 2023
56834f0
add loans to total for contributions by type and origin
ChenglimEar Dec 6, 2023
599c72e
move totals logic out of main
ChenglimEar Dec 9, 2023
8d5684b
Merge branch 'master' into add-totals-to-digests
ChenglimEar Dec 9, 2023
55192e0
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Dec 9, 2023
6faac9d
remove build directory for reset
ChenglimEar Dec 9, 2023
4573670
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Dec 9, 2023
ee00671
add build directory back
ChenglimEar Dec 9, 2023
1db21ed
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Dec 9, 2023
d13c972
Merge branch 'master' into add-totals-to-digests
ChenglimEar Dec 10, 2023
7b14a9a
re-run
ChenglimEar Dec 10, 2023
17a2f7c
Merge branch 'master' into fix-contributions-by-type
ChenglimEar Dec 10, 2023
788e22d
Merge branch 'fix-contributions-by-type' into upgrade-csvkit
ChenglimEar Dec 10, 2023
db58792
Merge branch 'add-totals-to-digests' into fix-contributions-by-type
ChenglimEar Dec 10, 2023
c1990cd
Merge branch 'master' into fix-contributions-by-type
ChenglimEar Dec 17, 2023
692b6ea
switch total expenditures calculator to use new candidate_summary view
ChenglimEar Dec 17, 2023
4bb1698
Merge branch 'fix-contributions-by-type' into upgrade-csvkit
ChenglimEar Dec 17, 2023
7d63c37
remove build directory
ChenglimEar Dec 17, 2023
1cfbab6
remove build directory
ChenglimEar Dec 17, 2023
075308e
Merge branch 'fix-contributions-by-type' into upgrade-csvkit
ChenglimEar Dec 17, 2023
463a32d
add generated build directory
ChenglimEar Dec 17, 2023
a3d11cb
Merge branch 'fix-contributions-by-type' into upgrade-csvkit
ChenglimEar Dec 17, 2023
40290cb
Merge branch 'master' into fix-contributions-by-type
ChenglimEar Dec 20, 2023
9664e2a
add report on candidate totals
ChenglimEar Dec 20, 2023
a0400b6
attempt to get python 3.9 to be used
ChenglimEar Dec 22, 2023
7113922
Merge branch 'master' into fix-contributions-by-type
ChenglimEar Dec 22, 2023
db38143
don't use sudo for pip install
ChenglimEar Dec 22, 2023
b763994
Merge branch 'fix-contributions-by-type' into upgrade-csvkit
ChenglimEar Dec 22, 2023
927d3de
remove build dir to reset
ChenglimEar Feb 3, 2024
a6fccb2
Merge branch 'master' into upgrade-csvkit
ChenglimEar Feb 3, 2024
43dd66c
Merge branch 'master' into upgrade-csvkit
ChenglimEar Feb 3, 2024
39423db
update build dir to match master as reset
ChenglimEar Feb 3, 2024
fbf2505
remove unused var in calculator
ChenglimEar Feb 3, 2024
731686c
match up calculator with master branch
ChenglimEar Feb 3, 2024
832b3db
update build dir to match master
ChenglimEar Feb 18, 2024
6eb44d1
Merge branch 'master' into upgrade-csvkit
ChenglimEar Feb 18, 2024
5806f1f
upgrade csvkit
ChenglimEar Feb 18, 2024
465b881
update build dir
ChenglimEar Feb 18, 2024
44d37b0
Merge branch 'upgrade-csvkit' into add-workflow-to-gen-data
ChenglimEar Feb 18, 2024
a5aec37
match schema to latest infered by old csvkit
ChenglimEar Feb 18, 2024
64aee65
make sure we are pushing to the same branch when deploying build
ChenglimEar Feb 18, 2024
f0c06e8
Merge branch 'master' into track-schema-changes
ChenglimEar Feb 18, 2024
50bd68e
specify the branch to push to for travis auto-deploy
ChenglimEar Feb 18, 2024
50ca568
Run `make clean download import process`
Feb 18, 2024
c9a3b76
Merge branch 'master' into track-schema-changes
ChenglimEar Feb 18, 2024
3abb21c
Run `make clean download import process`
Feb 18, 2024
ff45680
update build dir to match current
ChenglimEar Feb 18, 2024
c8b6f94
add schema.sql file
ChenglimEar Feb 18, 2024
8b18b65
Merge branch 'track-schema-changes' into upgrade-csvkit
ChenglimEar Feb 18, 2024
04c6375
don't deploy build on pull request build
ChenglimEar Feb 18, 2024
82cb37a
Merge branch 'track-schema-changes' into upgrade-csvkit
ChenglimEar Feb 18, 2024
8fb15e8
increase size of filer name for committees
ChenglimEar Feb 18, 2024
c9f4423
Run `make clean download import process`
Feb 18, 2024
8bba528
Merge branch 'master' into upgrade-csvkit
ChenglimEar Feb 20, 2024
46a5293
Run `make clean download import process`
Feb 20, 2024
6296668
clean up whitespace for some more candidate columns
ChenglimEar Feb 20, 2024
c9218b9
Run `make clean download import process`
Feb 20, 2024
6c06142
remove whitespace from referendums summary
ChenglimEar Feb 20, 2024
2e4c960
Merge branch 'upgrade-csvkit' of github.com:caciviclab/disclosure-bac…
ChenglimEar Feb 20, 2024
9e190ae
Run `make clean download import process`
Feb 20, 2024
678d721
remove commented out line
ChenglimEar Feb 27, 2024
93d718b
Merge branch 'upgrade-csvkit' of github.com:caciviclab/disclosure-bac…
ChenglimEar Feb 27, 2024
27881bf
remove build dir to reset
ChenglimEar Mar 2, 2024
17c6184
Merge branch 'master' into upgrade-csvkit
ChenglimEar Mar 2, 2024
750deaf
save new build dir
ChenglimEar Mar 2, 2024
e954c21
combine removal of leading and trailing white spaces into a single
ChenglimEar Mar 2, 2024
2c28ee3
Run `make clean download import process`
Mar 2, 2024
50bbec0
update build with recent fixes from main branch
ChenglimEar Mar 5, 2024
7efe43d
Merge branch 'upgrade-csvkit' of github.com:caciviclab/disclosure-bac…
ChenglimEar Mar 5, 2024
40ff1e8
re-use code to create table in bin/import-file
ChenglimEar Mar 5, 2024
b5954a2
remove build dir to reset
ChenglimEar Mar 5, 2024
93ec624
Run `make clean download import process`
Mar 5, 2024
9ec0ebd
Merge branch 'master' into upgrade-csvkit
ChenglimEar Mar 5, 2024
794d934
Merge branch 'upgrade-csvkit' of github.com:caciviclab/disclosure-bac…
ChenglimEar Mar 5, 2024
43db2e8
Run `make clean download import process`
Mar 5, 2024
e0ecb2b
clean up request to dump database schema
ChenglimEar Mar 9, 2024
d461b13
remove build dir to reset
ChenglimEar Mar 9, 2024
206f986
Merge branch 'master' into upgrade-csvkit
ChenglimEar Mar 9, 2024
08b6fc2
Run `make clean download import process`
Mar 9, 2024
12efe1d
remove build directory for refresh
ChenglimEar Mar 22, 2024
0a3da68
Merge branch 'upgrade-csvkit' of github.com:caciviclab/disclosure-bac…
ChenglimEar Mar 22, 2024
969df8a
Merge branch 'master' into upgrade-csvkit
ChenglimEar Mar 22, 2024
b86effd
Run `make clean download import process`
Mar 22, 2024
a3cc106
Merge branch 'master' into upgrade-csvkit
ChenglimEar Apr 8, 2024
06f0819
Run `make clean download import process`
Apr 8, 2024
e37d1f2
pick committee distinct on filer ID according to order of value in el…
ChenglimEar Apr 13, 2024
881733e
Merge branch '352-select-the-most-recent-committee-name' into upgrade…
ChenglimEar Apr 13, 2024
a558379
Run `make clean download import process`
Apr 13, 2024
0b160c9
Run `make clean download import process`
Apr 13, 2024
a043a76
remove check for Ballot_Measure_Election when looking for committee name
ChenglimEar Apr 13, 2024
a828a72
Merge branch '352-select-the-most-recent-committee-name' into upgrade…
ChenglimEar Apr 13, 2024
f0c00fa
Run `make clean download import process`
Apr 13, 2024
5006c28
force rebuild
ChenglimEar Apr 14, 2024
1f54189
Run `make clean download import process`
Apr 14, 2024
ffdc7e6
Merge branch 'upgrade-csvkit' into add-workflow-to-gen-data
ChenglimEar Apr 14, 2024
1e99f65
Merge branch 'feat/pull-v2-api' into add-workflow-to-gen-data
ChenglimEar May 26, 2024
e7a67f3
Run `make clean download import process`
May 29, 2024
8059e9f
change image used for workflow to generate website data to match vers…
ChenglimEar May 30, 2024
c1e48a5
Run `make clean download import process`
May 30, 2024
6a50288
set dev container and github actions to use the same postgres version
ChenglimEar May 30, 2024
8d65708
try action checkout v4
ChenglimEar May 30, 2024
59b34c2
print out some dir info to figure out why git thinks it is not a repo
ChenglimEar May 30, 2024
8adaf50
cause early git failure so we can try to fix it
ChenglimEar May 30, 2024
f961892
remove tab from github workflow file
ChenglimEar May 30, 2024
a79dce5
see if we can fix the git issue
ChenglimEar May 30, 2024
50055fb
remove test commands
ChenglimEar May 30, 2024
567c967
Run `make clean download import process`
May 30, 2024
87657f7
show version of key components when cleaning
ChenglimEar May 30, 2024
fecf2ec
Merge branch 'add-workflow-to-gen-data' of github.com:caciviclab/disc…
ChenglimEar May 30, 2024
ecc51fa
Run `make clean download import process`
May 30, 2024
4c3f104
add place to insert new downloads
ChenglimEar May 30, 2024
71695a9
Merge branch 'add-workflow-to-gen-data' of github.com:caciviclab/disc…
ChenglimEar May 30, 2024
35fa76d
get image to be created with new branch and don't use the image during
ChenglimEar Jul 2, 2024
af8f51d
add explicit check for docker image in order to run jobs that require it
ChenglimEar Jul 2, 2024
bfd4b9a
log in to docker early
ChenglimEar Jul 2, 2024
3d8c799
build container if it's not there
ChenglimEar Jul 2, 2024
ef9e09a
try increasing size of filer name col
ChenglimEar Jul 2, 2024
6d4cad4
put shared postgres settings in global env vars
ChenglimEar Jul 3, 2024
e110b7e
clean up dev container
ChenglimEar Jul 3, 2024
0e6cca5
add post-create-command.sh back
ChenglimEar Jul 3, 2024
a6263c3
remove pwd in Dockerfile
ChenglimEar Jul 3, 2024
c83b157
Merge branch 'feat/pull-v2-api' into add-workflow-to-gen-data
ChenglimEar Jul 3, 2024
fb4e811
write csv from polars dataframe
ChenglimEar Jul 3, 2024
f929a6e
merge requirements for netfile v2 into main requirements file
ChenglimEar Jul 3, 2024
d7163d6
allow committee id to be null in H-Loan data
ChenglimEar Jul 3, 2024
3262819
Run `make clean download import process`
Jul 3, 2024
b067517
remove copy of download/requirements.txt from Dockerfile
ChenglimEar Jul 3, 2024
b756009
Run `make clean download import process`
Jul 3, 2024
8833de7
move new data to be imported to a different target in Makefile
ChenglimEar Jul 3, 2024
a1ff8f6
Merge branch 'add-workflow-to-gen-data' of github.com:caciviclab/disc…
ChenglimEar Jul 3, 2024
7986727
Run `make clean download import process`
Jul 3, 2024
8 changes: 8 additions & 0 deletions .devcontainer/Dockerfile
Expand Up @@ -8,3 +8,11 @@ ENV PYTHONUNBUFFERED 1
RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
&& apt-get -y install --no-install-recommends postgresql-client ruby-full

RUN pwd
Collaborator:

Why do this? I can't see why the docker build would need to pwd

Contributor Author:

removed

COPY ./.devcontainer/post-create-command.sh /scripts/
COPY ./requirements.txt requirements.txt
COPY ./download/requirements.txt download/requirements.txt
COPY ./Gemfile Gemfile
COPY ./Gemfile.lock Gemfile.lock
RUN bash /scripts/post-create-command.sh

1 change: 0 additions & 1 deletion .devcontainer/README
Expand Up @@ -2,4 +2,3 @@

The dev container can be used to work on the project in a consistent environment independent of what machine you are working on. When working in the dev container, you will have a postgres instance running. You can access the postgres instance just by running psql.


2 changes: 1 addition & 1 deletion .devcontainer/devcontainer.json
Expand Up @@ -4,7 +4,7 @@
"service": "app",
"workspaceFolder": "/workspace",
"remoteUser": "vscode",
"postCreateCommand": "bash ./.devcontainer/post-create-command.sh",
//"postCreateCommand": "bash ./.devcontainer/post-create-command.sh",
Collaborator:

Remove commented-out line

Contributor Author:

done

"postStartCommand": "git config --global --add safe.directory ${containerWorkspaceFolder}",
"forwardPorts": [4567, 5432],
"extensions": [
4 changes: 3 additions & 1 deletion .devcontainer/docker-compose.yml
Expand Up @@ -26,8 +26,10 @@ services:
network_mode: service:db

db:
#image: postgres:16.0-bullseye
Collaborator:

Remove commented-out lines

Contributor Author:

done

#image: postgres:latest
image: postgres:15.4
#image: postgres:15.4
image: postgres:15.6-bullseye
restart: unless-stopped
volumes:
- postgres-data:/var/lib/postgresql/data
5 changes: 0 additions & 5 deletions .devcontainer/enable-ssh-agent.ps1

This file was deleted.

4 changes: 3 additions & 1 deletion .devcontainer/post-create-command.sh
@@ -1,8 +1,10 @@
#!/bin/bash

set -e

pip install --upgrade pip
#pip install 'urllib3[secure]'
pip install -r requirements.txt
pip install -r gdrive_requirements.txt
pip install -r download/requirements.txt
sudo gem install pg bundler
sudo bundle install
110 changes: 94 additions & 16 deletions .github/workflows/main.yml
@@ -1,25 +1,103 @@
# This workflow will later be replaced with logic to "Generate Website Data"
# The verify-gdrive.yml workflow file will be renamed to this one
# We have to introduce this change in steps because GitHub gets confused until
# we add the new workflow file to the master branch
name: "Generate Website Data"
on:
workflow_dispatch:
push:
jobs:
Collaborator:

Do we really want this to run on every single push? Do we have unlimited GH Actions hours?

Contributor Author:

I have not heard of limited hours. There is a concept of throttling. So if we are a highly active bunch, that may lead to problems. Otherwise, we're ok. I also have some checks to limit what is run. If we want to, we can limit jobs/steps based on the files that were changed. For now, I'm keeping it mostly simple, except for the piece that builds the docker image (which only occurs when required).

Collaborator:

Found this, "GitHub Actions usage is free for standard GitHub-hosted runners in public repositories". It's amazing how much compute & storage GH gives away for free 😄
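
The idea raised in this thread of limiting work based on which files changed can be sketched outside the workflow. This is a minimal Python sketch, not code from the PR: it reuses the three patterns from the bash filter step in main.yml (`.devcontainer*`, `*requirements.txt*`, `Gemfile*`), and the function name `needs_devcontainer_rebuild` is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the bash filter step in main.yml. The function name
# below is hypothetical and exists only for illustration.
REBUILD_PATTERNS = (".devcontainer*", "*requirements.txt*", "Gemfile*")


def needs_devcontainer_rebuild(changed_files, event_name="push"):
    """Return True when a push touches a file that feeds the dev container image."""
    # The workflow only flags a rebuild for push events.
    if event_name != "push":
        return False
    # fnmatchcase lets '*' match any characters, including '/', which is
    # the same behaviour as the [[ $file = pattern ]] test in bash.
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in REBUILD_PATTERNS
    )
```

The same shape could be extended with further pattern groups if other jobs should be skipped for unrelated changes; that extension is an assumption and is not part of this PR.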

generate:
build:
runs-on: ubuntu-latest
env:
REPO_OWNER: ${{ github.repository_owner}}
REPO_BRANCH: ${{ github.ref_name }}
SERVICE_ACCOUNT_KEY_JSON: ${{ secrets.SERVICE_ACCOUNT_KEY_JSON }}
GDRIVE_FOLDER: ${{ vars.GDRIVE_FOLDER }}
outputs:
devcontainer: ${{ steps.filter.outputs.devcontainer }}
noncontainer: ${{ steps.filter.outputs.noncontainer }}
steps:
- uses: actions/checkout@v3
- run: pip install -r gdrive_requirements.txt
- run: python test_pull_from_gdrive.py
- name: Archive pulled files
uses: actions/upload-artifact@v3
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@v40
- name: List all changed files
id: filter
run: |
devcontainer=false
noncontainer=true
for file in ${{ steps.changed-files.outputs.all_changed_files }}; do
echo "$file was changed"
if [[ ${{github.event_name}} = push ]]; then
if [[ $file = .devcontainer* ]]; then
devcontainer=true
elif [[ $file = *requirements.txt* ]]; then
devcontainer=true
elif [[ $file = Gemfile* ]]; then
devcontainer=true
fi
fi
done

echo "devcontainer=$devcontainer" >> $GITHUB_OUTPUT
echo "noncontainer=$noncontainer" >> $GITHUB_OUTPUT
- name: Login to GitHub Container Registry
uses: docker/login-action@v3
with:
name: redacted-netfile-files
path: .local/downloads
registry: ghcr.io
username: ${{github.actor}}
password: ${{secrets.GITHUB_TOKEN}}
- name: Build dev container
if: steps.filter.outputs.devcontainer == 'true'
run: |
docker build --no-cache --tag ghcr.io/caciviclab/disclosure-backend-static/${{github.ref_name}}:latest -f ./.devcontainer/Dockerfile .
docker push ghcr.io/caciviclab/disclosure-backend-static/${{github.ref_name}}:latest
- name: Check code changes
Collaborator:

Are tags immutable on GHCR? If not, why not use github.ref_name as the tag instead of a separate "sub-repo" for every commit?
Does GitHub let us fill their Docker registry with as many repos and tags as we care to? There's no limit? I'm surprised they'd be willing to host an image for every single commit, free of charge. Do the Docker images get pruned after some time or something?

Contributor Author:

If I understand your first question, I think you're asking why we even use 'latest' instead of just using the branch name. The reason is mainly convention. Typically, there's a version number or string, but I don't want to generate a new version on every commit. I just want to replace the latest.

"GitHub Packages usage is free for public packages." I don't think there's a limit and they actually don't let you delete it if it's popular. Also, we are creating an image per branch and not per commit. It gets overwritten per branch.

Collaborator:

It's a small thing, but to clarify my first question, in the form
ghcr.io/caciviclab/disclosure-backend-static/{ref_name}:latest
ghcr.io is the registry, latest is the tag, and all of caciviclab/disclosure-backend-static/{ref_name} is the container repository. Appending the ref_name to the repo name creates a new repo for every ref_name, rather than tagging each image with the ref_name. It's a bit of an odd convention, but I don't think it really matters for us because I don't think we have a need to browse container images à la Dockerhub. In fact, I can't even find how to browse public images on GH Packages.
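
The registry / repository / tag distinction described in this comment can be made concrete with a small sketch. It is a simplified Python illustration, not code from the PR: the helper `split_image_ref` is hypothetical, and it assumes an explicit registry host with no port and no digest. It compares the per-branch repository form used in the workflow with the alternative of using the branch name as the tag.

```python
def split_image_ref(ref):
    """Split 'registry/repository:tag' into its three parts.

    Simplified on purpose: assumes an explicit registry host, no port,
    and no digest. A missing tag defaults to 'latest'.
    """
    name, _, tag = ref.partition(":")
    registry, _, repository = name.partition("/")
    return registry, repository, tag or "latest"


branch = "add-workflow-to-gen-data"
base = "ghcr.io/caciviclab/disclosure-backend-static"

# Form used in the workflow: the branch becomes part of the repository name.
per_branch_repo = split_image_ref(f"{base}/{branch}:latest")

# Alternative raised in the review: one repository, branch name as the tag.
per_branch_tag = split_image_ref(f"{base}:{branch}")
```

One practical difference between the two forms: image tags cannot contain `/`, so a branch such as `feat/pull-v2-api` would need to be sanitized before it could serve as a tag, whereas it is accepted as extra path segments in a repository name.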

if: steps.filter.outputs.noncontainer == 'true'
run: |
echo "TODO: run test to verify that code changes are good"
generate:
needs: build
if: needs.build.outputs.noncontainer == 'true'
runs-on: ubuntu-latest
container:
image: ghcr.io/caciviclab/disclosure-backend-static/${{github.ref_name}}:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.github_token }}
env:
REPO_OWNER: ${{ github.repository_owner}}
REPO_BRANCH: ${{ github.ref_name }}
SERVICE_ACCOUNT_KEY_JSON: ${{ secrets.SERVICE_ACCOUNT_KEY_JSON }}
GDRIVE_FOLDER: ${{ vars.GDRIVE_FOLDER }}
PGHOST: postgres
PGDATABASE: disclosure-backend
PGUSER: app_user
PGPASSWORD: app_password
services:
postgres:
#image: postgres:9.6-bullseye
image: postgres:15.6-bullseye
Collaborator:

Remove commented-out line

Contributor Author (ChenglimEar, Jul 3, 2024):

I'm trying to remind myself that the postgres versions don't match up with travis-ci and so we have to migrate travis-ci. I think I ran into some problem trying to set up 9.6.

Collaborator:

Gotcha. Maybe a note about it in the comment then?

env:
POSTGRES_USER: app_user
POSTGRES_DB: disclosure-backend
Collaborator:

Would be nice to have these in top-level env vars in the workflow so they can be shared between the two places they get used.

Contributor Author:

done

POSTGRES_PASSWORD: app_password
steps:
- uses: actions/checkout@v4
- name: Check setup
run: |
git -v
# This keeps git from thinking that the current dir is not a repo even though a .git dir exists
git config --global --add safe.directory "$GITHUB_WORKSPACE"
psql -l
echo "c1,c2" > test.csv
echo "a,b" >> test.csv
cat test.csv
csvsql -v --db postgresql:///disclosure-backend --insert test.csv
echo "List tables"
psql -c "SELECT * FROM pg_catalog.pg_tables WHERE schemaname != 'pg_catalog' AND schemaname != 'information_schema';"

pip show sqlalchemy
- name: Create csv files
run: |
make clean
make download
make import
make process
- name: Summarize results
run: |
echo "List tables"
psql -c "SELECT * FROM pg_catalog.pg_tables WHERE schemaname != 'pg_catalog' AND schemaname != 'information_schema';"

10 changes: 8 additions & 2 deletions .github/workflows/verify-gdrive.yml
Expand Up @@ -4,15 +4,21 @@ on:
jobs:
check:
runs-on: ubuntu-latest
container:
image: ghcr.io/caciviclab/disclosure-backend-static/${{github.ref_name}}:latest
Collaborator

What happens here if no image has been pushed for github.ref_name?

Contributor Author

It should fail. Good question, however. I've updated main.yml to build the image as soon as a new branch is created. Since the above is a manually run workflow, you would have to be quick to run it if you wanted to generate the failure.
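
One way to make that failure mode explicit would be to check for the image before a job depends on it. The sketch below is an assumption, not something in this PR: `image_ref` and `image_exists` are hypothetical helper names, and the check relies on `docker manifest inspect`, which needs a Docker CLI that is logged in to the registry.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the PR): compose the same image reference
# that the workflow's container.image field builds from github.ref_name.
image_ref() {
  local owner=$1 repo=$2 branch=$3
  printf 'ghcr.io/%s/%s/%s:latest\n' "$owner" "$repo" "$branch"
}

# Hypothetical helper: succeeds only if the registry returns a manifest
# for the given image reference.
image_exists() {
  docker manifest inspect "$1" >/dev/null 2>&1
}

ref=$(image_ref caciviclab disclosure-backend-static feat/pull-v2-api)
echo "$ref"
# Usage (requires docker and registry credentials):
#   image_exists "$ref" || echo "no image pushed for this branch yet"
```

With a guard like this, a missing image could be reported with a clear message instead of surfacing as a container pull error at job start.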

credentials:
username: ${{ github.actor }}
password: ${{ secrets.github_token }}

env:
REPO_OWNER: ${{ github.repository_owner}}
REPO_BRANCH: ${{ github.ref_name }}
SERVICE_ACCOUNT_KEY_JSON: ${{ secrets.SERVICE_ACCOUNT_KEY_JSON }}
GDRIVE_FOLDER: ${{ vars.GDRIVE_FOLDER }}
steps:
- uses: actions/checkout@v3
- run: pip install -r gdrive_requirements.txt
- run: python test_pull_from_gdrive.py
- name: Test pull from gdrive
run: python test_pull_from_gdrive.py
- name: Archive pulled files
uses: actions/upload-artifact@v3
with:
26 changes: 21 additions & 5 deletions Makefile
@@ -10,7 +10,11 @@ clean-spreadsheets:
rm -rf downloads/csv/*.csv downloads/csv/office_elections.csv downloads/csv/measure_committees.csv downloads/csv/elections.csv

clean:
rm -rf downloads/raw downloads/csv
rm -rf downloads/raw downloads/csv .local/downloads .local/csv
git --version
python --version
ruby --version
psql --version

process: process.rb
# todo: remove RUBYOPT variable when activerecord fixes deprecation warnings
@@ -21,6 +25,9 @@ process: process.rb
bin/report-candidates
git --no-pager diff build/digests.json

download-netfile-v2:
python download/main.py

download-spreadsheets: downloads/csv/candidates.csv downloads/csv/committees.csv \
downloads/csv/referendums.csv downloads/csv/name_to_number.csv \
downloads/csv/office_elections.csv downloads/csv/elections.csv
@@ -36,7 +43,8 @@ upload-cache:
tar czf - downloads/csv downloads/static downloads/cached-db \
| aws s3 cp - s3://odca-data-cache/$(shell date +%Y-%m-%d).tar.gz --acl public-read

download: download-spreadsheets \
download: download-netfile-v2 \
download-spreadsheets \
download-COAK-2014 download-COAK-2015 download-COAK-2016 \
download-COAK-2017 download-COAK-2018 \
download-COAK-2019 download-COAK-2020 \
@@ -110,9 +118,7 @@ do-import-spreadsheets:
csvsql --db postgresql:///$(DATABASE_NAME) --insert --no-create --no-inference downloads/csv/elections.csv
echo 'ALTER TABLE "elections" ADD COLUMN id SERIAL PRIMARY KEY;' | psql $(DATABASE_NAME)

import-data: 496 497 A-Contributions B1-Loans B2-Loans C-Contributions \
D-Expenditure E-Expenditure F-Expenses F461P5-Expenditure F465P3-Expenditure \
F496P3-Contributions G-Expenditure H-Loans I-Contributions Summary
import-data: import-old-data import-new-data
echo 'CREATE TABLE IF NOT EXISTS "calculations" (id SERIAL PRIMARY KEY, subject_id integer, subject_type varchar(30), name varchar(40), value jsonb);' | psql $(DATABASE_NAME)
./bin/remove_duplicate_transactions
./bin/make_view
@@ -124,9 +130,19 @@ recreatedb:
reindex:
ruby search_index.rb

import-new-data:
echo 'TODO: add new data to import'

import-old-data: 496 497 A-Contributions B1-Loans B2-Loans C-Contributions \
D-Expenditure E-Expenditure F-Expenses F461P5-Expenditure F465P3-Expenditure \
F496P3-Contributions G-Expenditure H-Loans I-Contributions Summary elections_v2 committees_v2 a_contributions_v2

496 497 A-Contributions B1-Loans B2-Loans C-Contributions D-Expenditure E-Expenditure F-Expenses F461P5-Expenditure F465P3-Expenditure F496P3-Contributions G-Expenditure H-Loans I-Contributions Summary:
DATABASE_NAME=$(DATABASE_NAME) ./bin/import-file $(CSV_PATH) $@

elections_v2 committees_v2 a_contributions_v2:
DATABASE_NAME=$(DATABASE_NAME) ./bin/import-file $(CSV_PATH) $@ 0

downloads/csv/candidates.csv:
mkdir -p downloads/csv downloads/raw
$(WGET) -O- \
2 changes: 2 additions & 0 deletions bin/clean
@@ -18,4 +18,6 @@ cat <<-QUERY | psql ${database_name}
DELETE FROM "$table_name"
WHERE "Tran_Date" is NULL;
QUERY
else
echo
fi
16 changes: 10 additions & 6 deletions bin/import-file
@@ -2,10 +2,12 @@
# Contains logic to import files regardless of how many there are.
# If there's no file, don't do anything with the database.
# If the table already exists in the database, don't try to re-create it.
# The fix_pending parameter defaults to 1 if not set. Set it to 0
# to skip the section that fixes the pending Filer_ID
#
# Usage:
# bin/import-file [csv_path] [table]
# bin/import-file downloads/csv A-Contributions
# bin/import-file [csv_path] [table] [fix_pending]
# bin/import-file downloads/csv A-Contributions 1
set -euo pipefail

if [ -z "${DATABASE_NAME:-""}" ]; then
@@ -14,12 +16,13 @@ if [ -z "${DATABASE_NAME:-""}" ]; then
fi

if [ $# -eq 0 ]; then
echo "Usage: bin/import-file [csv_path] [table]"
echo "Usage: bin/import-file [csv_path] [table] [fix_pending]"
exit 1
fi

csv_path=$1
table_name=$2
fix_pending=${3:-1}
filename_glob=$csv_path'/*'${table_name}'.csv'
table_exists=
if psql disclosure-backend -c '\d "'${table_name}'"' >/dev/null 2>&1; then
@@ -34,9 +37,10 @@ if ls $filename_glob 2>/dev/null >/dev/null; then
csvsql --db postgresql:///$DATABASE_NAME --tables $table_name --insert --no-inference --no-create
echo -n ' Removing empty Tran_Date... '
./bin/clean "$DATABASE_NAME" "$table_name"
echo
echo -n ' Fixing pending Filer_IDs... '
./bin/fix-pending "$DATABASE_NAME" "$table_name"
if [ "$fix_pending" = "1" ]; then
echo -n ' Fixing pending Filer_IDs... '
./bin/fix-pending "$DATABASE_NAME" "$table_name"
fi
else
echo 'Found no files to import'
fi
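
The optional third argument described in the header comment can be sketched in isolation. This is a minimal sketch, assuming a hypothetical helper `should_fix_pending` that is not part of the PR; it only illustrates how `${3:-1}` falls back to 1 under `set -u`, and how the `_v2` Makefile targets opt out by passing 0.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the PR): mirrors how bin/import-file reads
# its optional third positional argument.
should_fix_pending() {
  # ${3:-1} expands to "1" when the third argument is unset or empty,
  # so the expansion is safe even with `set -u` enabled.
  local fix_pending=${3:-1}
  if [ "$fix_pending" = "1" ]; then
    echo "fix"
  else
    echo "skip"
  fi
}

should_fix_pending downloads/csv A-Contributions    # prints "fix"
should_fix_pending downloads/csv elections_v2 0     # prints "skip"
```

Because the comparison is a string match against "1", any other value in the third position skips the fix-pending step, not only 0.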
12 changes: 6 additions & 6 deletions build/_data/committees/1421001.json
@@ -1416,7 +1416,7 @@
"Tran_Date": "2020-07-15",
"Tran_NamF": "Jonathan",
"Tran_NamL": "Williams",
"Tran_Zip4": "94603",
"Tran_Zip4": "94602",
"election_name": "oakland-2020"
},
{
@@ -1429,7 +1429,7 @@
"Tran_Date": "2020-07-15",
"Tran_NamF": "Jonathan",
"Tran_NamL": "Williams",
"Tran_Zip4": "94602",
"Tran_Zip4": "94603",
"election_name": "oakland-2020"
},
{
@@ -1774,26 +1774,26 @@
"title": "Oakland November 3rd, 2020 General Election",
"Filer_ID": "1421001",
"Tran_Emp": "Jordan Custom Builders",
"Tran_Occ": "Operations Manager",
"Tran_Occ": "Owner",
"Entity_Cd": "Individual",
"Tran_Amt1": 200.0,
"Tran_Date": "2020-07-21",
"Tran_NamF": "Billy",
"Tran_NamL": "Jordan",
"Tran_Zip4": "75154",
"Tran_Zip4": "75204",
"election_name": "oakland-2020"
},
{
"title": "Oakland November 3rd, 2020 General Election",
"Filer_ID": "1421001",
"Tran_Emp": "Jordan Custom Builders",
"Tran_Occ": "Owner",
"Tran_Occ": "Operations Manager",
"Entity_Cd": "Individual",
"Tran_Amt1": 200.0,
"Tran_Date": "2020-07-21",
"Tran_NamF": "Billy",
"Tran_NamL": "Jordan",
"Tran_Zip4": "75204",
"Tran_Zip4": "75154",
"election_name": "oakland-2020"
},
{
2 changes: 1 addition & 1 deletion build/_data/elections/oakland/2018-06-05.json
@@ -57,7 +57,7 @@
"total_contributions": 15000.0
},
{
"name": "Service Employees International Union Local 1021 Issues PAC",
"name": "Oakland Athletics Baseball Company",
"type": "Measure",
"election_name": "oakland-june-2018",
"total_contributions": 10000.0
6 changes: 3 additions & 3 deletions build/_data/elections/oakland/2023-11-07.json
@@ -73,19 +73,19 @@
],
"top_contributors_for_offices": [
{
"name": "UA Local 342",
"name": "Service Employees International Union Local 1021 Candidate PAC",
"type": "Office",
"election_name": "oakland-2023",
"total_contributions": 1200.0
},
{
"name": "Service Employees International Union Local 1021 Candidate PAC",
"name": "Oakland Education Association PAC",
"type": "Office",
"election_name": "oakland-2023",
"total_contributions": 1200.0
},
{
"name": "Peralta Federation of Teachers COPE",
"name": "Families in Action for Justice",
"type": "Office",
"election_name": "oakland-2023",
"total_contributions": 1200.0