Provide tooling to aggregate files in snapshots directory #62
@jgehrcke thanks for creating this project and keeping it going. I was hoping to get an aggregated view for paths and referrers, like the one that exists for views and clones. You could leave the individual files in the snapshots directory, and aggregate the data and store it separately. Is that something you have on your radar? Is there anything I can do to help?
Over time, the number of individual files in the ../ghrs-data/snapshots/ directory grows to be O(1000) per year. This is not a problem for git. However, it creates inconveniences. For example, the snapshots directory cannot be browsed meaningfully anymore via GitHub's web UI: in the truncated listing, only the oldest files are shown, and the newer files are cut off.
Another inconvenience: writing (upon checkout) and reading (upon parsing) 1000 files may make a noticeable timing difference compared to handling a single file.
I think in the long run the Action should automatically aggregate data into fewer individual files (with each file having more content, obviously), so that there are maybe O(10) files per year overall.
One question is whether the files should remain nicely readable CSV files, or whether a different serialization format makes sense.
An intermediate pragmatic step for me is to build tooling that allows doing this aggregation out-of-band, i.e. not as part of an Action run. The resulting changes can then be committed manually to the data branch.
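The core of such out-of-band tooling might look like the following sketch. It assumes (hypothetically — the real snapshot schema is not shown in this issue) that each snapshot is a CSV whose first column is a timestamp and whose header is identical across snapshots; overlapping snapshots are de-duplicated by timestamp, with later snapshots winning:

```python
import csv
import io


def merge_csv_snapshots(csv_texts):
    """Merge multiple CSV snapshot texts into one de-duplicated CSV text.

    Assumes every input shares the same header and that the first
    column is a unique timestamp. When the same timestamp appears in
    more than one snapshot, the row from the later snapshot wins.
    """
    rows = {}
    header = None
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        header = next(reader)  # assumed identical across snapshots
        for row in reader:
            rows[row[0]] = row  # key on timestamp; last write wins
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    for timestamp in sorted(rows):
        writer.writerow(rows[timestamp])
    return out.getvalue()
```

Run over all files in the snapshots directory (e.g. grouped by year), this would reduce O(1000) files to a handful of aggregate files, which could then be committed manually to the data branch.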