
[DataChallenge] Make historic weather data available #6

Open · 1 of 7 tasks
mmaelicke opened this issue Dec 29, 2020 · 0 comments
Labels: Data Challenge (This issue is Data Challenge eligible)

Comments


mmaelicke commented Dec 29, 2020

This is part of a DataChallenge

During the lecture, you downloaded reference weather data from various sources. So did all the other students. On the one hand, it is great to learn the whole procedure of downloading, reading, transforming, understanding and finally using third-party data. On the other hand, you don't have access to the reference data the other students used in previous years.
Well, we could simply add the data each year. But we could also let the repository do this for us. In both cases, remember that this repository is public: before you upload third-party data, you'll have to check the licenses and the conditions under which you are allowed to redistribute the data (because that is what we are technically doing).

The idea of this issue is to collect and add reference data for all years that are present in the /hobo/<year> subfolders. Then we need a script that is capable of downloading reference data each year. Also keep in mind that we want a full record (not just December and January every year).
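A minimal sketch of that collection step, assuming only the /hobo/<year> folder layout named above and that the script runs from the repository root; the download function is a placeholder for whatever the harvesting library will eventually provide:

```python
from pathlib import Path

def download_reference(start: str, end: str) -> None:
    """Placeholder: fetch reference weather data for the given date range.
    The real implementation depends on the chosen provider (and its license)."""
    print(f"would fetch reference data from {start} to {end}")

# One full-year record per /hobo/<year> subfolder, not just December/January.
hobo_years = sorted(int(p.name) for p in Path("hobo").iterdir()
                    if p.is_dir() and p.name.isdigit())

for year in hobo_years:
    download_reference(start=f"{year}-01-01", end=f"{year}-12-31")
```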

Depending on the data provider, we can add the reference data on a weekly, monthly, or annual basis. You will find yourself in this situation quite often. One approach is to automate the script with a cloud function or a cloud virtual machine. It is also possible to use a GitHub Action (which is technically a VM).
So we want a complete data set, every year. The less I have to remember next year to start a script, the better.
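One way to get there, regardless of the cadence, is to make the harvest script idempotent: every run checks what is already on disk and fetches only the missing periods, so a scheduler (cron, a cloud function, or a GitHub Action on a schedule trigger) can simply invoke it repeatedly. A sketch under made-up assumptions (monthly CSV files in a hypothetical data/reference folder):

```python
from datetime import date
from pathlib import Path

DATA_DIR = Path("data/reference")  # hypothetical target folder

def missing_months(since: date) -> list[tuple[int, int]]:
    """All (year, month) pairs from `since` up to (but excluding) the
    current month that have no file on disk yet. Once everything has
    been harvested, a scheduled re-run becomes a cheap no-op."""
    today = date.today()
    out = []
    y, m = since.year, since.month
    while (y, m) < (today.year, today.month):
        if not (DATA_DIR / f"{y}-{m:02d}.csv").exists():
            out.append((y, m))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out

if __name__ == "__main__":
    for year, month in missing_months(date(2016, 1, 1)):
        print(f"fetching {year}-{month:02d}")  # call the provider download here
```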

It is possible to solve this challenge with R, but that's probably not a good idea. Python or Go would be my choice here to build a small data harvesting workflow.

Possible steps include:

  • create a new branch or fork the whole repo
  • add a new /scripts folder
  • design a library (a library, not a script; we want to reuse the functions all over the place) that can download specific data, e.g. from specific data providers, for given date ranges, or simply the last seven days (a rough interface sketch follows after this list)
  • write one or a number of scripts to harvest data
  • discuss strategies to automate the scripts (local, cloud VM, GitHub Action) - either in a group or with @mmaelicke
  • implement one strategy if applicable #9
  • make a pull request from your branch or fork to the master branch and assign @mmaelicke as reviewer
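For the library item above, a rough interface sketch; all names here are made up, and the point is only that scripts, notebooks, and a scheduled job can share the same functions:

```python
from datetime import date, timedelta
from typing import Protocol

class Provider(Protocol):
    """Anything that can fetch raw reference data for a date range,
    e.g. one backend per data provider used in the lecture."""
    def fetch(self, start: date, end: date) -> bytes: ...

def download(provider: Provider, start: date, end: date) -> bytes:
    """The one entry point every harvesting script reuses."""
    if start > end:
        raise ValueError("start must not be after end")
    return provider.fetch(start, end)

def download_last_seven_days(provider: Provider) -> bytes:
    """Convenience wrapper, e.g. for a weekly scheduled run."""
    today = date.today()
    return download(provider, today - timedelta(days=7), today)
```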

The code review will be used as the assignment.

mmaelicke added the Data Challenge label on Dec 29, 2020