During the lecture, you downloaded reference weather data from various sources. So did all the other students. On the one hand, it is great to learn the whole procedure of downloading, reading, transforming, understanding, and finally (sometimes) using third-party data. On the other hand, you don't have the reference data that the other students used in previous years.
Well, we could simply add the data each year. But we could also let the repository do this for us. In both cases, keep in mind that this repository is public: before you upload third-party data, you'll have to check the licenses and conditions under which you are allowed to redistribute the data (because that is what we are technically doing).
The idea of this issue is to collect and add reference data for all years that are present in the `/hobo/<year>` subfolder. Then we need a script that is capable of downloading reference data each year. Also keep in mind that we want a full record (not just December and January every time).
Depending on the data provider, we can add the reference data on a weekly, monthly, or annual basis. You will find yourself in this situation quite often. One approach is to automate the script with a cloud function or a cloud virtual machine. It is also possible to use a GitHub Action (which technically runs on a VM as well).
So we want a complete data set. Every year. The fewer things I have to remember next year (starting a script, etc.), the better.
It is possible to solve this challenge with R, but that's probably not a good idea. Python or Go would be my choice here to build a small data harvesting workflow.
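To make the "full record" idea concrete, here is a minimal Python sketch that downloads a date range in monthly chunks. The provider endpoint, its `from`/`to` query parameters, and the output folder layout are hypothetical assumptions, not a real API:

```python
import datetime as dt
from pathlib import Path

import requests  # third-party dependency: pip install requests

# Hypothetical endpoint -- replace with the real provider's API.
BASE_URL = "https://example-weather-provider.org/api/observations"


def fetch_month(year: int, month: int, out_dir: Path) -> Path:
    """Download one month of reference data, so the record has no gaps."""
    start = dt.date(year, month, 1)
    # The first day of the following month is the exclusive end date.
    end = dt.date(year + month // 12, month % 12 + 1, 1)
    resp = requests.get(
        BASE_URL,
        params={"from": start.isoformat(), "to": end.isoformat()},
        timeout=60,
    )
    resp.raise_for_status()
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"reference_{year}-{month:02d}.csv"
    target.write_bytes(resp.content)
    return target


if __name__ == "__main__":
    # Fill a whole year, month by month -- not just December and January.
    for month in range(1, 13):
        fetch_month(2020, month, Path("data/reference/2020"))
```

Looping over all months of a year is what guarantees a gap-free record, no matter when the script actually runs.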
Possible steps include:
- create a new branch or fork the whole repo
- add a new `/scripts` folder
- design a library (a library, not a script; we want to reuse the functions all over the place) that can download specific data, e.g. from specific data providers, for given date ranges, or simply the last seven days (a possible interface is sketched after this list)
- write one or several scripts to harvest data
- discuss strategies to automate the scripts (local, cloud VM, GitHub Action; see the workflow sketch below) - either in a group or with @mmaelicke
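For the library item above, one possible shape is sketched here; all class and function names are hypothetical and only mark out an interface, not a finished implementation:

```python
import datetime as dt
from abc import ABC, abstractmethod


class Provider(ABC):
    """One subclass per data provider, so new sources can plug in later."""

    @abstractmethod
    def fetch(self, start: dt.date, end: dt.date) -> bytes:
        """Return the raw observations between start and end (inclusive)."""


def last_seven_days(provider: Provider) -> bytes:
    """Convenience wrapper for the common 'just update the tail' case."""
    today = dt.date.today()
    return provider.fetch(today - dt.timedelta(days=7), today)
```

Keeping the provider-specific details behind a common `fetch` signature is what makes the functions reusable all over the place: harvest scripts, notebooks, and the automation below can all import the same module.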
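For the automation discussion, a scheduled GitHub Actions workflow is probably the lowest-maintenance option. A sketch, assuming the harvest script ends up at `scripts/harvest.py` and writes into a `data/` folder (both assumptions):

```yaml
# .github/workflows/harvest.yml (hypothetical path and entry point)
name: harvest-reference-data

on:
  schedule:
    - cron: "0 3 * * 1"  # every Monday, 03:00 UTC
  workflow_dispatch: {}   # allow manual runs as well

permissions:
  contents: write         # required so the job can push commits

jobs:
  harvest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install requests
      - run: python scripts/harvest.py  # assumed entry point
      - name: commit any new data
        run: |
          git config user.name "github-actions"
          git config user.email "actions@github.com"
          git add data/
          git diff --cached --quiet || git commit -m "Add harvested reference data"
          git push
```

A cloud function or cloud VM would work too, but this keeps code, data, and scheduling in the same repository.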
This is part of a DataChallenge
The code review will be used as the assignment.