Commit b76f359

author
kracha
committed
Final changes to training
1 parent f1f6d5b commit b76f359

9 files changed

+30
-15
lines changed

training/01_introduction.Rmd

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,9 @@
22

33

44

5-
These materials are meant to introduce you to the principles of open science, effective data management, and data archival with the DataONE data repository. It also provides an overview on the tools we will be using (remote servers, Rstudio, R, Troubleshooting, Exercises) throughout the training. This document is meant to take multiple days to complete depending on your previous knowledge on some of the topics.
5+
These materials are meant to introduce you to the principles of open science, effective data management, and data archival with the DataONE data repository. They also provide an overview of the tools we will be using (remote servers, RStudio, R, Troubleshooting, Exercises) throughout the training. This document is meant to take multiple days to complete, depending on your previous knowledge.
6+
7+
We believe in allowing employees the space to fully grasp concepts during training, even if it means taking longer than expected. Quality learning is our priority, and there's no pressure to finish within a specific timeframe. You may find it helpful to take notes on important concepts, and you will always be able to refer back to this training during your time at NCEAS.
68

79
If you see anything that needs fixing, submit an issue in the
810
<a href = 'https://github.com/NCEAS/datateam-training/issues' target='_blank'> github issues </a>
@@ -51,11 +53,11 @@ On the servers, paths to files in your folder always start with `/home/yourusern
5153

5254
**Note** - if you are a more advanced user, you may use the method you prefer as long as it is evident where your file is from.
5355

54-
When you write scripts, try to avoid writing relative paths (which rely on what you have set your working directory to) as much as possible. Instead, write out the entire path as shown above, so that if another data team member needs to run your script, it is not dependent on a working directory.
56+
When you write scripts, try to avoid writing relative paths (which rely on what you have set your working directory `~/` to) as much as possible. Instead, write out the entire path as shown above, so that if another data team member needs to run your script, it is not dependent on a working directory.
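As a rough illustration of the difference between the two kinds of path, here is a minimal sketch in base R. The username `jdoe`, the folder `project`, and the file `data.csv` are hypothetical, and `is_absolute()` is a small helper written for this example, not a function from any data team package.

```r
# Hypothetical username, folder, and file name, used only for illustration
username <- "jdoe"

# Full path: resolves to the same file no matter what getwd() returns
abs_path <- file.path("/home", username, "project", "data.csv")

# Relative path: only resolves correctly from one particular working directory
rel_path <- file.path("project", "data.csv")

# Illustrative helper: on the servers, full paths start with "/"
is_absolute <- function(path) startsWith(path, "/")

is_absolute(abs_path)
is_absolute(rel_path)
```

Writing the full path in a script means a teammate can run it without first calling `setwd()`.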
5557

5658
## A note on R
5759

58-
This training assumes basic knowledge of R and RStudio. Spend at least 30 minutes walking through Jenny Bryan's excellent materials [here](http://stat545.com/block002_hello-r-workspace-wd-project.html) for a refresher.
60+
This training assumes basic knowledge of R and RStudio. Spend at least 15 minutes walking through Jenny Bryan's excellent materials [here](http://stat545.com/block002_hello-r-workspace-wd-project.html) for a refresher.
5961

6062
Throughout this training we will occasionally use the namespace syntax `package_name::function_name()` when writing a function. This syntax denotes which package a function came from. For example `dataone::getSystemMetadata` selects the `getSystemMetadata` function from the `dataone` R package. More detailed information on namespaces can be found [here](http://r-pkgs.had.co.nz/namespace.html).
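The same syntax works for any installed package. The sketch below uses `stats` and `utils`, which ship with R, so that it runs without installing `dataone`; the input values are made up for the example.

```r
# Made-up values for illustration
values <- c(4, 1, 7)

# `::` calls a function from a named package without needing library()
med <- stats::median(values)
first_three <- utils::head(letters, 3)

med
first_three
```

Being explicit about the package also avoids ambiguity when two loaded packages export functions with the same name.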
6163

training/04_editing_eml.Rmd

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,8 @@ This chapter is a practical tutorial for using R to read, edit, write, and valid
55
Most of the functions you will see in this chapter will use the `arcticdatautils` and `EML` packages.
66

77
```{block, type = "note"}
8-
This chapter will be the longest of all the sections! This is a reminder to take frequent breaks when completing this section.
8+
This chapter will be the longest of all the sections! This is a reminder to take frequent breaks when completing this section.
9+
If you struggle with getting a piece of code to work for more than 10 minutes, reach out to your supervisor for help.
910
```
1011
```{block, type = "note"}
1112
When using R to edit EML documents, run each line individually by highlighting the line and pressing CTRL+ENTER. Many EML functions only need to be run once, and will either produce errors or make the EML invalid if run multiple times.

training/09_first_ticket.Rmd

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,6 @@
11
# First Ticket
22

3-
After completing the previous chapters, Daphne or Jeanette will assign a ticket from RT. Log in using your LDAP credentials to get familiar with RT.
3+
After completing the previous chapters, your supervisor will assign a ticket from RT. Log in using your LDAP credentials to get familiar with RT.
44

55
```{r, child = '../workflows/pi_correspondence/navigate_rt.Rmd'}
66
```
@@ -16,6 +16,8 @@ We have developed some partially filled R scripts to get you started on working
1616

1717
You can use this template where you can [fill in the blanks](data/dataset_processing_example_blanks.R) to get familiar with the functions we use and workflow at first. We also have a more minimal example [A filled example](data/dataset_processing_example_skeleton.R) as an intermediate step. You can look at the [filled example](data/dataset_processing_example_filled.R) if you get stuck or message the #datateam.
1818

19+
In addition, you may find this [cheat sheet](https://docs.google.com/document/d/1DPhCmnxhoSWv5FEHvlIiNRBjcuFbKqwZviaZDf8UfVU/edit?usp=sharing) of data team R functions helpful.
20+
1921
Once you have updated the dataset to your satisfaction and reviewed the Final Checklist, post the link to the dataset on #datateam for peer review.
2022

2123
```{r, child = '../workflows/pi_correspondence/final_review_checklist.Rmd'}

training/index.Rmd

Lines changed: 7 additions & 3 deletions
@@ -39,6 +39,7 @@ favicon: "favicon.ico"
3939
* If you are an intern, fill out anticipated quarterly schedule on the intern google calendar shared with you.
4040
* <a href="https://timekeeping.ucsb.edu/" target="_blank">Electronic Timekeeping</a> - make sure you can log on to electronic timekeeping
4141
via your UCSBNetID and password (may not be accessible on the first day, if you continue to have issues please let Ana know). If you are an hourly employee, log your hours for your first day! Under today's date select 'Hours Worked' under the Pay Code column, enter the amount of hours under the Amount column, and finally click the 'Save' button in the top right. At the end of every two-week pay period you will also need to click the 'Approve Timecard' button in order to submit your timecard.
42+
* <a href="https://www.ucpath.ucsb.edu/" target="_blank">UCPath</a> - you can set up your paycheck preferences here, including direct deposit and income tax withholding.
4243
<a href="https://timekeeping.ucsb.edu/sites/default/files/employee_hours_worked_0.pdf" target="_blank">Detailed Instructions</a>
4344
* Let Jeanette or Daphne know what email you would like to use for general NCEAS updates from [email protected]
4445

@@ -47,8 +48,9 @@ via your UCSBNetID and password (may not be accessible on the first day, if you
4748
NCEAS hosts a number of events that you are encouraged to attend. Keep an eye on your email but the recurring events are:
4849

4950
* Roundtable
50-
+ weekly presentation and discussion of research by a visiting or local scientist
51-
+ Wednesdays at 12:15 in the lounge
51+
+ presentation and discussion of research by a visiting or local scientist
52+
+ first or second Thursdays of the month at 3:30 in the lounge and via zoom
53+
+ followed by happy hour at 4:30 on the terrace
5254
* Coffee Klatch
5355
+ coffee, socializing, and news updates for NCEAS
5456
+ Tuesdays at 10:30 in the lounge
@@ -64,7 +66,9 @@ Check out their individual calendar entries and channels for more information
6466
* NCEAS Book Club - #bookclub
6567

6668
## Internship Expectations {-}
67-
As an intern with the data team, there are a few expectations that the Project Coordinators have of you. Overall, we expect you to be communicative and proactive. We want you to learn and grow in this position, but we don't want you spinning your wheels going nowhere fast! If you've spent 10-15 minutes on an issue and you're not making any progress, reach out to us and your peers for help in the #datateam slack channel. The #datateam slack channel is the main form of communication, and we expect all interns to become comfortable communicating in this space.
69+
As an intern with the data team, there are a few expectations that the Project Coordinators have of you. Overall, we expect you to be communicative and proactive. We want you to learn and grow in this position, but we don't want you spinning your wheels going nowhere fast! If you've spent 10-15 minutes on an issue and you're not making any progress, reach out to us and your peers for help in the #datateam slack channel.
70+
71+
The #datateam slack channel is our main form of communication, and we expect all interns to become comfortable communicating in this space. By posting your questions and code in the #datateam channel (instead of sending direct messages), multiple people will be able to help at once, and we all can learn from the problems that our peers encounter.
6872

6973
Additionally, we expect interns to work within the standard business hours of 8am - 5pm (pacific time). We ask that you mark your expected work hours on the shared "Intern" Google Calendar. This is so that the Project Coordinators are aware of who's working day-to-day and can plan their days accordingly. We also use this to verify time sheets when they are submitted. Ideally, interns would input their proposed hours on the calendar at least one week in advance. During exams and other unusually busy weeks at school, we understand you may need to shift your hours or reduce your workload. When this occurs, please make sure to email either Daphne or Jeanette so that we know not to expect you during your usual schedule.
7074

workflows/data_packages_arcticdatautils/reorder_entities.Rmd

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ This is easier to accomplish using `arcticdatautils`
1010
doc$dataset$otherEntity <- doc$dataset$otherEntity[order(entity_names)]
1111
```
1212

13-
2. Data files
13+
2. Data files and PIDs retrieved with `get_package`
1414

1515
```{r eval = F}
1616
pkg <- get_package(adc, rm, file_names = T)

workflows/data_portals/mosaic.Rmd

Lines changed: 3 additions & 1 deletion
@@ -8,6 +8,8 @@ Look out for datasets that are part of the MOSAiC expedition from 2019 -2020. Th
88

99
> We would like to ask for the event label associated with this dataset (see https://www.pangaea.de/expeditions/events/PS122%2F4).
1010
11+
Often times researchers do not have a full list of event labels, but instrument names are provided in the dataset metadata. In these cases, try to find the correct event labels by searching through the event lists and finding matching instrument names and dates. Each MOSAiC [campaign](https://www.pangaea.de/expeditions/byproject/MOSAiC) has its own event list. [Here](https://www.pangaea.de/expeditions/events/PS122%2F1) is an example of an event list for the first cruise by the Polarstern research vessel.
12+
1113
2. Find the appropriate dataset and attribute level annotations
1214

1315
- There are functions in `arcticdatautils` to help with annotating: `mosaic_annotate_dataset` and `mosaic_annotate_attribute`
@@ -27,7 +29,7 @@ The following shows how to add the annotations using `arcticdatautils` and manua
2729

2830
### Dataset Level Annotations
2931

30-
There are 5 main campaigns in the MOSAiC expedition. The main campaigns follow the pattern `PS122/#`. For the full campaign list it is easiest to see on the [PANGAEA website](https://www.pangaea.de/expeditions/byproject/MOSAiC)
32+
There are 5 main campaigns in the MOSAiC expedition. The main campaigns follow the pattern `PS122/#`. The full campaign list is easiest to view on the [PANGAEA website](https://www.pangaea.de/expeditions/byproject/MOSAiC). The first two letters of each campaign name correspond with the ship or station name (ex: PS = Polarstern).
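The naming pattern described above can be checked with a short base R sketch. Two assumptions are made here: that `#` in `PS122/#` stands for one or more digits, and that the label `XX999/1` is a made-up non-matching value included only for contrast.

```r
# "PS122/1" and "PS122/4" follow the pattern in the text; "XX999/1" is made up
labels <- c("PS122/1", "PS122/4", "XX999/1")

# Assumption: `#` means one or more digits after the slash
is_main_campaign <- grepl("^PS122/[0-9]+$", labels)

# The first two letters give the ship or station prefix (e.g. PS = Polarstern)
prefix <- substr(labels, 1, 2)

is_main_campaign
prefix
```

A check like this is only a convenience for spotting labels that do not follow the main campaign pattern; the PANGAEA campaign list remains the authoritative source.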
3133

3234
**arcticdatautils**
3335
```{r, eval = F}

workflows/edit_data_packages/01_datapack_background.Rmd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
## datapack Background
2-
*adapted from the dataone and datapack vingettes*
2+
*adapted from the dataone and datapack vignettes*
33

44
`datapack` is written differently than most R packages you may have encountered in the past. This is because it uses the [S4](https://adv-r.hadley.nz/s4.html) system instead.
55

workflows/edit_data_packages/set_rights_and_access.Rmd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,7 @@ set_rights_and_access(mn,
2929
permissions = c('read','write','changePermission'))
3030
```
3131

32-
If you ever need to remove/add public access to your package or object, you can use `remove_public_read()` or `set_public_read()`, respectively.
32+
If you ever need to remove/add public access to your package or object, you can use `remove_public_read()` or `set_public_read()`, respectively. Making files publicly readable is especially useful when downloading large amounts of files to the server in order to use metadata helper functions that require a file path (ex: `eml_get_raster_metadata()` and `get_ncdf4_attributes()`).
3333

3434
```{r, eval = FALSE}
3535
remove_public_read(mn, c(pkg$metadata, pkg$data, pkg$resource_map))

workflows/edit_eml/edit_attributelists.Rmd

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -74,9 +74,9 @@ attributes <- data.frame(
7474
missingValueCodeExplanation = c(NA, NA, NA,NA, NA, NA, NA, 'no sampling comments'))
7575
```
7676

77-
However, typing this out in R can be a major pain. Luckily, there's an app that you can use to build attribute information. You can use the app to build attributes from a data file loaded into R (recommended as the app will auto-fill some fields for you) to edit an existing attribute table, or to create attributes from scratch.
77+
However, typing this out in R can be a major pain. Luckily, there's a Shiny app that you can use to build attribute information. You can use the app to build attributes from a data file loaded into R (recommended as the app will auto-fill some fields for you), to edit an existing attribute table, or to create attributes from scratch.
7878

79-
Use the following commands to create or modify attributes. These commands will launch a "Shiny" app in your web browser. You must select "Quit App" in order to save your changes, and R will not run code while the app is open.
79+
Use the following commands to create or modify attributes. These commands will launch a "Shiny" app in your web browser.
8080

8181
```{r, eval = FALSE}
8282
#first download the CSV in your data package from Exercise #2
@@ -96,7 +96,11 @@ attribute_tables <- get_attributes(doc$dataset$dataTable[[i]]$attributeList)
9696
attribute_tables <- EML::shiny_attributes(attributes = attribute_tables$attributes)
9797
```
9898

99-
Once you are done editing a table in the app, quit the app and the tables will be assigned to the `attribute_tables` variable as a list of data frames (one for attributes, factors, and units). Be careful to not overwrite your completed `attribute_tables` object when trying to make edits. The last line of code can be used in order to make edits to an existing `attribute_tables` object.
99+
Once you are done editing a table in the browser app, quit the app by pressing the red "Quit App" button in the top right corner of the page.
100+
101+
If you close the Shiny app tab in your browser instead of using the "Quit App" button, your work will not be saved, R will think that the Shiny app is still open, and you will not be able to run other code. You can tell if R is confused if you have closed the Shiny app and the bottom line in the console still says `Listening on http://...`. If this happens, press the red stop sign button on the right hand side of the console window in order to interrupt R.
102+
103+
The tables you constructed in the app will be assigned to the `attribute_tables` variable as a list of data frames (one for attributes, factors, and units). Be careful to not overwrite your completed `attribute_tables` object when trying to make edits. The last line of code can be used in order to make edits to an existing `attribute_tables` object.
100104

101105
Alternatively, each table can be exported to a CSV file by clicking the `Download` button. If you downloaded the table, read the table back into your R session and assign it to a variable in your script (e.g. `attributes <- data.frame(...)`), or just use the variable that `shiny_attributes` returned.
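Reading a downloaded table back in might look like the sketch below. Because the real file comes from the app, a small stand-in table is written to a temporary file first; the two rows and the temporary path are illustrative assumptions, not output from `shiny_attributes`.

```r
# Stand-in for a table downloaded from the app (rows are illustrative)
example_table <- data.frame(
  attributeName = c("site", "temp"),
  attributeDefinition = c("Sampling site", "Water temperature"),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(example_table, csv_path, row.names = FALSE)

# Read the table back into the session and assign it to a variable
attributes <- read.csv(csv_path, stringsAsFactors = FALSE)
attributes
```

In practice `csv_path` would be the full path to the file you downloaded, and `stringsAsFactors = FALSE` keeps text columns as character vectors.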
102106
