Commit 34ad8a3: Merge pull request #77 from NIEHS/dariusmb_0219 (Dariusmb 0325)
2 parents: ab0620e + 0b4ff24

File tree: 2 files changed (+60, -7 lines)


chapters/05-01-hcup-amadeus-usecase.Rmd

Lines changed: 60 additions & 7 deletions
@@ -4,7 +4,7 @@

### Integrating HCUP databases with Amadeus Exposure data {.unnumbered}

-**Date Modified**: February 19, 2025
+**Date Modified**: March 22, 2025

**Author**: Darius M. Bost
@@ -18,7 +18,7 @@ knitr::opts_chunk$set(warning = FALSE, message = FALSE)

## Motivation

-Understanding the relationship between external environmental factors and health outcomes is critical for guiding public health strategies and policy decisions. Integrating Healthcare Cost and Utilization Project (HCUP) data with environmental datasets allows researchers to examine how elements such as air quality, wildfire emissions, and extreme temperatures impact hospital visits and healthcare utilization patterns.
+Understanding the relationship between external environmental factors and health outcomes is critical for guiding public health strategies and policy decisions. Integrating individual patient records from the Healthcare Cost and Utilization Project (HCUP) with environmental datasets allows researchers to examine how elements such as air quality, wildfire emissions, and extreme temperatures impact hospital visits and healthcare utilization patterns.

Ultimately, linking HCUP and environmental exposure data enhances public health monitoring and helps researchers better quantify environmental health risks.
@@ -38,8 +38,8 @@ This tutorial includes the following steps:

```{r eval = FALSE}
# install required packages
-install.packages(c("readr", "data.table", "sf", "tidyverse", "tigris",
-                   "dplyr", "amadeus"))
+install.packages(c("readr", "data.table", "sf", "tidyverse", "tigris", "dplyr",
+                   "amadeus"))

# load required packages
library(readr)
@@ -53,6 +53,8 @@ library(dplyr)

## Data Curation and Prep {#link-to-hcupAmadeus-1}

+HCUP data files from AHRQ can be obtained from the [HCUP Central Distributor](https://cdors.ahrq.gov/) site, which details the data use agreement and how to purchase, protect, re-use, and share HCUP data.
+
Upon acquisition of HCUP database files, you will notice that the state files are distributed as large ASCII text files. These files contain the raw data and can be very large, as they store all of the individual records for hospital stays or procedures. AHRQ provides SAS software tools to assist with loading the data into [SAS](https://hcup-us.ahrq.gov/tech_assist/software/508course.jsp#structure) for analysis; however, this doesn't help when using other coding languages like R. To solve this, we utilize the .loc files (also provided on the HCUP website), the year of the data, and the type of data file being loaded.

We will start with state-level data: the State Inpatient Database (SID), State Emergency Department Database (SEDD), and State Ambulatory Surgery and Services Database (SASD).
@@ -99,7 +101,9 @@ for (data_source in data_sources) {
  )
}
```
-The `fwf_positions()` function is utilizing column start and end positions found on the ahrq website (`meta_url` listed in next code chunk). We use these positions to read in the raw data files from their .asc format.
+
+The `fwf_positions()` function uses column start and end positions found on the AHRQ website (`meta_url`, listed in the next code chunk). We use these positions to read in the raw data files from their .asc format.
+
::: figure
<img src="images/hcup_amadeus_usecase/oregon2021_SEDD_core_loc_file.png" style="width:100%"/>
:::

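To illustrate what these fixed-width positions do, here is a minimal base-R sketch on a toy record; the field names and column ranges below are hypothetical stand-ins for the actual start/end positions published in the AHRQ .loc file:

```r
# Toy 18-character fixed-width record; the real layout comes from the .loc file
# Hypothetical fields: AGE cols 1-4, ZIP cols 5-15, FEMALE cols 16-18
rec <- "  45      97201  1"

# Extract each field by its start/end position, which is exactly what
# fwf_positions() + read_fwf() do for every line of the .asc file
age    <- as.integer(substr(rec, 1, 4))
zip    <- trimws(substr(rec, 5, 15))
female <- as.integer(substr(rec, 16, 18))
```

`read_fwf()` applies this same slicing to the whole file at once, returning one column per start/end pair.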
@@ -152,9 +156,27 @@ print(df)
# ℹ Use `print(n = ...)` to see more rows
```

-## Downloading and Processing Exposure Data with the `amadeus` Package {#link-to-hcupAmadeus-2}
+### Confirming Data Characteristics
+
+We now have our CSV-formatted file `OR_SEDD_2021_CORE.csv`. We can check summary statistics for our data on the HCUP databases page [here](https://hcup-us.ahrq.gov/databases.jsp).
+
+To get to the summary table for our data, we do the following:
+
+1. Click on the link for SEDD.
+
+2. Find the section on Data Elements.
+
+::: figure
+<img src="images/hcup_amadeus_usecase/hcup_data_elements.png" style="width:100%"/>
+:::
+
+3. Select `Summary Statistics for All States and Years`. This will redirect us to a page that allows the selection of our database attributes (state, year, database choice).

-This section provides a step-by-step guide to downloading and processing wildfire smoke exposure data using the `amadeus` package. The process includes retrieving Hazard Mapping System (HMS) smoke plume data, spatially joining it with ZIP Code Tabulation Areas (ZCTAs) for Oregon, and calculating summary statistics on smoke density.
+4. Lastly, select the file we're interested in. In our example we downloaded the CORE file, so the summary table will look like [this](https://hcup-us.ahrq.gov/db/state/seddc/tools/cdstats/OR_SEDDC_2021_SummaryStats_CORE.PDF).
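As a quick sanity check against the published table, the same statistics can be computed locally. This is a sketch on a toy data frame; in practice you would read `OR_SEDD_2021_CORE.csv`, and the column names here are assumptions:

```r
# Toy stand-in for the CORE data; in practice:
# core <- read.csv("OR_SEDD_2021_CORE.csv")
core <- data.frame(AGE = c(34, 67, 5, 41), FEMALE = c(1, 0, 1, 1))

# N of non-missing values and mean per data element, mirroring the
# layout of the HCUP summary-statistics table
local_stats <- data.frame(
  element = names(core),
  n       = colSums(!is.na(core)),
  mean    = colMeans(core, na.rm = TRUE)
)
print(local_stats)
```

If a local mean or N disagrees with the published PDF, that usually signals a problem in the fixed-width read step.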
+## Downloading and Processing Exposure Data with the [`amadeus`](https://niehs.github.io/amadeus/) Package {#link-to-hcupAmadeus-2}
+
+This section provides a step-by-step guide to downloading and processing wildfire smoke exposure data using the `amadeus` package. The process includes retrieving [Hazard Mapping System (HMS) smoke plume data](https://www.ospo.noaa.gov/products/land/hms.html#0), spatially joining it with ZIP Code Tabulation Areas (ZCTAs) for Oregon, and calculating summary statistics on smoke density.

### Step 1: Define Time Range

@@ -215,3 +237,34 @@

# Save processed data
saveRDS(temp_covar, "smoke_plume2021_covar.R")
```
+
+In preparation for the next section, we make two new dataframes from our `temp_covar` object. The first collapses our ZIP codes, taking the average number of light, medium, and heavy smoke days.
+
+```{r eval=FALSE}
+avg_smoke_density <- temp_covar %>%
+  group_by(ZCTA5CE10) %>%
+  summarise(
+    avg_light = mean(light_00000, na.rm = TRUE),
+    avg_medium = mean(medium_00000, na.rm = TRUE),
+    avg_heavy = mean(heavy_00000, na.rm = TRUE)
+  )
+print(avg_smoke_density)
+saveRDS(avg_smoke_density, "smoke_density_avg_byZip.R")
+```
+
+The second dataframe also groups by ZIP code but takes the sum of the smoke plume days instead of an average.
+
+```{r eval=FALSE}
+total_smoke_density <- temp_covar %>%
+  group_by(ZCTA5CE10) %>%
+  summarise(
+    sum_light = sum(light_00000, na.rm = TRUE),
+    sum_medium = sum(medium_00000, na.rm = TRUE),
+    sum_heavy = sum(heavy_00000, na.rm = TRUE),
+    geometry = st_union(geometry)
+  )
+print(total_smoke_density)
+saveRDS(total_smoke_density, "smoke_density_total_byZip.R")
+```
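These ZIP-level summaries can later be linked back to the HCUP records. A minimal sketch with base `merge()` on toy data (the `ZIP` column name in the HCUP file is an assumption, and a real ZCTA-to-ZIP linkage may need a crosswalk file):

```r
# Toy stand-ins: `core` mimics HCUP visit records, `avg_smoke_density`
# mimics the ZIP-level summary produced above
core <- data.frame(KEY = 1:3, ZIP = c("97201", "97201", "97401"))
avg_smoke_density <- data.frame(
  ZCTA5CE10 = c("97201", "97401"),
  avg_heavy = c(2.1, 0.4)
)

# Attach each visit's ZIP-level smoke exposure; all.x keeps unmatched records
linked <- merge(core, avg_smoke_density,
                by.x = "ZIP", by.y = "ZCTA5CE10", all.x = TRUE)
print(linked)
```

Each hospital visit now carries the smoke-density summary for its ZIP, which is the shape needed for the analysis section that follows.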
+
+## Data Analysis using HCUP and Amadeus data sources
