Dariusmb 0325 #77

Merged 4 commits on Mar 27, 2025
67 changes: 60 additions & 7 deletions chapters/05-01-hcup-amadeus-usecase.Rmd

### Integrating HCUP databases with Amadeus Exposure data {.unnumbered}

**Date Modified**: March 22, 2025

**Author**: Darius M. Bost

```{r}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```

## Motivation

Understanding the relationship between external environmental factors and health outcomes is critical for guiding public health strategies and policy decisions. Integrating individual patient records from the Healthcare Cost and Utilization Project (HCUP) with environmental datasets allows researchers to examine how factors such as air quality, wildfire emissions, and extreme temperatures affect hospital visits and healthcare utilization patterns.

Ultimately, linking HCUP and environmental exposure data enhances public health monitoring and helps researchers better quantify environmental health risks.

This tutorial includes the following steps:

```{r eval = FALSE}
# install required packages
install.packages(c("readr", "data.table", "sf", "tidyverse", "tigris", "dplyr",
"amadeus"))

# load required packages
library(readr)
library(data.table)
library(sf)
library(tidyverse)
library(tigris)
library(dplyr)
library(amadeus)
```

## Data Curation and Prep {#link-to-hcupAmadeus-1}

HCUP data files from AHRQ can be obtained from the [HCUP Central Distributor](https://cdors.ahrq.gov/) site, which details the data use agreement and how to purchase, protect, re-use, and share HCUP data.

Upon acquisition of HCUP database files, you will notice that the state files are distributed as large ASCII text files. These files contain the raw data and can be very large, as they store all of the individual records for hospital stays or procedures. AHRQ provides SAS software tools to assist with loading the data into [SAS](https://hcup-us.ahrq.gov/tech_assist/software/508course.jsp#structure) for analysis; however, this doesn't help when using other languages like R. To solve this, we use the .loc files (also provided on the HCUP website) together with the year and type of the data file being loaded.

We will start with state-level data: the State Inpatient Database (SID), the State Emergency Department Database (SEDD), and the State Ambulatory Surgery and Services Database (SASD).
```{r eval = FALSE}
for (data_source in data_sources) {
  # body elided in this view: reads each fixed-width .asc file using the
  # column positions from its .loc file, then writes the result out as CSV
}
```

The `fwf_positions()` function uses the column start and end positions published on the AHRQ website (`meta_url`, listed in the next code chunk). We use these positions to read the raw data files from their .asc format.
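
As a minimal, self-contained sketch of the idea (the positions, column names, and records below are made up; the real ones come from the .loc file):

```r
library(readr)

# hypothetical layout: AGE in columns 1-3, ZIP in columns 4-8
positions <- fwf_positions(
  start = c(1, 4),
  end = c(3, 8),
  col_names = c("AGE", "ZIP")
)

# two made-up fixed-width records standing in for a large .asc file
df <- read_fwf(I("04597301\n06197210\n"), col_positions = positions)
print(df)
```

The same pattern scales to the real files: build one `fwf_positions()` object per database from its .loc file, then pass the .asc path to `read_fwf()`.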

::: figure
<img src="images/hcup_amadeus_usecase/oregon2021_SEDD_core_loc_file.png" style="width:100%"/>

```{r eval = FALSE}
print(df)
# ℹ Use `print(n = ...)` to see more rows
```

### Confirming Data Characteristics

We now have our CSV-formatted file `OR_SEDD_2021_CORE.csv`. We can check summary statistics for our data on the HCUP databases page [here](https://hcup-us.ahrq.gov/databases.jsp).

To get to the summary table for our data we do the following:

1. Click on the link for SEDD.

2. Next, find the section on Data Elements.

::: figure
<img src="images/hcup_amadeus_usecase/hcup_data_elements.png" style="width:100%"/>
:::

3. From here, select `Summary Statistics for All States and Years`. This redirects to a page that allows the selection of our database attributes (state, year, database choice).

4. Lastly, select the file of interest. In our example we downloaded the CORE file, so the summary table will look like [this](https://hcup-us.ahrq.gov/db/state/seddc/tools/cdstats/OR_SEDDC_2021_SummaryStats_CORE.PDF).

## Downloading and Processing Exposure Data with the [`amadeus`](https://niehs.github.io/amadeus/) Package {#link-to-hcupAmadeus-2}

This section provides a step-by-step guide to downloading and processing wildfire smoke exposure data using the `amadeus` package. The process includes retrieving [Hazard Mapping System (HMS) smoke plume data](https://www.ospo.noaa.gov/products/land/hms.html#0), spatially joining it with ZIP Code Tabulation Areas (ZCTAs) for Oregon, and calculating summary statistics on smoke density.

### Step 1: Define Time Range
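
A minimal sketch of this step, assuming the study period is calendar year 2021 (the variable names here are hypothetical, not necessarily the chapter's own):

```r
# hypothetical names for the bounds of the 2021 study period
date_start <- as.Date("2021-01-01")
date_end <- as.Date("2021-12-31")

# one entry per day in the study period
date_range <- seq(date_start, date_end, by = "day")
length(date_range)
```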

```{r eval = FALSE}
temp_covar <- calculate_hms(
  # arguments elided in this view: the HMS smoke covariate, the Oregon ZCTA
  # geometry, and the 2021 date range defined above
)

# Save processed data
saveRDS(temp_covar, "smoke_plume2021_covar.R")
```

In preparation for the next section, we will make two new dataframes from our `temp_covar` object. The first collapses the data by ZIP code, taking the average number of light, medium, and heavy smoke days.

```{r eval=FALSE}
avg_smoke_density <- temp_covar %>%
  group_by(ZCTA5CE10) %>%
  summarise(
    avg_light = mean(light_00000, na.rm = TRUE),
    avg_medium = mean(medium_00000, na.rm = TRUE),
    avg_heavy = mean(heavy_00000, na.rm = TRUE)
  )
print(avg_smoke_density)
saveRDS(avg_smoke_density, "smoke_density_avg_byZip.R")
```

The second dataframe also groups by ZIP code but takes the sum of the smoke plume days instead of the average.

```{r eval=FALSE}
total_smoke_density <- temp_covar %>%
  group_by(ZCTA5CE10) %>%
  summarise(
    sum_light = sum(light_00000, na.rm = TRUE),
    sum_medium = sum(medium_00000, na.rm = TRUE),
    sum_heavy = sum(heavy_00000, na.rm = TRUE),
    geometry = st_union(geometry)
  )
print(total_smoke_density)
saveRDS(total_smoke_density, "smoke_density_total_byZip.R")
```

## Data Analysis using HCUP and Amadeus data sources
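
The analysis can be sketched as a simple join of the HCUP discharge records to the per-ZCTA smoke summaries built above. The column names (`KEY` and `ZIP` on the HCUP side) and the sample values below are hypothetical, and note that ZIP codes and ZCTAs are related but not identical:

```r
library(dplyr)

# hypothetical HCUP-style discharge records keyed by patient ZIP code
hcup <- data.frame(
  KEY = c(1, 2, 3),
  ZIP = c("97035", "97035", "97201")
)

# hypothetical per-ZCTA smoke summaries, as in avg_smoke_density above
smoke <- data.frame(
  ZCTA5CE10 = c("97035", "97201"),
  avg_heavy = c(0.12, 0.05)
)

# attach smoke exposure to each discharge record by ZIP/ZCTA
linked <- left_join(hcup, smoke, by = c("ZIP" = "ZCTA5CE10"))
print(linked)
```

With the real data, the same `left_join()` links each record in `OR_SEDD_2021_CORE.csv` to the exposure summaries, after which visit counts can be modeled against smoke density.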