
Commit fd178a1

Merge pull request #81 from NIEHS/dariusmb_0219
Dariusmb 0418
2 parents ac54ae9 + c9aa5b9 commit fd178a1

File tree

2 files changed: +48 −14 lines changed

chapters/05-01-hcup-amadeus-usecase.Rmd

Lines changed: 48 additions & 14 deletions
@@ -4,7 +4,7 @@
 
 ### Integrating HCUP databases with Amadeus Exposure data {.unnumbered}
 
-**Date Modified**: March 22, 2025
+**Date Modified**: April 18, 2025
 
 **Author**: Darius M. Bost
 
@@ -99,7 +99,8 @@ for (data_source in data_sources) {
       start = c(1, 5, 10, 27, 31, 63, 68, 73, 75, 80),
       end = c(3, 8, 25, 29, 61, 66, 71, 73, 78, NA) # NA for ragged column
     )
-  }
+  } # Ends if statement
+  # The 'data_source in data_sources' and 'year in years' loops continue below
 ```
 
 The `fwf_positions()` function uses column start and end positions found on the AHRQ website (`meta_url`, listed in the next code chunk). We use these positions to read in the raw data files from their .asc format.
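
For readers new to `readr`, the positions-then-read pattern can be seen in miniature below. This is an illustrative sketch with made-up fixed-width content and column names, not the actual HCUP layout.

```r
# Minimal illustration of the fwf_positions()/read_fwf() pattern used above,
# with made-up fixed-width content rather than the real HCUP file layout.
library(readr)

raw_lines <- c(
  "001OR2021",  # columns: id (1-3), state (4-5), year (6-9)
  "002OR2021"
)
tmp <- tempfile(fileext = ".asc")
writeLines(raw_lines, tmp)

positions <- fwf_positions(
  start = c(1, 4, 6),
  end = c(3, 5, 9),
  col_names = c("id", "state", "year")
)

read_fwf(tmp, positions)
# A tibble: 2 x 3, with columns id, state, year
```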
@@ -115,11 +116,18 @@ The `fwf_positions()` function uses column start and end positions found
     meta_url <- paste0("https://hcup-us.ahrq.gov/db/state/",
                        data_source_lower_c, "/tools/filespecs/OR_",
                        data_source, "_", year, "_", data_type, ".loc")
+
+    # Skip the first 20 lines because they contain header information and
+    # descriptions, not column metadata
     df <- readr::read_fwf(meta_url, positions, skip = 20)
     # Read data
-
+    # Set directory to the location where the HCUP ASCII file was downloaded.
+    # Users should replace "../OR/" with their own download path.
     data_file <- paste0("../OR/", data_source, "/OR_", data_source, "_",
                         year, "_", data_type, ".asc")
+    # fwf_positions() is passed the column positions from df (the file
+    # specification file). Ex. df$X5 holds all the column names for our
+    # metadata. See print(df) below.
     df2 <- readr::read_fwf(
       data_file,
       readr::fwf_positions(start = df$X6, end = df$X7, col_names = df$X5),
@@ -130,8 +138,8 @@ The `fwf_positions()` function uses column start and end positions found
     # Write output CSV
     output_file <- paste0("OR_", data_source, "_", year, "_", data_type, ".csv")
     write.csv(df2, file = output_file, row.names = FALSE)
-  }
-}
+  } # Ends 'year in years' for loop
+} # Ends 'data_source in data_sources' for loop
 #Output file: OR_SEDD_2021_CORE.csv
 ```
 
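To make the brace pairing easier to follow, here is a stripped-down skeleton of the nesting that those closing-brace comments refer to; the loop bodies are placeholders, not the actual HCUP processing steps.

```r
# Skeleton of the nested loops the closing-brace comments refer to.
# The message() calls stand in for the real position-building and
# read/write steps shown in the diff above.
data_sources <- c("SEDD")
years <- c(2021)

for (data_source in data_sources) {
  for (year in years) {
    if (data_source == "SEDD") {
      message("build fixed-width positions for ", data_source, " ", year)
    } # Ends if statement
    message("read .asc file and write CSV for ", data_source, " ", year)
  } # Ends 'year in years' for loop
} # Ends 'data_source in data_sources' for loop
```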
@@ -214,9 +222,9 @@ Once the raw HMS data is downloaded, we process it using `process_hms()`. This f
 
 ```{r eval=FALSE}
 cov_h <- process_hms(
-  date = time_range, # Specify the date range
+  date = time_range,            # Specify the date range
   path = "./data/data_files/",  # Path to the downloaded data files
-  extent = sf::st_bbox(or) # Limit processing to Oregon's spatial extent
+  extent = sf::st_bbox(or)      # Limit processing to Oregon's spatial extent
 )
 ```
 
@@ -226,18 +234,40 @@ Using `calculate_hms()`, we extract wildfire smoke plume values at the ZIP code
 
 ```{r eval=FALSE}
 temp_covar <- calculate_hms(
-  covariate = "hms", # Specify the covariate type
+  covariate = "hms",        # Specify the covariate type
   from = cov_h, # Use the processed HMS data
-  locs = tigris::zctas(state = "OR", year = 2010), # Use Oregon ZIP code bounds
+  locs = or, # Use Oregon ZIP code bounds
   locs_id = "ZCTA5CE10", # Define ZIP code identifier
-  radius = 0, # No buffer radius
-  geom = "sf" # Return as an sf object
+  radius = 0,               # No buffer radius
+  geom = "sf"               # Return as an sf object
 )
 
 # Save processed data
 saveRDS(temp_covar, "smoke_plume_covar.R")
 ```
 
+```{r eval=FALSE}
+glimpse(temp_covar)
+# Rows: 12,989
+# Columns: 16
+# $ STATEFP10    <chr> "41", "41", "41", "41", "41", "41", "41", "41", "41", "…
+# $ ZCTA5CE10    <chr> "97833", "97840", "97330", "97004", "97023", "97042", "…
+# $ GEOID10      <chr> "4197833", "4197840", "4197330", "4197004", "4197023", …
+# $ CLASSFP10    <chr> "B5", "B5", "B5", "B5", "B5", "B5", "B5", "B5", "B5", "…
+# $ MTFCC10      <chr> "G6350", "G6350", "G6350", "G6350", "G6350", "G6350", "…
+# $ FUNCSTAT10   <chr> "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", …
+# $ ALAND10      <dbl> 228152974, 295777905, 199697439, 113398767, 330220870, …
+# $ AWATER10     <dbl> 0, 10777783, 814864, 71994, 2345079, 85543, 58021, 9206…
+# $ INTPTLAT10   <chr> "+44.9288886", "+44.8847111", "+44.6424890", "+45.25496…
+# $ INTPTLON10   <chr> "-118.0148791", "-116.9184395", "-123.2562655", "-122.4…
+# $ PARTFLG10    <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", …
+# $ time         <dttm> 2021-07-01, 2021-07-01, 2021-07-01, 2021-07-01, 2021-0…
+# $ light_00000  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
+# $ medium_00000 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
+# $ heavy_00000  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
+# $ geometry     <MULTIPOLYGON [°]> MULTIPOLYGON (((-118.1575 4..., MULTIPOLYG…
+```
+
 In preparation for the next section we are going to make two new dataframes from our `temp_covar` object. The first collapses our zipcodes, taking the average of light, medium, or heavy smoke days.
 
 ```{r eval=FALSE}
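The chunk that builds this first dataframe falls outside the diff context shown here. As a rough sketch, using the column names visible in the `glimpse()` output above (not necessarily the exact code in the chapter), the collapsing step could look like:

```r
# Rough sketch of the averaging step described above: collapse to one row per
# ZIP code, averaging the light/medium/heavy smoke-day values. Column names
# come from the glimpse() output; the exact chunk sits outside this diff.
library(dplyr)
library(sf)

avg_smoke_density <- temp_covar %>%
  st_drop_geometry() %>%
  group_by(ZCTA5CE10) %>%
  summarize(
    avg_light  = mean(light_00000, na.rm = TRUE),
    avg_medium = mean(medium_00000, na.rm = TRUE),
    avg_heavy  = mean(heavy_00000, na.rm = TRUE)
  )
```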
@@ -263,7 +293,7 @@ saveRDS(avg_smoke_density, "smoke_density_avg_byZip.R")
 # 6 97042 0.258 0.129 0.0323
 ```
 
-The second dataframe also groups by our zip but takes the summation of the smoke plume days instead of an average.
+The second dataframe also groups by our zip but takes the summation of the smoke plume days instead of an average. We keep the geometry with this dataframe because we will need it for the merge later on; if we kept it in both dataframes, we would have repeating columns after the HCUP/Amadeus merge.
 
 ```{r eval=FALSE}
 total_smoke_density <- temp_covar %>%
@@ -291,7 +321,7 @@ saveRDS(total_smoke_density, "smoke_density_total_byZip.R")
 
 ## Data Analysis using HCUP and Amadeus data sources
 
-First we will load in our hcup data file we processed earlier and subset the file to a set of observations that make the data easier to work with (702 to 39 columns) and are still interesting for analysis. This includes zipcodes, age at admission, admission month, race identifier, sex, and ICD 10 diagnosis codes.
+First we will load in the HCUP data file we processed earlier and subset it to a set of variables that makes the data easier to work with (702 down to 39 columns) and is still interesting for analysis. This includes zipcodes (ZIP), age at admission (AGE), admission month (AMONTH), race identifier (RACE), sex (FEMALE), and ICD-10 diagnosis codes (I10\_).
 
 ```{r eval=FALSE}
 or_sedd_2021 <- fread("OR_SEDD_2021_CORE.csv")
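The full 39-column selection sits further down in the chapter than this diff shows; a rough sketch of the kind of subsetting described, using the variable names called out above (the helper object `keep_cols` is ours, not the chapter's):

```r
# Illustrative subsetting of the kind described above: keep identifier,
# demographic, and ICD-10 diagnosis columns. The real chunk keeps 39 specific
# columns; this only shows the pattern.
library(data.table)

or_sedd_2021 <- fread("OR_SEDD_2021_CORE.csv")
keep_cols <- c("ZIP", "AGE", "AMONTH", "RACE", "FEMALE",
               grep("^I10_", names(or_sedd_2021), value = TRUE))
or_subset <- or_sedd_2021[, ..keep_cols]
```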
@@ -364,7 +394,7 @@ total_individuals <- nrow(smoke_summary)
 asthma_cases <- sum(smoke_summary$has_asthma, na.rm = TRUE)
 
 # Calculate the proportion of individuals diagnosed with asthma
-asthma_rate <- asthma_cases / total_individuals
+asthma_prevalence <- asthma_cases / total_individuals
 ```
 
 ### Visualizing the Relationship Between Heavy Smoke Exposure and Asthma
@@ -384,6 +414,10 @@ ggplot(smoke_summary, aes(x = factor(has_asthma), y = avg_heavy,
   theme_minimal()
 ```
 
+::: figure
+<img src="images/hcup_amadeus_usecase/asthma_vs_heavy_smoke.png" style="width:100%"/>
+:::
+
 ### Logistic Regression Analysis
 
 Finally, we fit a logistic regression model to examine the relationship between asthma diagnoses and exposure to different levels of smoke density.
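
The model itself is past the end of this diff; a minimal sketch of such a fit, assuming the `smoke_summary` columns used above (`avg_light` and `avg_medium` are assumed companions to the `avg_heavy` column shown in the plot):

```r
# Minimal sketch of a logistic regression relating asthma diagnosis to smoke
# density exposure. has_asthma and avg_heavy appear in the diff above;
# avg_light and avg_medium are assumed companion columns.
asthma_model <- glm(
  has_asthma ~ avg_light + avg_medium + avg_heavy,
  data = smoke_summary,
  family = binomial()
)
summary(asthma_model)
```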