Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add caseload size calculation #44

Open
ChrisBeeley opened this issue Sep 25, 2022 · 7 comments
Open

Add caseload size calculation #44

ChrisBeeley opened this issue Sep 25, 2022 · 7 comments
Assignees

Comments

@ChrisBeeley
Copy link
Member

Using this script to produce example data (or submitting your own example data) please add a function that calculates the caseload size per team (or teams) by day (or range of days).

See the README for an explanation of the data.

We would love multiple solutions to this please- tidyverse, base, data.table, SQL, Python, you name it 🙂

@ChrisBeeley
Copy link
Member Author

PS I banged out the synthetic data, if anybody has improvements or comments I'd be glad to hear them

@NHSmatthewcallus
Copy link

This is my attempt using SQL:

caseload_size_MC.txt

@LeslieHewitt
Copy link

This works for me:

`-- Caseload

declare @StartDate date = '2022-08-31'
declare @EndDate date = getdate()

;with cteCaseload as (
select IDPatient
,CONVERT(date,ref.[DateReferral]) as ReferralDate
,ref.[Discharge_Date]
,TeamName
,1 as Caseload
FROM MyReferralsTable ref
where ref.[DateReferral]<@EndDate
and (ref.Discharge_Date>=@StartDate OR ref.Discharge_Date is null)
)

select c.IDPatient
,d.Date as EventDate
,c.TeamName
,1 as Caseload
from MyDateTable d
left join cteCaseload c on c.ReferralDate <= d.Date
and (c.[Discharge_Date] >= d.Date or c.[Discharge_Date] is null)
where d.Date >= @StartDate
and d.Date <= @EndDate

`

@jontix
Copy link

jontix commented Sep 28, 2022

This one for a dplyr version...

It uses dates from the synthetic data generation script, but that can be changed to whatever dates are required.

team_caseload <- test_frame %>%
  full_join(data.frame(dates), by = character()) %>%
  group_by(client_id, team_desc, dates) %>%
  summarise(is_in = max(referral_date <= dates 
                        & (discharge_date > dates | is.na(discharge_date)))) %>%
  group_by(team_desc, dates) %>%
  summarise(total_caseload_on_date = sum(is_in))

@Lextuga007
Copy link
Member

@tomjemmett
Copy link
Member

you could treat this as a continous function by pivotting the dates into a single column, then using referral_date as +1, discharge_date as -1. If you then arrange on the pivotted date column, sort by that date, we can then cumulative sum the count column to give us our caseload at any point in time. This would probably be a more interesting approach with a date time column.

There is one big caveat if you were using this, you would need to write a query that extractted date before your period of interest, e.g. .data$discharge_date > start_date | .data$referral_date < end_date | is.na(.data$discharge_date)

library(tidyverse)

add_end_row <- function(.data, end_date = NULL) {
  if (is.null(end_date)) {
    end_date <- max(.data$date)
  }
  bind_rows(.data, summarise(.data, date = end_date, across(count, last)))
}

test_frame |>
  pivot_longer(ends_with("date"), names_to = "date_type", values_to = "date") |>
  drop_na(date) |>
  mutate(count = ifelse(date_type == "referral_date", 1, -1)) |>
  arrange(date) |>
  select(team_desc, date, count) |>
  group_by(team_desc) |>
  mutate(across(count, cumsum)) |>
  add_end_row(Sys.Date()) |>
  ggplot(aes(date, count, colour = team_desc)) +
  geom_step()

@Lextuga007
Copy link
Member

The new package {ivs} https://github.com/DavisVaughan/ivs might also be useful for this YouTube 50:52

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants