[BUG]: Incidence-to-Prevalence instruction set needs a bit of refinement #168
Comments
Re-running the prompt above gave me code with incorrect logic, unfortunately:
import pandas as pd
# Convert 'date' columns in all dataframes to datetime for proper manipulation
d1['date'] = pd.to_datetime(d1['date'])
d2['date'] = pd.to_datetime(d2['date'])
d3['date'] = pd.to_datetime(d3['date'])
# Sort dataframes by date to ensure correct rolling calculations
d1 = d1.sort_values('date')
d2 = d2.sort_values('date')
d3 = d3.sort_values('date')
# Calculate 'Infected' (I) category
d1['I'] = d1['value'] - d1['value'].shift(14)
d1['I'] = d1['I'].fillna(0).apply(lambda x: max(x, 0.0))
# Calculate 'Recovered' (R) category
d1['R'] = d1['value'].shift(14)
d1['R'] = d1['R'].fillna(0).apply(lambda x: max(x, 0.0))
# Calculate 'Hospitalized' (H) category
d2['H'] = d2['value'] - d2['value'].shift(10)
d2['H'] = d2['H'].fillna(0).apply(lambda x: max(x, 0.0))
# 'Dead' (D) category directly comes from d3
d3['D'] = d3['value'].apply(lambda x: max(x, 0.0))
# Calculate 'Susceptible' (S) category
# First, ensure we have a common date range for all calculations
common_dates = set(d1['date']) & set(d2['date']) & set(d3['date'])
df_I = d1[d1['date'].isin(common_dates)][['date', 'I']]
df_R = d1[d1['date'].isin(common_dates)][['date', 'R']]
df_H = d2[d2['date'].isin(common_dates)][['date', 'H']]
df_D = d3[d3['date'].isin(common_dates)][['date', 'D']]
# Merge dataframes on 'date' to align all calculations
df_merged = pd.merge(pd.merge(pd.merge(df_I, df_R, on='date'), df_H, on='date'), df_D, on='date')
# Calculate 'Susceptible' (S)
total_population = 320e6
df_merged['S'] = total_population - df_merged['I'] - df_merged['R'] - df_merged['H'] - df_merged['D']
df_merged['S'] = df_merged['S'].apply(lambda x: max(x, 0.0))
# Selecting the final dataframe to display the compartmentalized data
df_final = df_merged[['date', 'S', 'I', 'R', 'H', 'D']]
# Display the head of the final dataframe to verify the calculations
print(df_final.head())
@liunelson looking into these. In the first comment, is that code correct? If not, do you have the corrected code by any chance?
@liunelson are you testing with these datasets:
The issue is that they have a bunch of locations, so they need to be filtered first (e.g. to just
@liunelson testing with those datasets I referenced above (staging), everything seems to work for me just fine once I filter for location.
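For anyone reproducing this, here is a minimal sketch of the location filter mentioned above. The column name `location`, the value `"A"`, and the toy rows are assumptions for illustration only; the actual staging datasets may use different names. The point is that `shift` and `rolling` operate on row position, so dates repeated across locations would mix rows from different places unless the dataframe is filtered first.

```python
import pandas as pd

# Hypothetical multi-location dataset; column names and values are assumptions,
# not the real staging data.
d1 = pd.DataFrame({
    "date": ["2021-01-02", "2021-01-01", "2021-01-01", "2021-01-02"],
    "location": ["A", "A", "B", "B"],
    "value": [12.0, 10.0, 3.0, 4.0],
})
d1["date"] = pd.to_datetime(d1["date"])

# Keep a single location, then sort by date so that positional operations
# such as shift() and rolling() see one row per day in order.
d1_single = (
    d1[d1["location"] == "A"]
    .sort_values("date")
    .reset_index(drop=True)
)

print(d1_single)
```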
There are some minor bugs in the code generated from the incidence-to-prevalence prompt.
I had to edit the prompt and code a bit to get the right behaviour:
The code is:
The particular edits are:
- adding `min_periods = 1` on the rolling sums to avoid `np.nan` showing in the `I`, `H` variables
- adding `fill_value = 0.0` to the cumulative sum for `R`
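The code that originally accompanied this report is not reproduced here, so the following is only a rough sketch of how the two edits might look in pandas. It assumes `value` holds daily incident cases, uses the 14-day window from the comment thread, and runs on made-up toy data; `INFECTIOUS_DAYS` and the `cases` dataframe are hypothetical names.

```python
import pandas as pd

# Hypothetical daily incident case counts (new cases per day), not real data.
cases = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=20, freq="D"),
    "value": [float(i + 1) for i in range(20)],
})

INFECTIOUS_DAYS = 14  # assumed window, matching the shift(14) used in the thread

# Infected (prevalence): rolling sum of incidence over the window.
# min_periods=1 returns partial sums for the first 13 rows instead of NaN.
cases["I"] = cases["value"].rolling(INFECTIOUS_DAYS, min_periods=1).sum()

# Recovered: cumulative sum of incidence lagged by the window.
# fill_value=0.0 keeps the leading rows at zero; without it they would be NaN.
cases["R"] = cases["value"].shift(INFECTIOUS_DAYS, fill_value=0.0).cumsum()

print(cases.head(16))
```

With these two parameters, `I + R` equals the running total of incident cases on every row, which is a quick sanity check that no rows were lost to missing values at the start of the series.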