episode 1: "their response the new drug" -> "their response to the new drug"
episode 2: This is a style thing, but the sentence structure is often needlessly complicated. E.g., "It is often the case that our data includes categorical values." can be simplified to "Datasets like these often include categorical values."
Similarly, "In our case, for example, the binary outcome we are trying to predict - in hospital mortality - is recorded as “ALIVE” and “EXPIRED”." can be simplified to "In our case, the binary outcome we are trying to predict (hospital mortality) is recorded as ALIVE and EXPIRED."
It is extremely weird to drop the categorical outcome variable and use it as y while keeping the encoded numeric version of it in x, since that leaves the label itself among the features. I realise this is an intro lesson, but this looks to me like a coding mistake that novices would commonly make.
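To make the concern concrete, here is a minimal sketch of the pattern and one possible fix. It uses plain Python rather than the lesson's actual code, and the column names (`age`, `heart_rate`, `outcome`, `outcome_encoded`) and values are hypothetical, chosen only for illustration.

```python
# Hypothetical toy rows; names and values are assumptions, not the lesson's data.
rows = [
    {"age": 71, "heart_rate": 88, "outcome": "ALIVE"},
    {"age": 54, "heart_rate": 102, "outcome": "EXPIRED"},
    {"age": 63, "heart_rate": 76, "outcome": "ALIVE"},
    {"age": 80, "heart_rate": 95, "outcome": "EXPIRED"},
]

# Encode the categorical outcome as 0/1.
encoding = {"ALIVE": 0, "EXPIRED": 1}
for row in rows:
    row["outcome_encoded"] = encoding[row["outcome"]]

# Pattern described in this issue: only the categorical column is removed,
# so the encoded label is still present among the features.
X_leaky = [{k: v for k, v in row.items() if k != "outcome"} for row in rows]

# One possible fix: use the encoded column as y and remove BOTH outcome
# columns from the features.
outcome_columns = ("outcome", "outcome_encoded")
y = [row["outcome_encoded"] for row in rows]
X = [{k: v for k, v in row.items() if k not in outcome_columns} for row in rows]

print(sorted(X_leaky[0]))  # still contains "outcome_encoded"
print(sorted(X[0]))        # features only
```

The same idea should carry over to a DataFrame workflow: whichever representation of the outcome becomes y, every representation of it needs to be removed from x.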
"To avoid data leaking between our training and test sets, we take the median from the training set only. The training median is then used to impute missing values in the held-out test set."
This isn't really explained at all. The data imputation section is generally a bit short. It'd be good to mention why imputing with the median is arguably a bad idea in most cases.
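For reference, the step the quoted passage describes could be illustrated along these lines. This is only a sketch using the standard library, not the lesson's code; the helper names `fit_median` and `impute` and the values are hypothetical, and `None` stands in for a missing measurement.

```python
from statistics import median

# Hypothetical measurements for one feature; None marks a missing value.
train = [10, None, 14, 30, 12, 16]
test = [None, 18, None]


def fit_median(values):
    """Return the median of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    return median(observed)


def impute(values, fill):
    """Replace missing values with a precomputed fill value."""
    return [fill if v is None else v for v in values]


# The fill value is computed from the training set only ...
train_median = fit_median(train)

# ... and then reused for both sets, so nothing about the test set's
# distribution influences the preprocessing.
train_imputed = impute(train, train_median)
test_imputed = impute(test, train_median)

print(train_median)   # 14
print(test_imputed)   # [14, 18, 14]
```

Computing the median over train and test together would give a different fill value for this toy data, which is the leakage the quoted passage is trying to avoid.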