Clarify number of samples taken in the resample function #17

Open
tompollard opened this issue Aug 24, 2022 · 0 comments


In https://github.com/carpentries-incubator/machine-learning-novice-python/blob/gh-pages/_episodes/07-bootstrapping.md, the following chunk is used to resample the datasets for bootstrapping:

X_bs, y_bs = resample(x_train, y_train, replace=True)

The number of samples isn't specified in the function call, so it is unclear how many samples are being taken.

According to the documentation at https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html the number of samples is specified in the n_samples argument:

"n_samples int, default=None
Number of samples to generate. If left to None this is automatically set to the first dimension of the arrays. If replace is False it should not be larger than the length of arrays."

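By default, resample sets n_samples to the length of the input arrays (their first dimension), so each bootstrap sample is the same size as the training set. We should either (1) note this default in the episode or (2) pass the n_samples argument explicitly.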
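
For reference, here is a minimal sketch of what the two options amount to in practice. The small `x_train` and `y_train` arrays are hypothetical stand-ins for the lesson's data, and `random_state=0` is added only so that the two calls can be compared; neither is taken from the episode.

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical stand-ins for the lesson's training data:
# 10 rows with 2 features each, and 10 matching targets.
x_train = np.arange(20).reshape(10, 2)
y_train = np.arange(10)

# Option 1: rely on the default. n_samples=None means the output
# has as many rows as the input arrays (their first dimension).
X_bs, y_bs = resample(x_train, y_train, replace=True, random_state=0)
print(X_bs.shape, y_bs.shape)

# Option 2: spell the default out so the sample size is visible.
X_bs_explicit, y_bs_explicit = resample(
    x_train, y_train, replace=True, n_samples=len(x_train), random_state=0
)
print(X_bs_explicit.shape, y_bs_explicit.shape)
```

Both calls should return arrays with the same number of rows as `x_train`. The second form is more verbose, but it may be easier for learners to follow because the sample size appears in the call itself.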
