09_tabular: ProductSize histogram's y-axis is mislabeled #590
Comments
Looks like a fix was submitted in pull request #410.
I can confirm this issue; I ran into it while doing my own notes. My fix was as follows:
p = valid_xs_final['ProductSize'].value_counts(sort=False).sort_index().plot.barh()
c = to.classes['ProductSize']
plt.yticks(range(len(c)), c);
Problem
The book's histogram of ProductSizes in the "Partial Dependence" section has a mislabeled y-axis. Consequently, the histogram communicates the wrong counts for some of the ProductSizes. Here are some ProductSizes it mislabeled:
See below for details.
Book's incorrect histogram
The "Partial Dependence" section has a ProductSize histogram that is produced by this code:
and renders like this:
Corrected histogram
We can reveal the mistake in the book's histogram by inspecting a textual histogram from the dataframe:
That code produces this textual histogram:
See the table at the top of this issue for a comparison between the counts of these ProductSizes and the ones from the book's histogram.
Cause
The problem is that the code that labels the y-axis assumes that the bottom bar is ProductSize 0, the next bar is ProductSize 1, and so on, but this isn't the case: the bars do not appear to be ordered by ProductSize.
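To make the mismatch concrete, here is a small standard-library sketch. It is only an analogy: `Counter` stands in for `value_counts(sort=False)`, since both return counts in an order that need not follow the category codes. The `classes` list and `codes` values are made-up stand-ins, not the real `to.classes['ProductSize']` or the bulldozer data.

```python
from collections import Counter

# Hypothetical stand-ins: category labels indexed by code, and a few coded rows.
classes = ['#na#', 'Large', 'Large / Medium', 'Medium', 'Small', 'Mini', 'Compact']
codes = [3, 0, 0, 4, 3, 0, 6, 1, 0]

# Counter keeps first-appearance order, so the "bars" are not sorted by code.
counts = Counter(codes)
bar_order = list(counts)

# Positional labelling assumes bar i shows code i.
positional_labels = [classes[i] for i in range(len(bar_order))]

# Correct labelling looks up the label for the code each bar actually shows.
actual_labels = [classes[code] for code in bar_order]

for pos, (wrong, right) in enumerate(zip(positional_labels, actual_labels)):
    flag = '' if wrong == right else '  <-- mislabeled'
    print(f'bar {pos}: labelled {wrong!r}, really {right!r}{flag}')
```

In this toy data every bar ends up with the wrong label, because the first code encountered is 3 rather than 0.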
Example fix
Here's some code that properly labels the y-axis by sorting the y-axis labels to match the order of the bars:
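One way to sketch that idea, under assumptions: the helper names `counts_with_labels` and `plot_counts` are hypothetical, and the `classes` list and `codes` series below are toy stand-ins for `to.classes['ProductSize']` and `valid_xs_final['ProductSize']`. The labels are derived from the codes that actually appear in the sorted index, so the labelling stays correct even when a code has no rows.

```python
import pandas as pd

def counts_with_labels(codes, classes):
    """Count rows per category code and return labels in the same order as the bars."""
    # sort_index() puts the bars in code order: 0, 1, 2, ...
    counts = codes.value_counts(sort=False).sort_index()
    # Look up each bar's label by its code rather than by its position.
    labels = [classes[code] for code in counts.index]
    return counts, labels

def plot_counts(counts, labels):
    """Draw a horizontal bar chart with one tick label per bar (needs matplotlib)."""
    ax = counts.plot.barh()
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    return ax

# Toy stand-ins for to.classes['ProductSize'] and valid_xs_final['ProductSize'].
classes = ['#na#', 'Large', 'Large / Medium', 'Medium', 'Small', 'Mini', 'Compact']
codes = pd.Series([3, 0, 0, 4, 3, 0, 6, 1, 0])

counts, labels = counts_with_labels(codes, classes)
print(list(zip(labels, counts.tolist())))
```

Calling `plot_counts(counts, labels)` in a notebook would then render the chart with each bar labelled by the category it actually counts.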