Naive bayes #68
Conversation
Amazing! At first glance this looks great, @sfsf9797! I'm pretty slammed with work right now, but am going to have a look at this this weekend.
numpy_ml/naive_bayes/naive_bayes.py
Outdated
```python
prob = -self.n_features / 2 * np.log(2 * np.pi) - 0.5 * np.sum(
    np.log(sigma)
)
prob -= 0.5 * np.sum(np.power(X - mean, 2) / sigma, 1)

joint_log_likelihood = prior + prob
return joint_log_likelihood
```
I think there are a few errors here:

- The log Gaussian likelihood calc in `prob` isn't quite right (see later commit for details).
- In the `joint_log_likelihood = prior + prob` line, you're adding a log-transformed vector (`prob`) to raw probabilities (`prior`), which doesn't make sense. I think what you want is `np.log(prior) + prob` (see my later commit for details, and the toy snippet after this list).
- This isn't a joint likelihood, right? You're computing the joint class likelihood in `prob`, but when you add in the prior, this will be proportional to the log class posterior.
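To make the log-space point in the second item concrete, here is a toy illustration (the values below are made up purely for demonstration):

```python
import numpy as np

# toy values: per-example log likelihoods and a raw class prior
prob = np.array([-3.2, -1.7])  # log P(x | y=c) for two examples
prior = 0.4                    # raw probability P(y=c)

# mixing spaces: adds a raw probability to log quantities (incorrect)
wrong = prior + prob

# staying in log space: log P(x | y=c) + log P(y=c) (correct)
right = np.log(prior) + prob
```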
numpy_ml/naive_bayes/naive_bayes.py
Outdated
```python
def prob(self, X, mean, sigma, prior):
    """
    compute the joint log likelihood of data based on gaussian distribution
    """
```
Nomenclature: I'm not sure it's right to call this the joint log likelihood? That is, this function computes the unnormalized quantity P(y = c | X, mean_c, sigma_c), which I'd think is the class posterior.
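For reference, the decomposition this comment is pointing at is just Bayes' rule (added here for clarity):

```latex
\log P(y = c \mid X) = \log P(X \mid y = c) + \log P(y = c) - \log P(X)
```

Since the normalizer \(\log P(X)\) is constant across classes, dropping it leaves exactly the unnormalized log class posterior, which is what this function computes.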
prior = P["prior"][class_idx] | ||
sigsq = P["sigma"][class_idx] | ||
|
||
# log likelihood = log X | N(mu, sigsq) |
@sfsf9797 this is what I believe the proper log likelihood + log class posterior calc should be. Let me know if you agree!
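For readers following along, here is a minimal sketch of that calculation under the diagonal-covariance Gaussian assumption. It is not the exact committed code, and the helper name `log_class_posterior` is illustrative:

```python
import numpy as np

def log_class_posterior(X, mean, sigma, prior):
    """Unnormalized log posterior log P(y = c | X) for a single class c.

    Parameters (hypothetical names, mirroring the snippet above):
        X     : (N, M) array of N examples with M features
        mean  : (M,) per-feature Gaussian means for class c
        sigma : (M,) per-feature Gaussian variances for class c
        prior : scalar class prior P(y = c)
    """
    M = X.shape[1]

    # log P(X | y = c) under a diagonal-covariance Gaussian:
    #   -M/2 log(2 pi) - 1/2 sum_j log sigma_j
    #   - 1/2 sum_j (X_j - mu_j)^2 / sigma_j
    log_lik = -0.5 * M * np.log(2 * np.pi)
    log_lik -= 0.5 * np.sum(np.log(sigma))
    log_lik -= 0.5 * np.sum((X - mean) ** 2 / sigma, axis=1)

    # add the *log* prior to get something proportional to the
    # log class posterior (the normalizer log P(X) is omitted)
    return log_lik + np.log(prior)
```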
Hey @sfsf9797 - Thanks for the PR! I just had a more thorough look and committed a few changes. Brief summary:
Please feel free to make adjustments or ask questions. Once we both agree on the implementation and are happy with the model performance, I'm happy to merge.
Hi, thank you so much for all the feedback; I will go through it all and get back to you this weekend.
Sorry, I am kind of busy these few weeks, but I will get back to you within two weeks at the latest.
Yeah, true: naive Bayes definitely falls under linear models when the likelihood factors p(x | c) are from exponential families.
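A quick sketch of why that holds (standard exponential-family algebra, added here for context): with \(\log p(x \mid c) = \eta_c^\top T(x) - A(\eta_c) + \log h(x)\), the log posterior odds between two classes are

```latex
\log \frac{P(y = c \mid x)}{P(y = c' \mid x)}
  = (\eta_c - \eta_{c'})^\top T(x)
    - A(\eta_c) + A(\eta_{c'})
    + \log \frac{P(y = c)}{P(y = c')}
```

which is linear in the sufficient statistics \(T(x)\); the \(\log h(x)\) terms cancel.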
Hi @ddbourgin, thanks for correcting my implementation. I have learnt a lot from you. I am pretty satisfied with the model after a few rounds of checking. Lastly, thank you for all the comments!
Awesome, merged! Thanks for the PR @sfsf9797 :)
Implement Gaussian naive Bayes class and basic documentation.
Refer to the formula for Gaussian naive Bayes from the notes.
Compare the performance with GaussianNB from scikit-learn.