Test negative-binomial emissions/responses #12
How do we pick parameters, corresponding to Poisson and Gamma distributions, to initialize negative-binomial emissions? Is there a go-to approach for this, or is it something we have to test out?
You're referring to the negative-binomial parameters (i.e. `mu` and `alpha`). What you need to do is assign those parameters some prior distributions (or make them fixed constants, but that's not what we're interested in testing). To start, the supports of the priors must generally correspond with the acceptable parameter values; in this case, both parameters need to be positive.

As a default, people will often use a conjugate distribution. This can be helpful, because then the (conditional) posterior is known and can be checked. Under the total/rate parameterization of the negative-binomial, the rate (or probability) parameter has a beta conjugate prior, but I don't know of a conjugate prior for the total (or count) parameter. You'll have to do a little research/math to determine how that translates under PyMC3's parameterization (e.g. perhaps the beta conjugate prior translates to a gamma distribution).

Otherwise, one generally chooses a compatible prior that assigns some "a priori" acceptable probability to parameter values (by definition). So, for instance, if there's reason to believe that larger values of a parameter are more probable than smaller ones, the prior density should reflect that. Such considerations can be reflected through the choice of prior distribution and/or its parameter values. (Naturally, there are meaningful interactions between combinations of priors and likelihoods/observation distributions to consider, but let's keep things simple for now.)

In summary, just try out some compatible priors, and use parameters that don't unnecessarily concentrate the density on specific values.
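For concreteness, here's a minimal (non-HMM) PyMC3 sketch of what compatible priors might look like under the `mu`/`alpha` parameterization; the specific distributions, hyperparameters, and the placeholder data are only illustrative assumptions:

```python
import numpy as np
import pymc3 as pm

# Hypothetical observed counts; in practice these would come from the test simulation.
y_obs = np.random.poisson(5, size=200)

with pm.Model() as model:
    # Both negative-binomial parameters must be positive, so the priors'
    # supports are restricted to the positive reals.
    mu = pm.Gamma("mu", alpha=2.0, beta=0.5)        # prior mean 4, fairly diffuse
    alpha = pm.Gamma("alpha", alpha=2.0, beta=0.5)  # dispersion/"index" parameter
    y = pm.NegativeBinomial("y", mu=mu, alpha=alpha, observed=y_obs)
    trace = pm.sample(1000, tune=1000, cores=1)
```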
In this example I made, I used gamma and log-normal priors as parameters, respectively, for `mu` and `alpha`. Observing the results in the trace plot, the negative-binomial components appear to have collapsed toward simple Poissons. There are also some next steps we need to do.
I think you meant to say that you used the gamma and log-normal as priors, and not that you used the priors as parameters. Also, you should clarify that the simulated data was drawn from a 3-component Poisson HMM (i.e. one zero component and two Poissons with non-zero rates). This tells us that the data we're fitting should be indicative of simple Poissons rather than our model's negative-binomials. Now, when we discuss the "collapsing" to Poissons, the results make a lot more sense.
Since, as you state in the preceding paragraph, the log-normal priors on the `alpha` parameters are compatible with the large values under which a negative-binomial is effectively a Poisson, results like these aren't unexpected.
This is an extremely informal statement but, more importantly, it largely echoes the previous statement: i.e. that the posterior is concentrating on effectively Poisson-like parameter values.
I don't think that implication is best described as an interaction between the prior and likelihood; rather, when the data themselves are (close to) Poisson, it's not surprising that the negative-binomial posteriors concentrate near their Poisson limits.
To be more precise, the posterior predictive output looks very much like the output from an equivalent Poisson model (i.e. the true data-generating process). This is a loose confirmation of the possible collapsing/convergence-to-the-true-model mentioned earlier. Although, from this vantage alone, it's difficult to tell exactly how or why these good, Poisson-looking results were achieved, we nevertheless have yet another confirmation of relatively good/expected results. After checking the math, the product of this experiment could easily be another test that fits such a model and confirms that the estimated parameters are consistent with the true Poisson rates.

Next, we should try fitting the negative-binomial model against data that's actually generated from a negative-binomial HMM, so we can make sure that the parameters can be correctly estimated by our samplers. Now, in this case, we need to properly consider the identification issues.
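As a quick sanity check on the "collapsing" intuition: under PyMC3's parameterization the negative-binomial variance is mu + mu**2/alpha, so it approaches a Poisson(mu) as alpha grows. A rough numerical check (mapping onto SciPy's `nbinom` via n = alpha, p = alpha/(alpha + mu); all values are placeholders):

```python
import numpy as np
import scipy.stats as st

mu = 20.0
x = np.arange(0, 100)

for alpha in [1.0, 10.0, 1000.0]:
    nb_pmf = st.nbinom.pmf(x, n=alpha, p=alpha / (alpha + mu))
    pois_pmf = st.poisson.pmf(x, mu)
    # Total-variation-style distance between the two pmfs; it shrinks as alpha grows.
    print(alpha, 0.5 * np.abs(nb_pmf - pois_pmf).sum())
```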
Just made a new gist here. In this new example, I generated data from two Poissons and fitted the negative-binomial model to it.

In terms of the identification issues, we ideally want the same situation as with the Poissons for the negative-binomials: maintaining some sort of ordering between the two states and making sure their probability distributions aren't equal. I attempted this by summing the `mu` parameters, so that the second component's mean is always at least as large as the first's.

Lastly, the predictive output doesn't look similar to what we observed in the earlier examples. Going to run the Negative-Binomial simulated data next.
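For illustration, an additive/ordered parameterization of the two means in PyMC3 might look like the following sketch; the distribution choices and hyperparameters are placeholders, not the gist's actual values:

```python
import pymc3 as pm

with pm.Model():
    # Mean of the first (smaller) component.
    mu_1 = pm.Gamma("mu_1", alpha=2.0, beta=0.1)
    # Positive increment, so mu_2 = mu_1 + delta > mu_1 by construction.
    delta = pm.Gamma("delta", alpha=2.0, beta=0.1)
    mu_2 = pm.Deterministic("mu_2", mu_1 + delta)
    # Separate dispersion parameters for each component.
    alpha_1 = pm.Gamma("alpha_1", alpha=2.0, beta=0.5)
    alpha_2 = pm.Gamma("alpha_2", alpha=2.0, beta=0.5)
```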
This is another interesting result: it looks like the first negative-binomial component "converged" to a simple Poisson with a rate parameter somewhere between the two data-generating Poisson rates, and the second negative-binomial took a slightly larger mean (a little closer to the larger second Poisson component rate) with a large(-ish) dispersion parameter! Due to the third true state (i.e. second, larger Poisson component) being so frequent in the simulated data, the second estimated negative-binomial component (i.e. the estimated third state) appears to be the most "active" and, because of its extra dispersion, it also subsumes some of the estimated second state observations (which are distinctly third state observations in the true model). It would be interesting to see the estimated state sequence vs. the true state sequence; then we could see exactly how often this over-dispersed third state attempts to account for the true second state's observations.

In other words, it appears as though a single highly dispersed negative-binomial component will indeed account for two Poisson components—even when the two Poissons are "well separated" in their rates (e.g. in this case, with a rate parameter difference of 2000).
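If the simulated state sequence is available as `s_true` and an estimated (e.g. posterior-mode) sequence as `s_est` (both names are assumptions about how the notebook stores them), a quick comparison could be as simple as:

```python
import numpy as np
import pandas as pd

# Overall agreement and a state-by-state cross-tabulation; the latter shows how
# often the over-dispersed component absorbs observations from the true second state.
print("agreement:", np.mean(s_est == s_true))
print(pd.crosstab(pd.Series(s_true, name="true"), pd.Series(s_est, name="estimated")))
```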
Also, when series are long enough, it's difficult to see the results of rarely visited states (e.g. the second state in the posterior output). In these cases, we should split up the graph into stacked subplots so that we can more easily see each observation. Here's a small snippet demonstrating how to split up a Pandas series with a `DatetimeIndex` and plot each piece on its own axis:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.transforms as mtrans

# Split by calendar week; for a plain array, `np.array_split` works, too.
# y_test_splits = np.array_split(y_test, 4)
y_test_splits = [y_split for n, y_split in y_test.groupby(pd.Grouper(freq='W'))]

n_partitions = len(y_test_splits)

fig, axes = plt.subplots(nrows=n_partitions,
                         sharey=False,
                         sharex=False,
                         figsize=(15, 6.0))

# Shift the major (date) tick labels down so they don't overlap the hourly minor labels.
major_offset = mtrans.ScaledTranslation(0, -10/72., fig.dpi_scale_trans)

for i, ax in enumerate(axes):
    ax.plot(y_test_splits[i],
            label=r'$y_t$', color='black',
            drawstyle='steps-pre', linewidth=0.5)
    ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=range(0, 23, 2)))
    ax.xaxis.set_minor_formatter(mdates.DateFormatter('%H'))
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=range(0, 7, 1)))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %a'))
    for label in ax.xaxis.get_majorticklabels():
        label.set_transform(label.get_transform() + major_offset)
    ax.legend()

plt.tight_layout()
```

If you're not using a Pandas series with a `DatetimeIndex`, the commented-out `np.array_split` line is a simpler way to produce the partitions.
New notebook fitting the Negative-Binomial model to data generated from a Negative-Binomial distribution is here. I copied the model parameters from this earlier example. This time, the simulated data was drawn from a 3-component Negative-Binomial HMM (one zero component, and two Negative-Binomials with non-zero means). The posterior predictive output looks somewhat different from the Poisson case as well, and part of that could be due to how the dispersion parameters were set in the simulation.
Is this line in the posterior predictive plotting code supposed to use a different index?
These results seem to indicate that both negative-binomials have—once again—"collapsed" into Poissons! That's not what we want in this case. I think the reason it's doing this might have to do with the additive `mu` parameterization. The additive parameterization couples the two components' means, which can make the posterior geometry more awkward for the sampler. That doesn't necessarily explain the results, but it's an important general consideration. At the very least, it could explain why the sampler had such an apparently hard time. My advice: try removing the additive parameterization and use separate priors for the two means.
Good points. Here's the updated example with separated `mu` priors. In the posterior predictive plot, the predictions seem more continuous/less discrete than in the previous example, which is pattern-wise closer to our data generated from the Negative-Binomial distribution.
Yes, those are better results (relatively speaking). The "true" negative-binomial parameters weren't recovered, but there's still a lot that could explain that.

First, can you add a black/dark blue line with the posterior predictive mean/50% HDI? That would help clarify how well the model is performing in terms of expected value. I believe that will tell us something very important about these results. Also, it would help to show the estimated state sequences. Comparison with the "true" sequence might be difficult—given the estimated negative-binomial components—but it could still help us identify other relevant pathologies.

(FYI: I just noticed that we've been using the label "...HDI interval", which is a little redundant: i.e. "high-density interval interval".)

**Some Points about Priors**

You went a bit low and strong on some of the prior parameters. It helps to think of the most extreme cases, and the Dirac delta prior is a good one. If you collapsed the prior onto one value (i.e. expressed some form of prior "certainty" about the parameter), that value is all you'd get back in the posterior—unless, perhaps, the likelihood also collapsed onto a different value. The opposite situation, in which—say—nearly equal probability is given to all values, results in "uninformative" priors. There's a lot more to this situation, but these are the basic ideas.

You can also consider a prior distribution as a type of constraint in an optimization problem. As with constraints, priors can guide results toward preferred values and away from other ones.

Overall, we (or at least I) have a few a priori assumptions that aren't expressed via these priors, and those could be contributing to these results, but, given that we're simulating the data, it's hard to separate those expectations from our direct knowledge of the "true" model. We can still try, though. To start, we might want to assume that the two components' means are distinct. Taking a somewhat empirical Bayesian approach, we can at least say that—given the observed data—the gamma priors on the means should put non-negligible probability on values around the observed averages.
Updated the plots with 50% HDI added (same link); fixed those labels too. Looking into prior selection in the meantime.
The updated notebook won't load for me now. Is there a lot of new output in the notebook?
Don't think so, but it looks like that notebook has some rendering issues… Anyway, here's a new one. I set up the priors for `mu` and `alpha` following the points above.

Should we do anything to the current `alpha` priors (e.g. make them different from each other)?
I don't think we need these two priors to be different in any way. They're not exactly bound by the same identifiability/overlap-related concerns as the `mu` parameters.

Just so you know, there's also some subjectivity to this concern we've had over "identifiability". There could be situations in which negative-binomials with nearly equal means (i.e. a lot of distributional overlap) would still be acceptable; it depends on what we want the model to capture.
Yeah, if we want to obtain the original model in situations with a lot of "overlap", as I stated at the end of my earlier comment, we need to express the types of solutions we want via our priors, and part of that could be accomplished using more sophisticated parameterizations or in an empirical way.

For instance, we know that we're estimating two negative-binomial components, and that we want their means to be distinct in some sense. We have an additive parameterization that does this, but it doesn't take into account exactly how distinct the means are, so our estimates end up producing means that are not very distinct relative to their magnitudes. A more sophisticated approach might take into account the variances and, perhaps, allow us to specify a degree of distributional overlap that we're willing to accept. As an example of the empirical approach, we could partition the range of observed data and set each `mu` prior's parameters using the averages within the corresponding partition (see the sketch below).

There are plenty more things we can do in both of these categories, and we can even use both approaches simultaneously. Note: before you get too inventive, it always helps to quickly research what others have already done. Often, Google Scholar is a great place to start.

For testing purposes, we don't need to be too sophisticated, so just try "informing" those priors manually (e.g. set the prior means close to the parameter values used in the simulation).
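A crude sketch of that empirical approach, assuming the observations live in a `y_test` series and we simply split the non-zero observations at their median to anchor each mean's prior (the split rule and prior spreads are placeholder choices, not recommendations):

```python
import numpy as np
import pymc3 as pm

y_nonzero = y_test[y_test > 0]
split = np.median(y_nonzero)
low_mean = y_nonzero[y_nonzero <= split].mean()
high_mean = y_nonzero[y_nonzero > split].mean()


def gamma_params(mean, sd):
    # Convert a desired mean/sd into the Gamma's shape and rate parameters.
    return (mean / sd) ** 2, mean / sd ** 2


a1, b1 = gamma_params(low_mean, low_mean / 2.0)
a2, b2 = gamma_params(high_mean, high_mean / 2.0)

with pm.Model():
    # Priors centered (in expectation) on the partition means.
    mu_1 = pm.Gamma("mu_1", alpha=a1, beta=b1)
    mu_2 = pm.Gamma("mu_2", alpha=a2, beta=b2)
```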
Sounds good. Just incorporated some of those points in this new notebook. In the posterior predictive plot, I'm seeing more variance than before in the 50% line. Going to look into some more sophisticated approaches, such as taking variances into account and using point estimates from other models.
The mean behaviour is much better, but the results are still too concentrated on the third mixture component; however, after seeing the results of these past few changes, I get the impression that very strong priors will be necessary in order to accurately recover the "true" parameters for this simulation. The over-dispersion parameters in the simulation are too large (relative to the means) and the resulting simulation ends up being too "vague" for our priors (and model, really). For instance, you might need to set even more of the priors closer to the true parameter values in order to get the desired posteriors. Otherwise, the level of sophistication required for a parameterization to account for this vagueness could be too great for what it's worth.

This has been a very illustrative exercise, but, for the sake of moving forward, let's just make the simulation more amenable to these priors and focus on confirming that the samplers work at a basic level. As a matter of fact, I would consider even more simplifications so that the test runtime isn't unnecessarily long (without sacrificing the test coverage itself, of course).
In order to shorten the test runtime, should we change the priors around? I've tried keeping the same priors while changing the parameterization of the simulated data (i.e. shrinking the means and/or dispersion parameters), and so far there hasn't been a noticeable drop in the amount of time taken to generate posterior samples. On a separate/related note, in terms of simplification, should we focus more on the simulated data or the model?
I was saying that we should simplify the simulation (e.g. reduce the dispersion and/or overlap between mixture components) so that we can keep using simple priors. Again, we just want to make sure that we can estimate multiple negative-binomial mixture components in an HMM, and that starts with the case in which those components are easily identifiable.

Also, keep in mind that acceptable unit tests are as small, quick, and simple as possible, so things like posterior predictive sampling aren't an option. We need to base our checks on the posterior samples alone, and small, extreme cases are easier to check. For example, you could simulate an HMM with two negative-binomials and effectively zero overlap between them; then you can more confidently apply simple checks against the true and estimated parameter values.

Note: you can easily "guesstimate" values that achieve such overlap, but, if you want to know more about the general idea, look up the Hellinger distance, Bhattacharyya coefficient, etc.
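For reference, here's a rough way to quantify the overlap between two candidate negative-binomial components, again mapping PyMC3's (mu, alpha) onto SciPy's (n, p); the component values below are placeholders:

```python
import numpy as np
import scipy.stats as st


def nb_pmf(x, mu, alpha):
    # PyMC3's (mu, alpha) parameterization mapped onto SciPy's (n, p).
    return st.nbinom.pmf(x, n=alpha, p=alpha / (alpha + mu))


x = np.arange(0, 5000)
p = nb_pmf(x, mu=10.0, alpha=5.0)      # placeholder component 1
q = nb_pmf(x, mu=1000.0, alpha=5.0)    # placeholder component 2

bc = np.sum(np.sqrt(p * q))              # Bhattacharyya coefficient in [0, 1]
hellinger = np.sqrt(max(0.0, 1.0 - bc))  # Hellinger distance in [0, 1]
print(bc, hellinger)
```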
Makes sense. I reduced the dispersion parameters and increased the relative amount of separation between the `mu` values. Roger that on the small, quick, and simple part re: unit tests.
That simulation does look better, although the true over-dispersion/"index" parameters (i.e. the `alpha` values) may still be difficult to recover. Try setting one of the dispersion parameters to a fairly small value and the other noticeably larger, so that the two components are clearly distinguishable. Also, it might be good to include the true parameter values in the relevant plots, for reference.
Okay, I see what you're referring to. Here's a new notebook. I replaced the dispersion parameters with 0.05 and 0.2; however, the result didn't seem to have gone as well as we thought… Judging from the trace plot, the posterior samples for the dispersion parameters don't look like they've recovered the true values. In the sample estimates, I added two lines in green to delineate the true values.
An updated example here. I kept tuning the dispersion parameters. In the posterior predictive plot, I used the standard deviation of the posterior predictive samples to construct the intervals.
I recommend adding plots for the prior distributions; those will clarify a lot. In general, if you simulate model parameters that lie in comparatively low-probability regions of your priors, then it's not unlikely that the estimation method (MCMC in this case) will focus its efforts almost exclusively in other, relatively acceptable regions. Without specialized parameterizations/constraints or estimation methods to compensate for these low-probability regions of the prior, there's not much more that can be done to remedy the situation—beyond the use of more and/or longer sample chains (in the hope that some samples will be drawn from the "true" low-prior-probability regions, of course).

Otherwise, I don't understand those new intervals that were added to the plots. They look like classical CLT-like (i.e. Gaussian-based) confidence intervals, but such approximations aren't necessary here, since we have entire sets of posterior (predictive) samples for each observation, so we can always compute more accurate sample statistics instead. Also, the observed conditional variances might be a useful reference here:

```python
import numpy as np

print("Var[Y_t | S_t = 1] = {}".format(np.var(y_test[s_test == 1])))
print("Var[Y_t | S_t = 2] = {}".format(np.var(y_test[s_test == 2])))
```
Here's some example code that plots PyMC3's negative-binomial according to its density/mass function (i.e. under the `mu`/`alpha` parameterization):

```python
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt


def NegBinom(x, mu, alpha):
    # PyMC3's (mu, alpha) parameterization maps onto SciPy's (n, p) via
    # n = alpha and p = alpha / (alpha + mu).
    return st.nbinom.pmf(x, n=alpha, p=alpha / (alpha + mu))


x = np.arange(0, 2000)

alphas = 1. / np.r_[0.05, 0.2]
mus = np.r_[10, 1000]

fig, axes = plt.subplots(2, sharex=True)

for a, m, ax in zip(alphas, mus, axes):
    pmf = NegBinom(x, m, a)
    ax.plot(x, pmf, '-', label=r'$\alpha$ = {}, $\mu$ = {}'.format(a, m))
    ax.set_ylabel('nbinom.pmf(x)', fontsize=10)
    ax.legend()

plt.xlabel('x', fontsize=12)
plt.tight_layout()
plt.show()
```

Notice that the `alpha` values here are the reciprocals of the dispersion values we've been quoting.
Note: PyMC3 uses half-Cauchys as default priors for parameters like these in some places, so a half-Cauchy prior on `alpha` might be worth trying, too.
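If we wanted to try that here, it would just be something like the following (the `beta` value is an arbitrary placeholder):

```python
import pymc3 as pm

with pm.Model():
    # Heavy-tailed prior with positive support for the negative-binomial dispersion.
    alpha = pm.HalfCauchy("alpha", beta=10.0)
```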
Thanks for the notes! Here's the updated example with the reciprocals of the values we were considering. I've added those prior plots, using the above code snippet, after the cell where we print out the conditional variances.

In the posterior samples, things look a bit better. In the posterior predictive plot, there's an increased degree of sparsity overall. Could it possibly be due to the reduction in dispersion from how we initialized the priors?
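To pin down the two conventions being juggled here: with a dispersion phi, the variance is mu + phi * mu**2, which corresponds to PyMC3's alpha = 1/phi (so the variance is mu + mu**2/alpha). A quick numerical check with placeholder values:

```python
import numpy as np
import scipy.stats as st

mu, phi = 1000.0, 0.05
alpha = 1.0 / phi  # PyMC3's "index" parameter

nb = st.nbinom(n=alpha, p=alpha / (alpha + mu))
print(nb.var(), mu + phi * mu ** 2)  # both should be ~51000
```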
Yeah, it does look a little better; however, I only see plots for the component/emissions distributions—which are in the observation/likelihood-defining part of the model—and not the priors, so I can't tell whether or not those posterior discrepancies are due to the influence of unnecessarily informative priors. It still seems quite likely that they are. You can use the same sort of plot for the prior densities themselves.

Some minor notes:
Thanks for the notes, again. I've addressed these points here. This time, we're using the regular definition of the dispersion parameters, aka PyMC3's `alpha`, rather than the reciprocals. The result seems to make sense in the posterior samples (trace plot), except that the `alpha` estimates are still slightly off from the simulated values.
All right, I think we've answered most of our negative-binomial-related questions, since we've shown that they can be estimated sufficiently well, so I'm going to close this issue for now.
Let's see how well our HMM state-sequence and transition samplers work under negative-binomial emissions/responses when their distribution parameters are estimated using NUTS.
We should also fit the aforementioned model to data generated by an HMM with two non-zero Poisson components whose rates are close enough that their distributions overlap substantially, and vice versa (i.e. fit data produced by a negative-binomial using a two-component Poisson mixture).
In this situation, we can—for instance—see how well the extra dispersion of the negative-binomial can account for data that might otherwise require another Poisson component.
Note: These considerations are also related to #11, since the marginal (i.e. posterior predictive) distribution of a Poisson-gamma mixture is negative-binomial.
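For reference, that gamma-Poisson/negative-binomial relationship can be checked numerically: marginalizing a Poisson rate over a gamma distribution gives exactly a negative-binomial with n equal to the gamma shape and p = beta / (beta + 1). A small sketch with placeholder values:

```python
import numpy as np
import scipy.stats as st
from scipy.integrate import quad

a, b = 3.0, 0.5   # gamma shape and rate (placeholders)
y = np.arange(0, 30)

# Marginal of Poisson(y | lam) with lam ~ Gamma(a, rate=b), via numerical quadrature.
marginal = np.array([
    quad(lambda lam: st.poisson.pmf(k, lam) * st.gamma.pdf(lam, a, scale=1.0 / b),
         0, np.inf)[0]
    for k in y
])

# The same pmf, written directly as a negative-binomial.
nb = st.nbinom.pmf(y, n=a, p=b / (b + 1.0))
print(np.max(np.abs(marginal - nb)))  # ~0, up to quadrature error
```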