GiR issue with new version of IPTM #1
Could this be caused by label-switching in the backwards sampling? In the forward sampling, we are generating topics from topic distribution k for documents in cluster k, and there is no ambiguity regarding which cluster is topic k in the forward sampling, right? However, in inference, and thus backwards sampling, it seems like label-switching may represent an additional form of variation. Is it possible to artificially introduce label switching into the forward sampling to see if this could produce additional variance in these GiR measures? |
Perhaps try calculating the statistics based on the topic-type counts (e.g., N_kv) not the assignments (e.g., z_i). The statistics (e.g., variance) of the counts N_kv should be immune from label-switching. |
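To illustrate why count-based statistics sidestep label switching, here is a small sketch on made-up data (the helper `topic_type_counts` and all sizes are hypothetical, not from the IPTM code): permuting the topic labels only permutes the rows of N_kv, so any statistic of the flattened counts, such as their variance, is unchanged.

```python
import numpy as np

def topic_type_counts(z, w, K, V):
    """N_kv: number of tokens of word type v assigned to topic k."""
    N = np.zeros((K, V), dtype=int)
    for zi, wi in zip(z, w):
        N[zi, wi] += 1
    return N

rng = np.random.default_rng(0)
K, V, n_tokens = 4, 10, 200
z = rng.integers(0, K, size=n_tokens)   # topic assignments
w = rng.integers(0, V, size=n_tokens)   # word types

N = topic_type_counts(z, w, K, V)

# Relabel topics by a fixed non-identity permutation (simulated label switching).
perm = np.array([1, 2, 3, 0])
z_perm = perm[z]
N_perm = topic_type_counts(z_perm, w, K, V)

# The assignments differ everywhere, but N_perm is just a row permutation of N,
# so permutation-invariant statistics of the counts agree exactly.
assert not np.any(z == z_perm)
assert np.array_equal(N_perm[perm], N)
assert np.array_equal(np.sort(N.ravel()), np.sort(N_perm.ravel()))
assert np.isclose(N.ravel().var(), N_perm.ravel().var())
```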
The topic-type counts (N_kv) look fine in terms of passing GiR. However, the current inference (and thus the backward samples) converges to one interaction pattern and one (or two) topics across the entire corpus in the long run. For example, if the topic distribution for K=4 is (0.25, 0.25, 0.25, 0.25) in forward sampling, the inferred topic distribution in backward sampling is (0.8, 0.2, 0, 0). I think this is still problematic if it ends up with very few topics remaining in the real data analysis. I don't quite understand how backward sampling introduces a label-switching issue when we start the inference from initial values set to the true topic distribution. |
It doesn't sound like label switching is the issue. Bomin, can you point me
to the generative process and the inference equations?
|
You can look at paper/icml2018_style/IPTM_ICML2.pdf. The generative process is in Section 2.2, and the inference equation is Equation (15) on page 4. Although the draft is currently written for the minimal path assumption, I am failing for both the minimal and maximal path assumptions. (Since GiR uses the same number of words across all documents, maximal should be fine and it should pass.) |
I will have to look tomorrow as I only have phone access today so I can't
pull from the repo. Basically I'm wondering if you are doing the same
integrations/approximations in the generative process as in inference. I'd
try that.
|
Generating from the collapsed LDA equations definitely helped, but one issue remains.
1. The two-level hierarchy with a uniform base for the interaction pattern-specific topic distribution (i.e., m_c ~ Dir(\alpha1, u) and \theta_d ~ Dir(\alpha, m_{c_d})) now passes GiR for both the maximal and minimal path assumptions---which it did not with the non-collapsed generative process.
2. However, when I directly follow cluster LDA and use a three-level hierarchy with an additional layer representing the corpus-wide topic distribution (i.e., m ~ Dir(\alpha0, u), m_c ~ Dir(\alpha1, m), and \theta_d ~ Dir(\alpha, m_{c_d})), it still fails GiR---the backward samplers concentrate on a few dominant topics. I used Equation (15) in paper/icml2018_style/IPTM_ICML2.pdf for both the generative process and inference, and nothing (neither the equation nor the code) seems to be wrong. Maybe I should not generate directly from the entirely-collapsed equation when I have this additional hierarchy?
Another related question: isn't the overall corpus-wide topic distribution (m in #2 above) controlled by the distribution of interaction pattern assignments (clusters) across the documents? If that is the case, it may not be necessary to use three levels instead of two---in other words, m will be the weighted average of m_c across c=1,...,C, so assuming a uniform base for m_c would be fine...? |
I don't understand your related question.
|
Can you attach the PDF? I'm on my phone today and can't pull from the repo.
|
Lastly, to me this sounds like there's a subtle bug somewhere....
|
Just to be clear, here is how I generate the z's (IPTM_ICML2.pdf attached):
initialize N_dk = 0, N_kc = 0, and N_k = 0
for (d in 1:D) {
  for (n in 1:N_d) {
    z_dn ~ Equation (15)
    N_{d z_dn} += 1
    N_{z_dn c_d} += 1
    N_{z_dn} += 1
  }
}
|
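For concreteness, the incremental generation above might be sketched as follows. Equation (15) itself is in the PDF, not in this thread, so a standard collapsed two-level conditional (matching the m_c ~ Dir(\alpha1, u), \theta_d ~ Dir(\alpha, m_{c_d}) setup discussed above) is used as a stand-in, and all sizes are toy values; this shows the bookkeeping, not the actual IPTM sampler.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K, C = 5, 4, 2
N_d_len = [30] * D                      # equal document lengths, as in the GiR setup
c = rng.integers(0, C, size=D)          # interaction-pattern (cluster) label per document
alpha, alpha1 = 5.0, 50.0

N_dk = np.zeros((D, K))                 # document-topic counts
N_kc = np.zeros((K, C))                 # topic-cluster counts
N_c = np.zeros(C)                       # tokens per cluster
z = [[] for _ in range(D)]

for d in range(D):
    for _ in range(N_d_len[d]):
        # Stand-in for Equation (15): collapsed two-level conditional
        # p(z = k) ∝ N_dk + alpha * (N_kc + alpha1/K) / (N_c + alpha1)
        m_c = (N_kc[:, c[d]] + alpha1 / K) / (N_c[c[d]] + alpha1)
        p = N_dk[d] + alpha * m_c
        k = rng.choice(K, p=p / p.sum())
        z[d].append(k)
        N_dk[d, k] += 1                 # increment counts after each draw,
        N_kc[k, c[d]] += 1              # exactly as in the pseudocode above
        N_c[c[d]] += 1
```

All three count arrays stay consistent by construction: each of the 150 tokens contributes exactly one increment to each.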
Just for maximal, right? This incrementing procedure isn't right for
minimal.
|
Yes. Just for the maximal! |
And even maximal doesn't pass?
|
Yes, both maximal and minimal fail. After generating the z's as above, I infer them as below:
for (iter in 1:Niter) {
  for (d in 1:D) {
    for (n in 1:N_d) {
      N_{d z_dn} -= 1
      N_{z_dn c_d} -= 1
      N_{z_dn} -= 1
      z_dn ~ Equation (15)  # new topic assignment
      N_{d z_dn} += 1
      N_{z_dn c_d} += 1
      N_{z_dn} += 1
    }
  }
}
and I compare N_k from forward and backward using GiR plots, which look like the attached GiRplot.pdf. (The document-IP and token-word distribution plots completely "pass" when I shut down inference for the z's.) |
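The decrement/resample/increment sweep can be sketched end-to-end like this (again, Equation (15) is replaced by a stand-in collapsed two-level conditional and all sizes are toy values; the point is that the counts must stay consistent with the z's throughout):

```python
import numpy as np

rng = np.random.default_rng(2)
D, K, C, N_len, Niter = 5, 4, 2, 30, 50
c = rng.integers(0, C, size=D)          # cluster label per document
alpha, alpha1 = 5.0, 50.0

# Start from arbitrary assignments and build the matching counts.
z = rng.integers(0, K, size=(D, N_len))
N_dk = np.zeros((D, K)); N_kc = np.zeros((K, C)); N_c = np.zeros(C)
for d in range(D):
    for n in range(N_len):
        N_dk[d, z[d, n]] += 1; N_kc[z[d, n], c[d]] += 1; N_c[c[d]] += 1

for _ in range(Niter):
    for d in range(D):
        for n in range(N_len):
            k_old = z[d, n]             # remove the token's current assignment
            N_dk[d, k_old] -= 1; N_kc[k_old, c[d]] -= 1; N_c[c[d]] -= 1
            # Stand-in for Equation (15), same form as the forward draw
            m_c = (N_kc[:, c[d]] + alpha1 / K) / (N_c[c[d]] + alpha1)
            p = N_dk[d] + alpha * m_c
            k_new = rng.choice(K, p=p / p.sum())
            z[d, n] = k_new             # add the new assignment back in
            N_dk[d, k_new] += 1; N_kc[k_new, c[d]] += 1; N_c[c[d]] += 1

# Sanity check: counts rebuilt from z match the incrementally maintained counts.
chk = np.zeros((D, K))
for d in range(D):
    for n in range(N_len):
        chk[d, z[d, n]] += 1
assert np.array_equal(chk, N_dk)
```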
I just don't get how this can work with the two-level hierarchy but not three. I think there must be a bug. When you do the two-level hierarchy, are you using different code from when you're using the three-level hierarchy?
|
One idea: set alpha0 in your 3 level code so big that the model effectively
bypasses the corpus-level counts. Does it pass?
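This check is easy to run numerically: the top-level fraction (N_k + alpha0/K)/(N + alpha0) mentioned in this thread flattens to the uniform 1/K as alpha0 grows, which is exactly the two-level model. A sketch with made-up skewed corpus-level counts:

```python
import numpy as np

K = 4
N_k = np.array([5000.0, 30.0, 20.0, 10.0])  # hypothetical skewed top-level counts
N = N_k.sum()

def top_level_factor(alpha0):
    """Corpus-level smoothing fraction: (N_k + alpha0/K) / (N + alpha0)."""
    return (N_k + alpha0 / K) / (N + alpha0)

for alpha0 in [5.0, 100.0, 1000.0, 1e6]:
    print(alpha0, np.round(top_level_factor(alpha0), 4))

# Small alpha0 leaves the dominant topic with almost all the mass (rich get
# richer); huge alpha0 effectively bypasses the counts and recovers 1/K.
assert top_level_factor(5.0)[0] > 0.9
assert np.allclose(top_level_factor(1e9), 1.0 / K, atol=1e-4)
```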
|
Yes, I use the same code for the two-level and three-level hierarchies. The only difference is that I replace the fraction (N_k + alpha0/K)/(N + alpha0) by 1/K. I just tried setting alpha0 bigger, and apparently it gets closer to "pass". With alpha0 = 1000, it passed. |
Okay. Are you sampling the alphas? It sounds like you are not?
|
No. All alphas are treated as fixed hyperparameters. Should we embed sampling steps for the alphas then? |
Here's what it sounds like to me: the counts at the top level are very big. (Print them to get a sense of the magnitude.) If you have small alphas at that level, you're putting a massive amount of weight on the top-level counts, so I imagine you're getting stuck in a shitty rich-get-richer scenario and not getting out of it.
|
Probably. (What values are you setting them to?)
|
Totally agree with you on "getting stuck in a shitty rich-get-richer scenario". The reason the two-level version worked fine is that it always lowers the richer probabilities and increases the poorer ones. Similarly, when we use a huge alpha0, it gets closer to the two-level version, so it passed. So far I varied the alphas from 5 to 50 (I thought this should pass no matter what the alphas are) in different combinations, but now I realize that was not big enough. |
I'd sample the alphas.
|
I'm also still not convinced that there isn't a bug somewhere.
|
It seems like alpha1 (the contribution of the counts at the middle level) also needed to be bigger. |
(I started this before your most recent email.)
Replying quickly from my computer.
1. I'm not convinced that there's not a bug.
2. You can think of each alpha as a pseudocount. If the count that you're
adding an alpha to is N_d -- i.e., a document length -- then this tells you
something about the value of the alpha that you want. In other words, you
likely want something that is not substantially larger or substantially
smaller than N_d. Ditto if the count that you're adding an alpha to is N_c
-- here, you want something that's roughly comparable to the number of
tokens associated with a cluster. And ditto for N_. at the top level. Since
N_. is the total number of tokens in the corpus, it will need to be
muuuuuuuch larger than the kind of alpha values that are suitable for the
N_d level (which is the level we're usually working at with LDA).
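The pseudocount argument can be made concrete: the weight a smoothed estimate puts on the base distribution at each level is roughly alpha / (N + alpha). With toy level sizes (the N values below are illustrative assumptions, not the real corpus):

```python
# Weight placed on the base distribution (vs. the observed counts) at each
# level of the hierarchy, roughly alpha / (N + alpha).
def base_weight(alpha, N):
    return alpha / (N + alpha)

levels = {"document (N_d ~ 30)": 30,
          "cluster (N_c ~ 5000)": 5000,
          "corpus (N_. ~ 50000)": 50000}
alpha = 5.0
for name, N in levels.items():
    print(f"{name}: base weight = {base_weight(alpha, N):.5f}")

# An LDA-scale alpha leaves the corpus level with ~99.99% of its weight on the
# counts (the rich-get-richer trap); alpha comparable to N_. restores balance.
assert base_weight(5.0, 50000) < 1e-4
assert 0.4 < base_weight(50000, 50000) < 0.6
```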
|
3. I'd definitely sample the alphas. It's even less easy to figure out good
alpha values for the minimal path assumption.
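No particular sampler is prescribed here; one common option is random-walk Metropolis on log(alpha) under the Dirichlet-multinomial (Pólya) likelihood of the per-document topic counts. A self-contained sketch on made-up counts (the Exponential(1) prior, step size, and toy data are all arbitrary choices for illustration):

```python
import math
import random

random.seed(0)

def dm_loglik(alpha, m, counts):
    """Dirichlet-multinomial log-likelihood of per-document topic counts
    given concentration alpha and base measure m (a probability vector)."""
    ll = 0.0
    for N_dk in counts:
        N_d = sum(N_dk)
        ll += math.lgamma(alpha) - math.lgamma(N_d + alpha)
        for k, n in enumerate(N_dk):
            ll += math.lgamma(n + alpha * m[k]) - math.lgamma(alpha * m[k])
    return ll

def sample_alpha(alpha, m, counts, n_steps=200, step=0.3):
    """Random-walk Metropolis on log(alpha), Exponential(1) prior on alpha.
    The log(alpha) term is the Jacobian of the log transform."""
    lp = dm_loglik(alpha, m, counts) - alpha + math.log(alpha)
    for _ in range(n_steps):
        prop = alpha * math.exp(step * random.gauss(0.0, 1.0))
        lp_prop = dm_loglik(prop, m, counts) - prop + math.log(prop)
        if math.log(random.random()) < lp_prop - lp:
            alpha, lp = prop, lp_prop
    return alpha

# Toy counts: 20 documents, K = 4, fairly smooth, so a moderate-to-large alpha fits.
K = 4
m = [0.25] * K
counts = [[8, 7, 8, 7]] * 10 + [[10, 6, 7, 7]] * 10
alpha_hat = sample_alpha(5.0, m, counts)
assert alpha_hat > 0.0 and math.isfinite(alpha_hat)
```

The same routine would be run once per level of the hierarchy, plugging in that level's counts and base measure.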
|
4. If there isn't a bug then the issue is mixing, caused by having shitty
alpha values that mean it's taking waaaay too long to mix.
|
Thanks for the suggestions! #2 definitely explains why it worked out with (alpha, alpha1, alpha0) = (5, 50, 100). I will check for bugs first and (whether or not there is a bug) work on sampling the alphas, since we are going to use the minimal path assumption anyway. |
Have you implemented Schein testing? This test will fail if there is a software bug but *not* if a correctly implemented sampler is failing to mix. It also takes many fewer samples to detect a bug.
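A minimal sketch of the idea behind this family of tests (Geweke-style "Getting it Right"), on a toy Beta-Binomial model rather than the IPTM: draw (theta, y) both by pure forward sampling and by a successive-conditional chain that alternates the data simulator with the posterior conditional (exact here). If the transition operator is correct, statistics of the two sample sets agree in distribution; a bug shows up as a mismatch.

```python
import random

random.seed(42)

# Toy model: theta ~ Uniform(0, 1), y ~ Binomial(n, theta).
n = 10

def sample_beta(a, b):
    x = random.gammavariate(a, 1.0)
    return x / (x + random.gammavariate(b, 1.0))

def sample_binomial(n, p):
    return sum(random.random() < p for _ in range(n))

# Scheme 1: independent forward samples from the joint.
forward = []
for _ in range(5000):
    theta = random.random()
    forward.append(sample_binomial(n, theta))

# Scheme 2: successive-conditional chain (y | theta, then theta | y).
succ = []
theta = random.random()
for _ in range(5000):
    y = sample_binomial(n, theta)            # y | theta
    theta = sample_beta(1 + y, 1 + n - y)    # theta | y (exact conditional)
    succ.append(y)

# Compare a simple statistic of y under the two schemes; here E[y] = n/2 = 5.
mf = sum(forward) / len(forward)
ms = sum(succ) / len(succ)
print(round(mf, 2), round(ms, 2))
assert abs(mf - 5.0) < 0.5 and abs(ms - 5.0) < 0.5
```

In practice one compares many statistics (here that would include N_k, the IP assignments, etc.) with P-P or Q-Q plots rather than a single mean.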
|
+1
|
Yes, what I have been working on so far is actually Schein testing (to guard against mixing issues). The Schein test passes with a small number of outer iterations (and thus only a few steps away from the true values), but it fails as I increase the number of outer iterations. |
Okay. Sounds like there is a bug somewhere. When you clamp various parts of the model, which bits pass/fail?
|
It was only the topic distribution {N_k}_{k=1}^K that failed the Schein test when I clamped the rest of the IPTM. |
Sorry for the delay, but I attached the derivation of cluster LDA. |
I have spent a very long time trying to find the possible bug, but failed.
I was able to re-derive exactly the same sampling equation as cluster LDA, so I do not see any mathematical error.
The IP assignments and topic assignments from backward sampling always have larger variance than those from forward sampling, and the difference grows as we run more outer iterations.
To test this in the simplest setting of cluster LDA (no variables other than c_d and z), I ran the 'clusterLDA.R' code in the GiR2 folder and got the same odd results. Is there anything I am totally missing?