-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug in DOSE:::gseaScores - Error in if (abs(max.ES) > abs(min.ES)) #46
Comments
@therealgenna Thank you for your suggestion. Could please provide repeatable code and data so that we can modify this problem more conveniently. |
@huerqiang I'll try to send you a code with an example, but I've already pointed to the problem, which is easy to see.
I have a case that all I think maybe a reasonable fix would be to do the following - add the indicated line:
I can create an example simply by picking one of the gene sets and setting scores for those genes to 0 (so that |
Hi @huerqiang , here is the example and the data is attached in
I've noticed that I also apparently need
Thank you. |
@therealgenna I looked at your data and found that there are 23119 genes in your geneList, of which 17456 have a value of 0. If you got these values through foldchange, it is likely that your expression profile has not been preprocessed, and the expression values of many genes in a certain group are all 0.. |
@huerqiang Yes, I have many zeros, but this is my scoring, which is based on multiple factors, including fold changes and p.adjust values in multiple comparisons. I'd assume the function would/should still work as I don't have to have ranks based on a single set of fold changes alone. Am I wrong? |
You should persuade us that your data is truly valid. Otherwise the only reason I can think of is that more than half of your data are missing and GSEA is not designed for it. |
@GuangchuangYu As I wrote above, I derived my scoring based on multiple pairwise comparisons, each one having fold changes and (adjusted) p-values. Either when p.adjust is 1 or reaches some high threshold (which I can choose) in multiple comparisons for a given gene, its score will be zero - this is how I decided to do that and there is nothing wrong about it and I will know how to interpret the result. I have opened this issue simply because your function does not behave gracefully - it just throws an error, fails and no result is produced. This should not happen in a well-polished code. That's my point basically. I am just saying that if it comes to the situation when |
Returning NAN instead of throw error is not always 'well-polished'. In case the NAN was generated due to a wrong input, it would be better to throw error. How do we rank a gene list for half of them are zero? pls answer my question directly instead of teach me how to polish my code. If you can't persuade us, we can't help. |
@GuangchuangYu In case of the NAN was generated it would be better to give some informative warning (and proceed with other gene sets.) If your question is "How do we rank a gene list for half of them are zero?" - it's the user who provides you with the ranked list, so it's user's responsibility. I still might have thousands of genes with positive and negative score and the idea of GSEA to try and pick weak signal is still valid. One can replace zeros by tiny randomly generated values around zero; for example I can modify my scoring to give some values like that (e.g., not return 0 when padj=1; either do it randomly or deterministically) but really there would be no good way to say that some of these "zero" genes are higher ranked than the others. If it is important for the program to know the "true" order of "zero" genes and the results would be substantially different if I permute this "zero" subset, then, first, it is not clear to me that this is necessarily the case [is it?], and if it is then it might make sense to do some permutation reorderings of this subset and get some average. I am not sure why you are so reluctant to make some fixes to avoid the hard crash. We are talking about two separate issues here: (i) can I use a gene list with many genes having score zero, and (ii) code crashing. My focus is on (ii). You are worried about (i) but as I said I think the idea of GSEA is still valid. You seem to say no and I don't know why. But even if you stick to your opinion, for example because the actual order of the "zero" genes affects results a lot for many/all gene sets, at least giving an indication to the user of why the function fails would make sense (still, making it not fail would be preferable.) |
Hi,
I am using R v4 and the latest clusterProfiler and DOSE package (DOSE version 3.16.0) and I get the following error:
As far as I can tell this comes from line 23 of
DOSE:::gseaScores
. The error happens when scores of all the genes in the considered geneSet are 0. In this case NR=0 and division by NR inPhit <- cumsum(Phit/NR)
leads to allPhit
values set to NaN. This eventually leads to the above-mentioned error.It might be unlikely to expect zero scores for some genes, but I do get them, depending on how I get these scores. One dirty way to fix it would be to simulate tiny random deviations from zero for zero scores, but I wonder if there is a cleaner way to do it.
Thank you
The text was updated successfully, but these errors were encountered: