multiVals for select? #2
Comments
The object in your case is an EnsDb, for which there already is a multiVals argument, and for which 'first' is the default.
Johannes Rainer has a significantly different codebase in ensembldb than what is in AnnotationDbi, and doesn't seem to do the LEFT JOINs in AnnotationDbi that will end up blowing out the rows of the returned data.frame. For packages that dispatch on select from AnnotationDbi, I find it safer to use mapIds repeatedly instead:
anno <- as.data.frame(lapply(c("SYMBOL", "GENENAME", "WHATEVER"), function(x) mapIds(OrgDb, KEYS, x, COLUMN)))
Which is one line, respects the order of the incoming KEYS, and doesn't suffer from row blowout.
I assume I'm missing something here because there isn't a
Yeah, well, fine, but you and I are super-pros. Also I guess you're missing a I didn't mention this in my original post, but the code snippets above are used throughout the https://osca.bioconductor.org/ book, and it's a lot easier to teach people if there's a function that just does the job instead of expecting them to put together a one-liner like the above.
It would help to have reproducible example (
and don't (yet) understand the problem? Is this an issue with ensembldb, which from
A better example of what I believe Aaron is getting at (although not a good use-case for what he wants, admittedly) is
An
Where multiple 1:many mappings absolutely blow out the number of rows due to two LEFT JOINs between tables. And if you just want one row per gene (and naively want to use just the first one, because what other choice is materially better), you either have to remove duplicates in the first column as Aaron does in his example, or use Not sure, but it might be relatively simple to do pretty much what Aaron suggests, adding in a
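The row blowout described above can be reproduced without any annotation package. The following is a minimal base R sketch, not the AnnotationDbi implementation: the three data frames and all their values are made up for illustration, `merge(..., all.x = TRUE)` stands in for the LEFT JOINs, and `match()` stands in for a "first match only" lookup.

```r
# Toy stand-ins for three annotation tables (all values are hypothetical).
genes <- data.frame(GENEID = c("g1", "g2", "g3"),
                    stringsAsFactors = FALSE)
symbols <- data.frame(GENEID = c("g1", "g1", "g2"),
                      SYMBOL = c("A", "A-alt", "B"),
                      stringsAsFactors = FALSE)
go <- data.frame(GENEID = c("g1", "g1", "g1", "g2"),
                 GO = c("GO:1", "GO:2", "GO:3", "GO:4"),
                 stringsAsFactors = FALSE)

# Two successive left joins: each 1:many mapping multiplies the row count.
joined <- merge(genes, symbols, by = "GENEID", all.x = TRUE)
joined <- merge(joined, go, by = "GENEID", all.x = TRUE)
nrow(joined)  # 8 rows for 3 genes (g1 alone contributes 2 * 3 = 6)

# One lookup per column with match(), which returns the first hit only,
# so the result has exactly one row per key, in the order of the keys.
keys <- c("g3", "g1", "g2")
first_only <- data.frame(
  GENEID = keys,
  SYMBOL = symbols$SYMBOL[match(keys, symbols$GENEID)],
  GO = go$GO[match(keys, go$GENEID)],
  stringsAsFactors = FALSE
)
nrow(first_only)  # 3
```

Keys with no match (here `g3`) come back as `NA` rather than being dropped, so the output stays aligned with the input keys.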
Thanks @jmacdon, the
This is something I have had in my affycoretools package for a while, intended to do what you want in the context of an
Since this function is intended to put an
@LTLA Given that there are multiple versions of It seems to be a specific use case for a specific object, and is thus probably better housed with the code that generates the object. Obvious downside being that
scater already hosts a function for doing this annotation from
@LTLA and @jmacdon - Martin and I have been working on this issue and think we finally have something that will do what you are looking for. If you want to take a look at the
Feel free to test it out and let us know your thoughts. Thanks!
From a quick glance, this is perfect. I would further suggest that:
Looks good, although should it always return a
So does
The argument is
Two other things.
Thank you for your reply @jmacdon! Sorry, what threw me off is that there's no warning that a meaningless argument ( Also, are you sure it works as expected? From the documentation, it seems that
returns two values - ST7 and ST7-OT3. Thanks again for your help!
As far as silently accepting an argument, again you would know that's the case by reading the help page and seeing the function arguments.
Any function with an ellipsis will allow you to provide arbitrary arguments that can then be passed down to underlying functions. This is a good thing! The downside is that you have to read the help page and use the correct arguments (like column, rather than columns as you have done).
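The ellipsis behaviour described above is easy to see in plain R. This is a minimal sketch with a hypothetical `lookup()` function and a made-up table, not the actual mapIds code: a misspelled argument name (`columns`) is not a partial match for the formal `column`, so it is absorbed by `...` without any warning and the default column is used.

```r
# Hypothetical lookup function; its name and arguments are illustrative only.
lookup <- function(keys, table, column = "SYMBOL", ...) {
  extra <- list(...)  # anything not matched to a formal lands here, silently
  table[[column]][match(keys, table$GENEID)]
}

# Made-up annotation table.
anno <- data.frame(GENEID = c("g1", "g2"),
                   SYMBOL = c("A", "B"),
                   GENENAME = c("alpha", "beta"),
                   stringsAsFactors = FALSE)

# Misspelled argument: 'columns' is swallowed by '...', the default is used.
wrong <- lookup(c("g2", "g1"), anno, columns = "GENENAME")
wrong  # "B" "A"

# Correct argument name: the requested column is returned.
right <- lookup(c("g2", "g1"), anno, column = "GENENAME")
right  # "beta" "alpha"
```

A function that wants to catch this kind of typo can inspect `names(list(...))` and stop on unexpected names, at the cost of no longer forwarding arbitrary arguments.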
Oh, wait. @apredeus Are you talking about the branch that Kayla made? That's a branch, not something that's in the release or devel version. If you want that you need to use git to clone the repo, then check out the branch, and then install that.
Sorry, yes, I was confused by the commands Kayla listed above. I didn't realize it was a branch she developed. I was also thrown off by running I guess Bioconductor annotation packages are just not my thing :) Thanks for being patient and answering my questions though - much appreciated.
In my analysis code, I have not-uncommon occurrences of:
It would be nice to do something like:
... and save myself an extra line of code (and improve robustness to changes to the annotation object). Sort of like how I get an integer vector if I ask for
findOverlaps(..., select="first").