-
Notifications
You must be signed in to change notification settings - Fork 986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove non-API calls in C #6180
Comments
there is a web page which lists all of the API calls, grouped into 3 categories |
You should probably be proactive and contact R-core regarding your need of an API to |
if this is important for you (or anyone else reading) please take the time to construct a performance comparison. |
I would but unfortunately I think you need at least basic knowledge of C to do that. |
AFAIU we can replace |
Where is |
We use it as part of memrecycle and subsetting so its basically in most operations :D But we are not alone with this problem r-lib/vctrs#1933 |
Sure, but I'm wondering what the tests will be focused on (since this affects multiple functions) and then what about |
I don't see By the same definition, touching S4 objects from C is also non-API! 🤔 |
I see, a = 1:10
address(a)
# [1] "0x5651b8564b30"
length(a) <- 20
address(a)
# [1] "0x5651b8ab5c68" |
It looks like more of our usage has been marked as non-API in the meantime:
new:
Lines 30 to 32 in fab93a9
Line 20 in fab93a9
Lines 58 to 63 in fab93a9
Lines 548 to 552 in fab93a9
Lines 10 to 12 in fab93a9
Line 535 in fab93a9
|
there is a new section in WRE https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Moving-into-C-API-compliance |
@tdhock there is nothing about |
I can imagine group by, when not gforce optimized, is setting true length as well. It allocates memory for a biggest group and then reuses that space for smaller groups as well. |
This issue also makes our CI fail since we get now a new 3rd NOTE: |
Yes, for the moment till we address this issue. CI is used to spot new issues, this one is already known. |
Afair growable vectors are part of the public api, and those calls which apparently should be removed now are essential tools to work with growable vectors. That may be some argument to request official support for those. |
Would be worth clarifying on r-devel then. Currently, And there's no mention in WRE of "growable" |
Uh then it is possibly not a public api then. I had that impression when recalling how well it was announced when got implemented. |
Have you guys been in touch with R-core about the growable vectors issue? Or do you know of anyone who has been in touch with them regarding this? Seems to me |
I don't think you mean growable vectors. AFAIU all three functions you mentioned were implemented in DT before growable vector happened in R. |
Ok, yes I mean the manipulation of the visible and true vector length with |
Over-allocated vectors (including lists since R-4.3.0) can be implemented using ALTREP in something like 60-100 lines of code (separate implementation for each primitive type, unfortunately). I expect it to be hard but not impossible to (1) refactor Use of |
It's a nice idea, although it would significantly reduce backwards compatibility by one major version (data.table currently depends R >= 3.3.0) and I'm not sure what consequences ALTREP would have for the interopertability of data.table, particularly for package C code meant to manipulate standard data frames (and for storage as rds and reading if data.table is not loaded)? In any case, if maintainers are unsure how to proceed, perhaps invite R-core to the conversation (e.g. Thomas Kalibera (kalibera) or Luke Tierney (ltierney)). Seems also the vctrs team has not found a solution yet (r-lib/vctrs#1933) so it seems this is not a small issue (and we're talking about probably a third or quarter of CRAN packages that depends on data.table and vctrs, so it must be of concern to R-core that a good solution is found here) |
Part of the idea is to keep both implementations and use the original one for R ∈ [3.3.0; 4.3.0) and the ALTREP one for R ≥ 4.3.0. Maintenance burden, yes, but the old code is already quite stable, and the internals of old R versions aren't going to change. The C code, even if it requests raw data pointers, shouldn't notice anything amiss: since the ALTREP class already carries a full-blown over-allocated object of the same type inside it, it can return its (writeable) data pointer when requested. Serialization will have similar properties to what we have right now: if an ALTREP class does not implement a An ALTREP data.table could also take care of duplicating itself properly, obviating a whole class of bugs resulting from it being shallowly duplicated by |
Sounds Good. It would be ideal of course if other frameworks like collapse don't need to take additional actions to interoperate with data.table. I will stay updated with this (SebKrantz/collapse#653). Happy to assist with the implementation as well, although I haven't used ALTREP yet. |
We are getting these NOTEs on CRAN, e.g.
Found non-API calls to R: ‘SETLENGTH’, ‘SET_S4_OBJECT’,
‘SET_TRUELENGTH’, ‘UNSET_S4_OBJECT’
I'm really skeptical about the SETLENGTH and SET_TRUELENGTH parts, which have long been a key underlying feature of data.table internals. I propose we try and keep using those unless CRAN really forces our hands. It be really great if someone wanted to do a performance comparison of the package with/without those calls.
Anyway, it should be easier to remove the non-API S4 calls.
The list does keep changing as this is an actively evolving piece of r-devel. Here's the current list of things we should (eventually) address:
The text was updated successfully, but these errors were encountered: