-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"Singleton" APIs are currently wasteful over FFI #6050
Comments
None of the approaches I can think of here are great. Firstly, here's an overview of the primary APIs where the ICU4C version uses free functions:
There are more APIs we could in theory also remove the allocation cost from, however these are more expensive anyway so it might not be worth it. We should think about these during this discussion, but they're not my focus and I don't particularly care if we don't solve these
The three approaches I can think of are: A. Add direct convenience APIsThis is the approach from #6049. Basically, we just add convenience APIs that quack like the ICU4C ones, at least over FFI and potentially in ICU4X Rust code too if it makes sense. This would mean:
I don't particularly like our options for naming the methods and I also think they're going to be clutter, but especially for the latter two if we do them only over FFI we're probably fine? B. Formalize Borrowed types over FFIDiplomat is able to handle C. Separate "singletons" kitchen sink FFI APIWe can have a separate kitchen sink type that contains selected APIs that match ICU4C's free functions. Could be a single type, or multiple in its own namespace or something. Thoughts? @sffc @robertbastian @zbraniecki |
This and the other issue are in 2.0-stretch because Bionic likely needs them and it may involve major API changes. |
Convenience functions for specific use cases seem fine, as long as they are zero-cost and don't propagate data provenance issues. Basically, the places where we've approved Default impls are the ones where these types of convenience wrappers could be added. In Rust, I prefer generally sticking with the model we've been using and not having two ways of doing the same thing, but we could add these convenience functions especially when they improve discoverability. Maybe there are cases where we can replace compiled data constructors with these helper functions in order to reduce API complexity. (Buffer provider would still need the data-forward API.) Naming Idea: CaseMapper::default_to_uppercase |
Seems like an easy enough litmus test. I think "does ICU4C have a free function" is also a good litmus test that produces a slightly more conservative answer.
Interesting idea, though I don't know if I like removing the symmetry.
Not bad.
I think this points to a solution of picking route A: convenience functions, and only adding them to FFI except in the cases where we add Rust APIs for #6049, and in those cases we make sure they're similarly named. I like that; I don't think Rust needs additional CaseMap APIs. |
See discussion #6048
Closely related to #6049, which talks about this with a focus on enumerated properties.
Generally with ICU4X we have two rough categories of APIs: locale-sensitive (like datetime), and locale-invariant or "singleton" (unicode properties, casemapping, etc). By and large, data loading is super important for the first one, however for most "singleton" APIs the average user will just want the default.
This usage is reflected in how the corresponding APIs are exposed in ICU4C: most "singleton" APIs are free functions like
toLower
andgetIntProperty
, unlikeDateFormat
which is an object you create and hold on to.In turn, this affects how ICU4C users structure their code: code dealing with Unicode Properties will just have a bunch of scattered free function calls.
On the other hand, in ICU4X, these APIs still dangle off of a dataful type, like
CodePointSetData
orCaseMapper
, but there is a zero-cost1 const constructor that gives you the singleton compiled data version, often producing aFooBorrowed
type.This is fine in Rust, just sometimes tedious, where the ICU4C pattern of "call
toUpper()
" becomes the slightly uglierCaseMapper::new().to_simple_uppercase()
, andisAlnum(c)
becomesCodePointSetData::alnum().contains(c)
It's a problem over FFI, however, because
CaseMapper::new()
/CodePointSetData::alnum()
over FFI will unconditionally allocate aCaseMapper
/CodePointSetDataBorrowed
object, which will then have to be freed.This is likely to be a prohibitively more costly change for people wishing to migrate from ICU4C to ICU4X.
We should try and fix this. #6049 will fix this for enumerated properties, but not other APIs. I'll write a followup comment with approaches.
Footnotes
Depends on the type. CaseMapper doesn't have a Borrowed variant so it's not 100% zero cost there (because the destructor may still get called if not DCEd). Maybe it should. ↩
The text was updated successfully, but these errors were encountered: