Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Singleton" APIs are currently wasteful over FFI #6050

Open
Manishearth opened this issue Jan 31, 2025 · 4 comments
Open

"Singleton" APIs are currently wasteful over FFI #6050

Manishearth opened this issue Jan 31, 2025 · 4 comments
Labels
discuss-priority Discuss at the next ICU4X meeting U-android-bionic Bionic in Android

Comments

@Manishearth
Copy link
Member

See discussion #6048

Closely related to #6049, which talks about this with a focus on enumerated properties.

Generally with ICU4X we have two rough categories of APIs: locale-sensitive (like datetime), and locale-invariant or "singleton" (unicode properties, casemapping, etc). By and large, data loading is super important for the first one, however for most "singleton" APIs the average user will just want the default.

This usage is reflected in how the corresponding APIs are exposed in ICU4C: most "singleton" APIs are free functions like toLower and getIntProperty, unlike DateFormat which is an object you create and hold on to.

In turn, this affects how ICU4C users structure their code: code dealing with Unicode Properties will just have a bunch of scattered free function calls.

On the other hand, in ICU4X, these APIs still dangle off of a dataful type, like CodePointSetData or CaseMapper, but there is a zero-cost1 const constructor that gives you the singleton compiled data version, often producing a FooBorrowed type.

This is fine in Rust, just sometimes tedious, where the ICU4C pattern of "call toUpper()" becomes the slightly uglier CaseMapper::new().to_simple_uppercase(), and isAlnum(c) becomes CodePointSetData::alnum().contains(c)

It's a problem over FFI, however, because CaseMapper::new()/CodePointSetData::alnum() over FFI will unconditionally allocate a CaseMapper/CodePointSetDataBorrowed object, which will then have to be freed.

This is likely to be a prohibitively more costly change for people wishing to migrate from ICU4C to ICU4X.

We should try and fix this. #6049 will fix this for enumerated properties, but not other APIs. I'll write a followup comment with approaches.

Footnotes

  1. Depends on the type. CaseMapper doesn't have a Borrowed variant so it's not 100% zero cost there (because the destructor may still get called if not DCEd). Maybe it should.

@Manishearth Manishearth added discuss-priority Discuss at the next ICU4X meeting U-android-bionic Bionic in Android labels Jan 31, 2025
@Manishearth
Copy link
Member Author

Manishearth commented Jan 31, 2025

None of the approaches I can think of here are great.

Firstly, here's an overview of the primary APIs where the ICU4C version uses free functions:

  • All property getter APIs (binary, enumerated, script, scx)
  • All property name APIs (parsing and serializing)
  • Case mapper APIs

There are more APIs we could in theory also remove the allocation cost from, however these are more expensive anyway so it might not be worth it. We should think about these during this discussion, but they're not my focus and I don't particularly care if we don't solve these

  • Normalizer
  • Segmenter
  • Timezone?

The three approaches I can think of are:

A. Add direct convenience APIs

This is the approach from #6049. Basically, we just add convenience APIs that quack like the ICU4C ones, at least over FFI and potentially in ICU4X Rust code too if it makes sense. This would mean:

I don't particularly like our options for naming the methods and I also think they're going to be clutter, but especially for the latter two if we do them only over FFI we're probably fine?

B. Formalize Borrowed types over FFI

Diplomat is able to handle -> &'static Foo, so we could in theory expose Borrowed types over FFI. This would either involve duplicating the APIs or moving all APIs to borrowed types (which in turn would pessimize JS and Dart due to lifetimes).

C. Separate "singletons" kitchen sink FFI API

We can have a separate kitchen sink type that contains selected APIs that match ICU4C's free functions. Could be a single type, or multiple in its own namespace or something.

Thoughts? @sffc @robertbastian @zbraniecki

@Manishearth
Copy link
Member Author

This and the other issue are in 2.0-stretch because Bionic likely needs them and it may involve major API changes.

@sffc
Copy link
Member

sffc commented Jan 31, 2025

Convenience functions for specific use cases seem fine, as long as they are zero-cost and don't propagate data provenance issues. Basically, the places where we've approved Default impls are the ones where these types of convenience wrappers could be added.

In Rust, I prefer generally sticking with the model we've been using and not having two ways of doing the same thing, but we could add these convenience functions especially when they improve discoverability.

Maybe there are cases where we can replace compiled data constructors with these helper functions in order to reduce API complexity. (Buffer provider would still need the data-forward API.)

Naming Idea: CaseMapper::default_to_uppercase

@Manishearth
Copy link
Member Author

Basically, the places where we've approved Default impls are the ones where these types of convenience wrappers could be added.

Seems like an easy enough litmus test. I think "does ICU4C have a free function" is also a good litmus test that produces a slightly more conservative answer.

Maybe there are cases where we can replace compiled data constructors with these helper functions in order to reduce API complexity. (Buffer provider would still need the data-forward API.)

Interesting idea, though I don't know if I like removing the symmetry.

Naming Idea: CaseMapper::default_to_uppercase

Not bad.

In Rust, I prefer generally sticking with the model we've been using and not having two ways of doing the same thing, but we could add these convenience functions especially when they improve discoverability.

I think this points to a solution of picking route A: convenience functions, and only adding them to FFI except in the cases where we add Rust APIs for #6049, and in those cases we make sure they're similarly named. I like that; I don't think Rust needs additional CaseMap APIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss-priority Discuss at the next ICU4X meeting U-android-bionic Bionic in Android
Projects
None yet
Development

No branches or pull requests

2 participants