Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add methods to iterate exemplar characters #6079

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lionel-rowe
Copy link

Adds public code_point_ranges and strings methods to ExemplarCharacters to allow iterating through the single- and multi-character entries, respectively:

  • code_point_ranges returns a CodePointRangeIterator (existing struct)
  • strings returns a StringIterator, a new struct that iterates via passing a DiplomatWrite buffer to its next method, writing a null-char-terminated string for Some(string) and an empty string for None (thus differentiating Some("") -> "\0" from None -> "", though in practice Some("") should never occur for exemplar chars). Not sure if there's a better way of iterating strings across the FFI boundary.

Thus for hu locale and Main set:

# `code_point_ranges`
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'v', 'z', 'á', 'é', 'í', 'ó', 'ö', 'ú', 'ü', 'ő', 'ű'
# `strings`
"ccs", "cs", "ddz", "ddzs", "dz", "dzs", "ggy", "gy", "lly", "ly", "nny", "ny", "ssz", "sz", "tty", "ty", "zs", "zzs"

I also added a tab to the WASM demo to illustrate how these can be used from JS/TS.

@lionel-rowe lionel-rowe requested review from Manishearth and a team as code owners February 7, 2025 11:21
@lionel-rowe lionel-rowe force-pushed the exemplar-characters-iteration branch from ecf4338 to e3cf37f Compare February 7, 2025 11:22
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good! Curious: what are you using ICU4X-JS for? So far we've been getting C++ users, excited to see some uptake in the JS backend!

impl StringIterator {
/// Advance the iterator by one and return the next string, terminated with a null byte.
/// If there are no more strings to be iterated, an empty string is returned.
pub fn next(&mut self, write: &mut DiplomatWrite) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Diplomat supports iterators, I suggest tacking on #[diplomat::attr(auto, iterator)] and having this return an Option<()>.

We should probably make sure all other iterable APIs in icu_capi use iterators too.

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't add a generic stringiterator type IMO, it's going to do expensive intermediate allocations.

/// An iterator over strings
#[diplomat::opaque]
pub struct StringIterator(
pub Box<dyn Iterator<Item = String>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: this should reference a concrete type, and borrow from it

FnInStruct
)]
pub fn strings(&self) -> Box<StringIterator> {
let strings = self.0.as_borrowed().strings().iter().map(|s| s.to_string()).collect::<Vec<_>>();
Copy link
Member

@Manishearth Manishearth Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: these two should not collect, Diplomat can return borrows. This should return a Box<ExemplarCharactersStringIterator<'a>>

Ideally there's no dynamic dispatch, if we need concrete iterator types in ICU4X (where we have `impl Trait) we should add them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants