And Rust also has “🤦”.chars().count(), which returns 1.
I would rather argue that Rust should not have a simple len function for strings, but since str is only a byte slice underneath, it works that way.
Also, the documentation for the len function clearly states:
This length is in bytes, not chars or graphemes. In other words, it might not be what a human considers the length of the string.
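To make the difference between the two calls concrete, here is a minimal sketch using only the standard library. The helper names byte_len and scalar_count are made up for illustration; only str::len and str::chars are real std APIs.

```rust
// Hypothetical helper (illustration only): length in UTF-8 bytes, as str::len reports it.
fn byte_len(s: &str) -> usize {
    s.len()
}

// Hypothetical helper (illustration only): number of chars, i.e. Unicode scalar values.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // U+1F926 FACE PALM, without any modifiers.
    let facepalm = "\u{1F926}";
    println!("bytes: {}", byte_len(facepalm)); // 4 bytes in UTF-8
    println!("chars: {}", scalar_count(facepalm)); // 1 scalar value
}
```

For this particular emoji the chars count happens to match what a human would call the length, which is exactly why it is easy to mistake it for the right answer.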
That Rust function returns the number of code points (strictly, Unicode scalar values), not the number of graphemes, and a code point count is rarely useful. You need to use a facepalm emoji with skin color modifiers to see the difference.
The way to get a proper grapheme count in Rust is e.g. via this library: https://crates.io/crates/unicode-segmentation
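As a rough illustration of the skin-color case, here is a std-only sketch. The function names are hypothetical, and the sequence below is one plausible "facepalm with modifiers" (facepalm, medium-light skin tone, zero-width joiner, male sign, variation selector), written with escapes so it survives copy and paste.

```rust
// Hypothetical helper: one facepalm variant with modifiers (assumed example, 🤦🏼‍♂️).
fn facepalm_with_modifiers() -> &'static str {
    "\u{1F926}\u{1F3FC}\u{200D}\u{2642}\u{FE0F}"
}

// Hypothetical helper: list the scalar values that str::chars yields.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    let s = facepalm_with_modifiers();
    // A human sees one symbol, but chars() yields five scalar values.
    println!("chars: {}", s.chars().count()); // 5
    println!("bytes: {}", s.len()); // 17
    for cp in code_points(s) {
        println!("U+{:04X}", cp);
    }
}
```

With the unicode-segmentation crate added as a dependency, importing unicode_segmentation::UnicodeSegmentation and calling s.graphemes(true).count() should report a single extended grapheme cluster here, though that depends on the Unicode version the crate implements.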
Makes sense. The code point split is stable, so it’s fine to put in the standard library; the grapheme split changes every year, so that volatility is probably better off in a crate.
Yeah, although having now seen two commenters with relatively high confidence claiming that counting code points ought to be enough…
…and me almost having been the third such commenter, had I not decided to read the article first…
…I’m starting to feel more and more like the stdlib should force you through all kinds of hoops to get anything resembling a size of a string, so that you gladly search for a library.
Like, I’ve worked with decoding strings quite a bit in the past, and I felt like I had an above-average understanding of Unicode as a result. And I was still only vaguely aware of graphemes.