17 points

Cross-posting from reddit:

The PR has more details, but here are a few ad hoc benchmarks using ripgrep on my M2 mac mini while searching a 5.5GB file.

This one is just a case insensitive search. A case insensitive regex expands to something like (ignoring Unicode) [Ss][Hh][Ee][Rr]..., which means that it has multiple literal prefixes. In fact, you can enumerate them! As long as the set is small enough, this is something that the new SIMD acceleration on aarch64 can handle (and has done for a long time on x86-64):

$ time rg-before-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
3055

real    8.208
user    7.731
sys     0.467
maxmem  5600 MB
faults  191

$ time rg-after-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
3055

real    1.137
user    0.695
sys     0.430
maxmem  5904 MB
faults  203

And of course, using multiple literals explicitly also uses this optimization:

$ time rg-before-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en
3804

real    9.055
user    8.580
sys     0.474
maxmem  4912 MB
faults  11

$ time rg-after-teddy-aarch64 -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2018.half.en
3804

real    1.121
user    0.697
sys     0.422
maxmem  4832 MB
faults  11

And it doesn’t just work for prefixes, it also works for inner literals too:

$ time rg-before-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en
773

real    9.065
user    8.586
sys     0.477
maxmem  6384 MB
faults  11

$ time rg-after-teddy-aarch64 -c '\w+\s+(Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)\s+\w+' OpenSubtitles2018.half.en

773

real    1.124
user    0.702
sys     0.421
maxmem  6784 MB
faults  11

If you’re curious about how the SIMD stuff works, you can read my description of Teddy here. I ported this algorithm out of the Hyperscan project several years ago, and it has been one of the killer ingredients for making ripgrep fast in a lot of common cases. But it only worked on x86-64. With the rise and popularity of aarch64 and Apple silicon, I was motivated to port it over. I just recently finished analogous work for the memchr crate as well.

permalink
report
reply
4 points

This sounds really great and will probably have quite an impact on a lot of users. So, nice work!

permalink
report
parent
reply

Rust

!rust@programming.dev

Create post

Welcome to the Rust community! This is a place to discuss about the Rust programming language.

Wormhole

!performance@programming.dev

Credits
  • The icon is a modified version of the official rust logo (changing the colors to a gradient and black background)

Community stats

  • 598

    Monthly active users

  • 775

    Posts

  • 3.3K

    Comments