I’ve started noticing articles and YouTube videos touting the benefits of branchless programming, making it sound like this is a hot new technique (or maybe a hot old technique) that everyone should be using. But it seems like it’s only really applicable to data processing applications (as opposed to general programming) and there are very few times in my career where I’ve needed to use, much less optimize, data processing code. And when I do, I use someone else’s library.

How often does branchless programming actually matter in the day to day life of an average developer?

38 points

If you want your code to run on the GPU, the complete viability of your code depends on it. But if you just want to run it on the CPU, it is only one of the many micro-optimization techniques you can use to shave a few nanoseconds off an inner loop.

The thing to keep in mind is that there is no such thing as “average developer”. Computing is way too diverse for it.

20 points

And the branchless version may end up being slower on the CPU, because the compiler does a better job optimizing the branching version.
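To make the comparison concrete, here is a minimal Go sketch of the same operation written both ways. The function names are made up for illustration, and the branchless version assumes `a - b` does not overflow. Which one is faster is not something the source code can tell you: the compiler is free to turn the branchy version into a conditional move on its own, so only a benchmark on your compiler and CPU settles it.

```go
package main

import "fmt"

// maxBranchy picks the larger value with an ordinary conditional.
func maxBranchy(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// maxBranchless picks the larger value using arithmetic only.
// Assumption: a - b does not overflow int64.
func maxBranchless(a, b int64) int64 {
	diff := a - b
	mask := diff >> 63 // arithmetic shift: all ones if a < b, otherwise zero
	return a - (diff & mask)
}

func main() {
	fmt.Println(maxBranchy(3, 7), maxBranchless(3, 7))
	fmt.Println(maxBranchy(-5, -2), maxBranchless(-5, -2))
}
```

The branchy version is also the one most people can read at a glance, which is part of why it is usually the right default.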

7 points

If you want your code to run on the GPU, the complete viability of your code depends on it.

Because of the performance improvements from vectorization, and the fact that GPUs are particularly well suited to that? Or are GPUs particularly bad at branches?

it is only one of the many micro-optimization techniques you can do to take a few nanoseconds from an inner loop.

How often do a few nanoseconds in the inner loop matter?

The thing to keep in mind is that there is no such thing as “average developer”. Computing is way too diverse for it.

Looking at all the software out there, the vast majority of it is games, apps, and websites. Applications where performance is critical, such as control systems, operating systems, databases, numerical analysis, etc, are relatively rare compared to apps/etc. So statistically speaking the majority of developers must be working on the latter (which is what I mean by an “average developer”). In my experience working on apps there are exceedingly few times where micro-optimizations matter (as in things like assembly and/or branchless programming as opposed to macro-optimizations such as avoiding unnecessary looping/nesting/etc).

Edit: I can imagine it might matter a lot more for games, such as in shaders or physics calculations. I’ve never worked on a game so my knowledge of that kind of work is rather lacking.

23 points

Or are GPUs particularly bad at branches?

Yes. GPUs don’t have per-core branching; they have dozens of cores running the same instructions. So if some cores should run the if branch and some should run the else branch, all cores in the group will execute both branches and mask out the one they shouldn’t have run. I also think they don’t have the advanced branch prediction CPUs have.
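The masking behaviour can be mimicked on the CPU. The following is a toy simulation in Go, not GPU code: `runGroup` is a hypothetical name, the per-lane computation is arbitrary, and real hardware does this with execution masks rather than loops. It only illustrates why a group whose lanes disagree pays for both sides of the branch.

```go
package main

import "fmt"

// runGroup simulates one group of lanes executing, in lockstep,
//
//	if x%2 == 0 { x = x / 2 } else { x = 3*x + 1 }
//
// It returns the per-lane results and the number of passes the whole
// group had to execute: 1 if all lanes agree, 2 if they diverge.
func runGroup(lanes []int) ([]int, int) {
	out := make([]int, len(lanes))
	mask := make([]bool, len(lanes))
	anyIf, anyElse := false, false
	for i, x := range lanes {
		mask[i] = x%2 == 0
		if mask[i] {
			anyIf = true
		} else {
			anyElse = true
		}
	}

	passes := 0
	if anyIf {
		passes++
		for i, x := range lanes {
			if mask[i] { // lanes with the mask off sit idle during this pass
				out[i] = x / 2
			}
		}
	}
	if anyElse {
		passes++
		for i, x := range lanes {
			if !mask[i] { // now the other lanes sit idle
				out[i] = 3*x + 1
			}
		}
	}
	return out, passes
}

func main() {
	uniform, p1 := runGroup([]int{2, 4, 6, 8})
	fmt.Println(uniform, "passes:", p1)

	divergent, p2 := runGroup([]int{2, 3, 4, 5})
	fmt.Println(divergent, "passes:", p2)
}
```

When every lane takes the same side, the group only executes one pass, which is why a branch that is uniform across the group costs much less than one that splits it.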

https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads

4 points

Makes sense. The most programming I’ve ever done for a GPU was a few simple shaders for a toy project.

14 points

How often do a few nanoseconds in the inner loop matter?

It doesn’t matter until you need it. And when you need it, it’s the difference between life and death.

3 points

Also if you branch on a GPU, the compiler has to reserve enough registers to walk through both branches (handwavey), which means lower occupancy.

Often you have no choice, or removing the branch leaves you with just as much code, so it’s irrelevant. But sometimes it matters. If you know that a particular draw call will always use one side of the branch but not the other, a typical optimization is to compile a separate version of the shader that removes the unused branch and saves on registers.

3 points

How often do a few nanoseconds in the inner loop matter?

Fintech. Stock exchanges will go to extreme lengths to appease their wolves of Wall Street.

1 point

Yes, GPUs are bad at branching. But my ray tracer, which is 90% branches, still runs faster on the GPU than on the CPU.

In general you are still correct.

16 points

The better of those articles and videos also emphasize you should test and measure, before and after you “improved” your code.

I’m afraid there is no standard, average solution. Your attempt to optimize your code might very well cause it to run slower.

So unless you have good reasons (good as in ‘proof’) to do otherwise, I’d recommend aiming for readable, maintainable code. Which is often not optimized code.

5 points

One of the reasons I love Go is that it makes it very easy to collect profiles and locate hot spots.

The part that seems weird to me is that these articles are presented as if it’s a tool that all developers should have in their tool belt, but in 10 years of professional development I have never been in a situation where that kind of optimization would be applicable. Most optimizations I’ve done come down to: I wrote it quickly and ‘lazy’ the first time, but it turned out to be a hot spot, so now I need to put in the time to write it better. And most of the remaining cases are solved by avoiding doing work more than once. I can’t recall a single time when a micro-optimization would have helped, except in college when I was working with microcontrollers.

5 points

Given the variety of software in existence I think it’s hard to say that something is so universally essential. Do people writing Wordpress plugins need to know about branch prediction? What about people maintaining that old .NET 3.5 application keeping the business running? VisualBasic macros?

I agree it’s weird. Probably more about getting clicks/views.

3 points

Please please please, God, Allah, Buddha, any god or non-god out there, please don’t let any engineer bring up branchless programming for an AWS Lambda function in our one-function-per-micro-service f*ckitechture.

2 points

Exactly, this sounds like a good way to optimize prematurely…

15 points

It matters if you develop compilers 🤷

Otherwise? Readability trumps the minute performance gain almost every time (and that’s assuming your compiler won’t automatically do branchless substitutions for performance reasons anyway, which it probably will).

13 points

Not that much. It’s useful when you have a very hot loop with branches the branch predictor keeps guessing wrong, so the CPU has to roll back computation it already did unnecessarily.

This StackOverflow question explains it fairly well: https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array
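For reference, the loop from that question looks roughly like this when ported to Go, next to a branchless rewrite of the same sum. This is a sketch: the function names are invented here, and the bit trick assumes every value is in the range 0 to 255 as in the question. On unsorted data the `if` in the first version is the kind of branch a predictor struggles with; the second version has no data-dependent branch to mispredict.

```go
package main

import "fmt"

// sumBranchy adds up the values >= 128 using an if statement.
func sumBranchy(data []int32) int64 {
	var sum int64
	for _, x := range data {
		if x >= 128 {
			sum += int64(x)
		}
	}
	return sum
}

// sumBranchless computes the same sum without a data-dependent branch.
// Assumption: every value is in [0, 255].
func sumBranchless(data []int32) int64 {
	var sum int64
	for _, x := range data {
		t := (x - 128) >> 31 // -1 (all ones) if x < 128, otherwise 0
		sum += int64(^t & x) // keeps x when x >= 128, contributes 0 otherwise
	}
	return sum
}

func main() {
	data := []int32{12, 200, 128, 127, 255, 0}
	fmt.Println(sumBranchy(data), sumBranchless(data))
}
```

Whether the second version is actually faster depends on the data and on what the compiler already does with the first one, so it is worth benchmarking before keeping the less readable form.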

8 points

I understand the principles, how branch prediction works, and why optimizing to help out the predictor can help. My question is more of, how often does that actually matter to the average developer? Unless you’re a developer on numpy, gonum, cryptography, digital signal processing, etc, how often do you have a hot loop that can be optimized with branchless programming techniques? I think my career has been pretty average in terms of the projects I’ve worked on and I can’t think of a single time I’ve been in that situation.

I’m also generally aggravated at what skills the software industry thinks are important. I would not be surprised to hear about branchless programming questions showing up in interviews, but those skills (and algorithm design in general) are irrelevant to 99% of development and 99% of developers in my experience. The skills that actually matter (in my experience) are problem solving, debugging, reading code, and soft skills. And being able to write code of course, but that almost seems secondary.

8 points

I’ve never had to care about it in 16 years of coding. I’ve also seen a few absolutely horrifying code designs in the name of being branchless. Code readability is often way more important than eking out every bit of compute out of a CPU. And it gets into a domain where architecture matters too: if you’re coding for a microcontroller or some low-power embedded ARM processor, those often don’t even have branch predictors, so it’s a complete waste of time.

I’d say, being able to identify bottlenecks is what really matters, because it’s what will eventually lead you to the hot loop you’ll want to optimize.

But the overwhelming majority of software is not CPU bound, it’s IO bound. And if it is CPU bound, it’s relatively rare that you can’t just add more CPUs to it.

I do get your concern however, these interview questions are the plague and usually asked by companies with zero need for it. Personally I pass on any job interview that requires some LeetCode exercises. I know my value and my value isn’t remembering CS exercises from 10 years ago. I’ll absolutely unfuck your webserver or data breach at 3am though. Frontend, backend, Linux servers, cloud infrastructure, databases, you name it, I can handle it no problem.

4 points

Code readability is often way more important

This. 100% this. The only thing more important than readability is whether it actually works. If you can’t read it, you can’t maintain it. The only exception is throw away scripts I’m only going to use a few times. My problem is that what I find readable and what the other developers find readable are not the same.

I’d say, being able to identify bottlenecks is what really matters, because it’s what will eventually lead you to the hot loop you’ll want to optimize.

I love Go. I can modify a program to activate the built-in profiler, or throw the code in a benchmark function and use the tool chain to profile it, then have it render a flame graph that shows me exactly where the CPU is spending its time and/or what calls are allocating. It makes it so easy (most of the time) to identify bottlenecks.

2 points

(Branchless can technically be faster on CPUs without branch prediction, due to pipelines stalling from branches, but it’s still a waste of time unless you’ve actually identified it as a bottleneck)

3 points

Personally I try to keep my code as free of branches as possible for simplicity reasons. Branch-free code is often easier to understand and easier to predict for a human. If your program is a giant block of if statements it’s going to be harder to make changes easily and reliably. And you’re likely leaving useful reusable functionality gunked up and spread out throughout your application.

Every piece of software actually is a data processing pipeline. You take some input, do some processing of some sort, then output something, usually along with some side effects (network requests, writing files, etc). Thinking about your software in this way can help you design better software. I rarely write code that needs to process large amounts of data, but pretty much any code can benefit from intentional simplicity and design.

7 points

I am all aboard the code readability train. The more readable code is, the more understandable and therefore debuggable and maintainable it is. I will absolutely advocate for any change that increases readability unless it hurts performance in a way that actually matters. I generally try to avoid nesting ifs and loops since deeply nested expressions tend to be awful to debug.

This article has had a significant influence on my programming style since I read it (many years ago). Specifically this part:

Don’t indent and indent and indent for the main flow of the method. This is huge. Most people learn the exact opposite way from what’s really proper — they test for a correct condition, and if it’s true, they continue with the real code inside the “if”.

What you should really do is write “if” statements that check for improper conditions, and if you find them, bail. This cleans your code immensely, in two important ways: (a) the main, normal execution path is all at the top level, so if the programmer is just trying to get a feel for the routine, all she needs to read is the top level statements, instead of trying to trace through indention levels figuring out what the “normal” case is, and (b) it puts the “bail” code right next to the correctness check, which is good because the “bail” code is usually very short and belongs with the correctness check.

When you plan out a method in your head, you’re thinking, “I should do blank, and if blank fails I bail, but if not I go on to do foo, and if foo fails I should bail, but if not I should do bar, and if that fails I should bail, otherwise I succeed,” but the way most people write it is, “I should do blank, and if that’s good I should do foo, and if that’s good I should do bar, but if blank was bad I should bail, and if foo was bad I should bail, and if bar was bad I should bail, otherwise I succeed.” You’ve spread your thinking out: why are we mentioning blank again after we went on to foo and bar? We’re SO DONE with blank. It’s SO two statements ago.
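Here is roughly what that looks like in Go, with the same function written both ways. The `order` type and the function names are invented for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// order is a hypothetical type used only for this illustration.
type order struct {
	Items int
	Paid  bool
}

// shipNested tests for the "correct" condition at each step and indents,
// so the bail-out for the first check ends up at the very bottom.
func shipNested(o *order) (string, error) {
	if o != nil {
		if o.Items > 0 {
			if o.Paid {
				return fmt.Sprintf("shipping %d items", o.Items), nil
			} else {
				return "", errors.New("order not paid")
			}
		} else {
			return "", errors.New("order is empty")
		}
	} else {
		return "", errors.New("no order")
	}
}

// shipGuarded checks for improper conditions and bails right away,
// leaving the normal path at the top indentation level.
func shipGuarded(o *order) (string, error) {
	if o == nil {
		return "", errors.New("no order")
	}
	if o.Items <= 0 {
		return "", errors.New("order is empty")
	}
	if !o.Paid {
		return "", errors.New("order not paid")
	}
	return fmt.Sprintf("shipping %d items", o.Items), nil
}

func main() {
	fmt.Println(shipNested(&order{Items: 2, Paid: true}))
	fmt.Println(shipGuarded(&order{Items: 2, Paid: false}))
}
```

Both functions behave identically; the second just keeps each check next to its own bail-out.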

13 points

Necessary for cryptographic code, where data-dependent branches can create a side-channel which leaks the data.
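A small Go sketch of the difference, using a secret comparison as the example. `leakyEqual` and `constantTimeEqual` are names made up here; `subtle.ConstantTimeCompare` is the standard library function from `crypto/subtle`, which returns 1 when the slices are equal and 0 otherwise.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// leakyEqual returns at the first mismatch, so its running time depends on
// how many leading bytes of the guess are correct. That is the
// data-dependent branch an attacker can measure.
func leakyEqual(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// constantTimeEqual looks at every byte no matter where the mismatches are.
// Note that a length mismatch still returns early, so only the contents
// are protected, not the length.
func constantTimeEqual(a, b []byte) bool {
	return subtle.ConstantTimeCompare(a, b) == 1
}

func main() {
	secret := []byte("s3cr3t-token")
	guess := []byte("s3cr3t-tokem")
	fmt.Println(leakyEqual(secret, guess), constantTimeEqual(secret, guess))
	fmt.Println(leakyEqual(secret, secret), constantTimeEqual(secret, secret))
}
```

Both functions return the same answers; the point is only that the second one does not reveal, through timing, how close a wrong guess was.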

2 points

I thought it might be helpful for optimizing cryptographic code, but it hadn’t occurred to me that it would prevent side channel leaks


Programming

!programming@programming.dev
