Applications that use multiple threads can benefit greatly in terms of performance when you specify which processors should be available to them. By default, the OS scheduler does a reasonable job of this. However, if you know how you’re using threads, being able to specify which processors they run on can significantly increase performance. Let’s consider an example:
You have an application with two threads that share one or more variables. The shared data may be as simple as a flag that keeps them in sync. [Throughout this post, when I say “processor” I mean one of the logical processors available to the OS. So if you have hyperthreading enabled and two physical cores on a single hardware processor, your OS will see four processors. In other words, in this post, a processor refers to the things you can see in Resource Monitor.]
If we run both threads on a single processor, it can only run one thing at a time. As such, one thread will run for a quantum, then there is a context switch, then the other thread runs for a quantum, and so on. That’s not great for performance if both threads are running in lock-step, because each has to wait for a context switch before it can proceed.
Another option is to run them on two separate physical cores. This is better than the single processor case, but it adds latency, as the shared data has to go through the L3 cache and the cores have to deal with cache coherency.
If we have a hyperthreading-enabled core though (i.e. two OS processors that are physically on the same core), then both threads can use the same cache for the shared data. Cache misses are less likely to happen and things won’t need to go through the L3 cache anymore (for the shared variable at least). For this kind of tightly synchronized workload, this can be considerably faster than the other options. How do we tell the OS to do that though?
On Linux, we can use taskset -c 0,1 [executable] to run [executable] on processors 0 and 1. That’s quite well known. On Windows, taskset doesn’t exist. However, we can achieve the same thing with our humble start:
> start /affinity 0x3 /b /wait [executable]
will do the same job. Here /b means the executable runs in the same window, /wait means we wait for the executable to complete, and /affinity specifies the processors that are available for the executable to run on. What’s the 0x3, you ask?
The parameter after /affinity is a hexadecimal value stating which processors should be available. It’s a bit vector where each bit represents a processor available to the OS. The lowest bit is the first processor, which is processor 0 in the taskset example and CPU 0 in Resource Monitor. If we wish to run on only the first processor, we can use 0x1 (binary 0001). If we wish to run on the second processor, we can use 0x2 (binary 0010). If we wish to run on the third processor, we can use 0x4 (binary 0100). If we wish to run on the first and second processors (which is typically the pair you get from a single core with hyperthreading), then we can use 0x3 (binary 0011).
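If you’d rather not work out the hex value by hand, the bit arithmetic can be sketched in a few lines of Python. This is only an illustration: the helper names affinity_mask and start_command and the file name app.exe are made up for this post, and processors are assumed to be counted from zero, as in the taskset example.

```python
def affinity_mask(processors):
    """Return a bit mask with one bit set per zero-based processor index."""
    mask = 0
    for index in processors:
        if index < 0:
            raise ValueError("processor indices must be non-negative")
        mask |= 1 << index  # bit 0 is the first processor, bit 1 the second, ...
    return mask


def start_command(processors, executable):
    """Build the start /affinity command line shown earlier (hypothetical helper)."""
    return f"start /affinity {affinity_mask(processors):#x} /b /wait {executable}"


print(hex(affinity_mask([0, 1])))        # 0x3
print(hex(affinity_mask([2])))           # 0x4
print(start_command([0, 1], "app.exe"))  # start /affinity 0x3 /b /wait app.exe
```

The helper only builds the command string; you still paste it into a command prompt yourself.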
It really is as simple as that.