My friend Stan Melax has written a number of articles on optimization using SSE instructions, and he has also talked about some simple things you can do to make your program run faster. I'd like to reiterate one of them because it's so simple to do and the payoff is so big. It turns out that the Microsoft C++ compiler, when targeting 32-bit x86, generates x87 floating-point instructions unless you tell it otherwise. What you want is for it to use SSE instructions instead. You might be wondering whether this will limit which machines your program can run on. Intel and AMD processors have supported SSE3 since 2004 and 2005, respectively (and SSE2 for even longer), so as long as you are targeting a PC made in the last six years, there should be no problem relying on SSE.
The problem is that the default setting (x87 instructions) hearkens back to the days of math coprocessors, when the floating-point unit was a separate chip that you could install on your motherboard. The coprocessor was integrated into the CPU back in the '486 days, so this bit of legacy isn't really necessary anymore. You want to make two changes to the default settings to make your calculations faster.
If you doubt this is important, just look at the flak Nvidia caught over its PhysX software implementation. An investigation by David Kanter at Real World Technologies found that the CPU version of PhysX still uses x87 code (David also says the software implementation seems to use only one thread on multicore systems, whereas the hardware version is multithreaded).
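Open the project's Property Pages and go to Configuration Properties > C/C++ > Code Generation. There you make the two changes. First, set Enable Enhanced Instruction Set to Streaming SIMD Extensions 2 (/arch:SSE2). This lets the compiler use SSE2 instructions for floating-point math instead of x87. Second, set Floating Point Model to Fast (/fp:fast). This tells the compiler it may relax strict IEEE conformance for speed, for example by reordering or combining floating-point operations. It does not turn your doubles into 32-bit floats; variables you declare as double stay double.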
Here's a graphic of what it looks like:
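If you want to confirm that the setting actually took effect, the compiler's predefined macros can tell you. This is a minimal sketch, not part of Stan's article, and `fp_codegen_target` is a hypothetical name. It assumes MSVC's `_M_IX86_FP` macro (0 for x87, 1 for /arch:SSE, 2 for /arch:SSE2 or higher, and not defined on x64, where SSE2 is always used) and falls back to GCC/Clang's `__SSE2__` so it compiles elsewhere too.

```cpp
// Sketch: report which floating-point instruction set this translation unit
// was compiled for. fp_codegen_target is a hypothetical helper name.
//
// MSVC, 32-bit x86: _M_IX86_FP is 0 (x87), 1 (/arch:SSE) or 2 (/arch:SSE2+).
// MSVC, x64:        _M_IX86_FP is not defined; SSE2 is always used.
// GCC/Clang:        __SSE2__ is defined when SSE2 code generation is enabled.
const char* fp_codegen_target() {
#if defined(_M_X64) || defined(__x86_64__)
    return "x64: SSE2 is always available";
#elif defined(_M_IX86_FP)
  #if _M_IX86_FP >= 2
    return "x86: /arch:SSE2 or higher";
  #elif _M_IX86_FP == 1
    return "x86: /arch:SSE";
  #else
    return "x86: x87 only";
  #endif
#elif defined(__SSE2__)
    return "x86: SSE2 enabled";
#else
    return "no SSE2 (x87-only or non-x86 target)";
#endif
}
```

Call it from `main` and print the result. From the command line, the two IDE settings correspond to compiling with something like `cl /O2 /arch:SSE2 /fp:fast yourfile.cpp`.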
Your mileage may vary, but if your program does any significant amount of floating-point math, this simple change can speed it up considerably. If you are wondering about older machines and SSE compatibility: on a CPU that lacks the required SSE support, the program will most likely crash with an illegal-instruction exception when it reaches the first unsupported instruction. That takes down your program, not Windows, so you shouldn't see a blue screen. Checking CPU capabilities is too big a topic to cover here (perhaps in another post), but you can look up the CPUID instruction if you want to puzzle it out. For me, I'd assume that a machine running WinXP or better will almost certainly have the SSE instructions, though XP itself doesn't strictly require them.
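For the curious, here is roughly what a CPUID check looks like. This is a minimal sketch, assuming MSVC's `__cpuid` intrinsic from `<intrin.h>` or GCC/Clang's `__get_cpuid` from `<cpuid.h>`; `SimdSupport` and `query_simd_support` are hypothetical names. CPUID leaf 1 reports SSE2 in bit 26 of EDX and SSE3 in bit 0 of ECX.

```cpp
// Sketch: runtime SSE2/SSE3 detection via CPUID leaf 1.
// CPUID.01H: EDX bit 26 = SSE2, ECX bit 0 = SSE3.
// SimdSupport and query_simd_support are illustrative names, not a library API.
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>   // __cpuid
  #define HAVE_X86_CPUID 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  #include <cpuid.h>    // __get_cpuid
  #define HAVE_X86_CPUID 1
#else
  #define HAVE_X86_CPUID 0  // not x86: report no SSE support
#endif

struct SimdSupport {
    bool sse2;
    bool sse3;
};

SimdSupport query_simd_support() {
    SimdSupport s{false, false};
#if HAVE_X86_CPUID
    unsigned int ecx = 0, edx = 0;
  #if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};   // EAX, EBX, ECX, EDX
    __cpuid(regs, 0);             // leaf 0: regs[0] = highest standard leaf
    if (regs[0] < 1) return s;    // leaf 1 not available
    __cpuid(regs, 1);
    ecx = static_cast<unsigned int>(regs[2]);
    edx = static_cast<unsigned int>(regs[3]);
  #else
    unsigned int eax = 0, ebx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return s;  // leaf 1 not available
  #endif
    s.sse2 = ((edx >> 26) & 1u) != 0;
    s.sse3 = (ecx & 1u) != 0;
#endif
    return s;
}
```

One caveat: with /arch:SSE2 the compiler is free to emit SSE2 instructions anywhere, including static initializers that run before `main`, so a check like this is only dependable if it runs from code built without /arch:SSE2 (for example, a small launcher that decides which executable to start).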
I’ll post a link to Stan’s article when it’s up.