When I set out to review how different compilers generate possibly different assembly code (specifically for vectorized and multicore code), I noticed a possible anomaly when comparing two recent versions of the g++ compiler, 4.7 and 4.8. When I mentioned my concerns, at least one user commented that he also had a codebase that ran fine after compiling with 4.6 and 4.7, but not with 4.8. So I decided to explore the difference and see if there was an actual problem.

Regarding Vectorization

Before we get started, I want to address another issue that came up in the Slashdot forums. I'm examining vectorization, but not everybody was clear on what that means. The term “vector” has its root in mathematics, but that's not the kind of vectorization we're talking about here. In the world of mathematics, vectors are ordered lists of scalar numbers. You can add them, scale them, and take dot products, for example; that's all part of Linear Algebra. But in the world of computer processor architecture, vectorization has a different meaning: it's when you pack multiple values into a single hardware register and perform arithmetic and other operations on all the numbers in that register simultaneously with a single assembly instruction (the relevant term is “SIMD,” which stands for Single Instruction, Multiple Data). You can use hardware vectorization to perform mathematical vector operations; in fact, that's an excellent use of it. But it's still a separate topic. Mathematica and other mathematics products have performed vector operations for years, but that doesn't automatically mean they use SIMD operations at the assembly level.
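To make the hardware sense of the word concrete, here is a minimal sketch that uses the vector extension supported by GCC (and Clang) to pack two doubles into one 16-byte value and add both lanes with one operation. The names v2df and add_two_lanes are hypothetical, chosen only for this illustration. On x86-64 the addition will typically become a single packed instruction, but that depends on the target and the flags.

```cpp
#include <cstring>

// GCC/Clang vector extension: two doubles packed into one 16-byte value.
// The name v2df is an assumption made for this sketch.
typedef double v2df __attribute__((vector_size(16)));

// Adds two pairs of doubles with one vector addition and writes both sums to out.
void add_two_lanes(const double a[2], const double b[2], double out[2]) {
    v2df va, vb;
    std::memcpy(&va, a, sizeof va);      // load both lanes of a
    std::memcpy(&vb, b, sizeof vb);      // load both lanes of b
    v2df sum = va + vb;                  // one addition covers both lanes
    std::memcpy(out, &sum, sizeof sum);  // store both results
}
```

Calling add_two_lanes with {1.0, 2.0} and {10.0, 20.0} fills out with the two sums, computed side by side rather than one after the other.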

Ground Rules

As usual, let's lay some ground rules on what I'm testing. For the scope of this article, I'm not interested in code that compiled successfully under 4.7 but doesn't compile under 4.8. People have reported plenty of those problems, and they're being tracked in the GCC bug database. (Many of the reports are of internal compiler errors, as opposed to the compiler generating an error message about the code it's compiling.) Remember, like any software, the GCC compiler set is itself a large body of code created by a massive number of developers, and thus has bugs in it. The developers are working on it.

Instead, I'm looking at the actual generated assembly code. Suppose you have code that compiles and runs just fine; you upgrade the compiler, and the code still compiles grandly without any changes, but you start to get bugs and problems when your software runs, problems that didn't exist before. That can be a major headache. Your code worked fine before, so should you try to fix it to work with the new compiler? Well, before you can do that, you really need to know what's actually going on under the hood. And the problems under the hood can take several forms: the assembly code generated by the compiler might be different; the runtime libraries, which are also a new version, might have a bug in them; or there's an enhancement of which you were unaware.

In any of these cases, you're faced with a decision: Do you revert to the older compiler, or do you modify your code to make it work with the new compiler? Reverting to the older compiler is problematic, because then you have to wonder how long you'll be stuck with it. (I worked at a large company in 2006 that used a compiler on a decade-old project, because its code didn't work with the newer compilers. The compiler was built prior to the 1998 ANSI standard, whose arrival caused a lot of older code to break. And to get that compiler to work, the C-level tech engineers insisted we install a very old version of an IDE long since abandoned by its manufacturer. That made for an incredibly frustrating situation, as you can imagine.)

On the other hand, if you adjust your code to work with the new compiler, then what will happen when the next version comes out? Will any of these possible problems be removed? Or are the “problems” actually enhancements? Many of us lived through this nightmare in 1998 and 1999 when the compiler vendors started upgrading their products to be compliant with the 1998 standard. Adjusting for the compiler changes can cause headaches, but some headaches are avoidable if you work with the changes, rather than against them.

Changes to Optimization

Before we get into the assembly code, let's consider a factor that can influence how that assembly code is generated: optimization. The 4.8 release of GCC includes a new optimization level, selected on the command line with -Og. The idea here is to support better debugging and fast compilation with (as the release notes say) “a reasonable level of runtime performance.” (This also addresses an issue Slashdot readers seem to disagree on: whether compilation time is an issue at all. Developers working on a large code base with a large team who pull in changes throughout the day don't want to wait ages for the project to compile just to test out their code. In cases like that, the time it takes to compile is most certainly an issue.) So if your code works with no optimization, does it still work when optimized? An optimizing compiler produces different assembly code than it does with optimization turned off. Whether your code still works with these optimizations is something you have to determine through testing. If your code no longer works right for some reason, you can lower the optimization level. In this test I'll be looking at the optimizations as well, both to check the optimization itself and because turning on optimization is how you enable autovectorization.
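When you compare builds across compiler versions and optimization levels, it can help to have the program report how it was compiled. The following is a minimal sketch that relies on two macros GCC predefines: __VERSION__ (a string describing the compiler) and __OPTIMIZE__ (defined when optimization is enabled). The helper name build_info is hypothetical.

```cpp
#include <string>

// Returns a short description of how this translation unit was compiled.
// build_info is a hypothetical helper name used only for this sketch.
std::string build_info() {
    std::string info;
#if defined(__GNUC__)
    info += "compiler ";
    info += __VERSION__;          // string literal predefined by GCC
#else
    info += "unknown compiler";   // the macros below are GCC conventions
#endif
#if defined(__OPTIMIZE__)
    info += ", optimized";        // GCC defines this when optimizing
#else
    info += ", not optimized";
#endif
    return info;
}
```

Printing this string at startup, or writing it to a log, makes it easier to tell afterward which compiler and which kind of build produced a given result.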

The Test

To perform this test, I started with two clean Linux installations, specifically Ubuntu 13.10 server. On one I installed version 4.8.1 of gcc and g++. On the second, I installed version 4.7.3. Here's the code I was dealing with in the previous article. This is a slightly modified version from the original (found at http://locklessinc.com/articles/vectorize/):
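#include <stddef.h>  // size_t

#define SIZE 1024    // placeholder value; the original definition is not shown here

void test6(double * __restrict__ a, double * __restrict__ b) {
    size_t i, j;

    // Promise the compiler that both arrays are 16-byte aligned.
    double *x = (double *)__builtin_assume_aligned(a, 16);
    double *y = (double *)__builtin_assume_aligned(b, 16);

    for (j = 0; j < SIZE; j++)
        for (i = 0; i < SIZE; i++)
            x[i + j * SIZE] += y[i + j * SIZE];
}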

(The modification is to make it work with the C++ compiler. The original report used C, rather than C++.) When I turn the optimization to level 3, I get a pretty lengthy amount of code, whether I use the 4.7 or 4.8 compiler. On the surface, the two versions look quite different. Due to space (and my unwillingness to bore readers any more than necessary), I won't reproduce the resulting assembly code here. But I will say the 4.8 version, for just the loops, is about ten lines shorter. In both cases the code is vectorized. The vectorized portion, which is basically this line of C++ code:
 x[i + j * SIZE] += y[i + j * SIZE];
is almost the same in both, except for a minor difference in how the data is moved in and out of the registers. (The 4.7 version uses two registers; the 4.8 version uses only one.) The rest of the difference centers on how the loop is optimized. Now remember: The code runs in both cases. It doesn't have a bug. What we're dealing with here, then, is a matter of the developers revising the assembly code generation and the optimizing algorithms. Nevertheless, the code is different. When I turned off optimizations, I ended up with code that was almost identical, except for two lines of assembler where 4.8 used a slightly tighter method of checking whether one number is less than another. In other words, they were virtually the same. What does this mean for us? The GCC developers are continuously updating their code, including the optimizations. What we got in 4.8 is considerably different in how it handles the loops; in this case, it still works.

More Problems in the Wild

Although this quick test code ran fine, as I mentioned earlier, some readers have experienced problems with version 4.8. We can see that the two versions produce different code when optimization is turned on. It didn't take me long to find many other issues people have encountered over the past couple of months with 4.8. One reader on Stack Overflow, for example, had a problem with the optimizations not working out as they had in the 4.7 version of gcc.

So What Does This Mean?


Does this mean 4.8 is flawed, or that you shouldn't use it? Not at all. You can certainly use 4.8. Most programmers I've worked with over the years in applications development are not experts in assembly code, and they don't want to be experts in it. (That's why they're using higher-level languages and creating applications, as opposed to, for example, embedded systems.) If a bug comes in, most of them aren't going to blame the compiler, and if they do, coworkers will likely scoff at them. Blaming the compiler is usually considered the ultimate in not taking responsibility for your own coding mistakes. Furthermore, most programmers who are at least halfway decent at their job don't even think to blame the compiler. We've been taught to trust the compilers, and we should continue to trust the compilers. The compiler should not be the first thing we attack when a bug comes in. And the last thing we want to do is spend hours upon hours tracing through assembly code; that's not anybody's idea of a good time.

But every once in a while, we may have to embark on just that sort of arduous task. As with the reader whose software works fine when run after compilation with 4.6 and 4.7, but crashes when run after being compiled with 4.8, sometimes you need to get down and dirty to solve the problem: isolate the code that's crashing, see if it can be reproduced in a small test program (if possible, under a couple dozen lines of code), and demonstrate that it runs correctly when compiled with 4.7 but crashes with 4.8. Then turn off all optimizations. Does it still crash? If so, turn on the -S switch to see the assembly code and compare the versions. Is there a problem in there? Then it might be time to submit a bug report to the GCC team. (You're helping the community in doing so. The team needs to know about the bugs so they can fix them.) If and when they fix the bug, go ahead and use the latest version so you're not stuck coding for an old compiler.
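A small reproducer is most useful when it checks its own results, so the same file can be built with each compiler and each optimization level and simply report whether the output is right. Here is a minimal sketch built around the loop from the test above. The names kSize, add_arrays, and count_mismatches are hypothetical, and the input values are chosen so that every expected result is exact in double precision.

```cpp
#include <cstddef>

// Hypothetical small size, chosen only to keep the reproducer tiny.
const std::size_t kSize = 64;

// Same shape as the loop under test, shrunk to a small fixed size.
void add_arrays(double * __restrict__ a, double * __restrict__ b) {
    double *x = (double *)__builtin_assume_aligned(a, 16);
    double *y = (double *)__builtin_assume_aligned(b, 16);
    for (std::size_t j = 0; j < kSize; j++)
        for (std::size_t i = 0; i < kSize; i++)
            x[i + j * kSize] += y[i + j * kSize];
}

// Fills the inputs with known values, runs the loop, and counts wrong results.
// Returns 0 when every element holds the expected sum.
std::size_t count_mismatches() {
    // The arrays really are 16-byte aligned, matching the promise made above.
    static double x[kSize * kSize] __attribute__((aligned(16)));
    static double y[kSize * kSize] __attribute__((aligned(16)));
    for (std::size_t k = 0; k < kSize * kSize; k++) {
        x[k] = (double)k;
        y[k] = 2.0 * (double)k;
    }
    add_arrays(x, y);
    std::size_t bad = 0;
    for (std::size_t k = 0; k < kSize * kSize; k++)
        if (x[k] != 3.0 * (double)k)   // k + 2k must equal 3k exactly
            bad++;
    return bad;
}
```

If count_mismatches returns zero under one compiler and nonzero under another, or only when optimization is turned on, you have a concrete case to attach to a bug report along with the assembly output from -S.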
Version 4.8 is a work in progress, like any piece of software. It has bugs, but the team is working hard to fix them. This is the first version of GCC that is itself built with a C++ compiler, so there will be issues. But as of this writing, version 4.8.2 is already out. (My tests were with 4.8.1.) Move forward with it, and you'll most likely be fine.

Image: mama_mia/Shutterstock.com