4 Tips for Squishing Software Bugs

Although people have developed software for decades, it remains a slow, painstaking, and error-filled process. Despite all that effort, most software is riddled with bugs. (There are exceptions, however: take the software that powered the space shuttle, versions of which had only one error in 420,000 lines of code.) In other words, no matter how good your coding team, at some point you’re going to need to debug your work. Here are some tips for squishing bugs:

Avoid Stupid Bugs

If you don’t want to engage in a massive bug hunt at the very end of a project (or after a user complains about something very wrong), make sure to implement a process at the beginning of production that will (hopefully) save you from big errors. It also pays to review code on a regular basis, which may (or may not) be part of your Agile workflow. For simplicity’s sake, I’m going to offer an example from 32 years ago, when I was working on a short bit of 6502 assembly code running on a CBM-64, developed using a cartridge-based assembler. When the machine code executed, it rebooted the computer. This is the assembly code, which features two instructions:

    lda 16
    jsr clr

To put things simply, it didn’t work, and it took me some time to figure out why (in my defence, I only had a year’s worth of experience in 6502 assembly language at the time). The program source had to be completely reloaded from tape after each crash, which slowed things down quite a bit. And the error, as it turned out, was quite straightforward. Here's the fixed version:

    lda 16
    jsr cls

It was sheer stupidity to mix up “clr” and “cls,” but, in my defence, the cartridge assembler only allowed one source file in memory at once. Because of source-code memory limitations, the program was split into multiple source files, with the main program calling subroutines in each of the other files. Each of those files began with a jump table pointing to its subroutines. The main program held a copy of every other file’s jump table, as well as the physical address where each file’s machine code was loaded into RAM. While that architecture might sound awkward, it made the program easy to build, and changes to one file usually didn’t require rebuilding any others.

The “clr” routine was located in a file whose jump table was missing an entry compared to the copy in the main file, so the call was built against the wrong address; it jumped to an address that held data, which led to the reboot. The correct “cls” routine was in a file with the correct number of entries in its jump table. Having two similarly named subroutines didn’t help, either.

With all that in mind, what are some good ways to avoid messy errors? Coding in as straightforward a manner as possible always helps; if you make your code too complicated, you will not only generate bugs, but also make it that much harder to debug. When in doubt, relying on existing libraries and code snippets is another huge help; depending on the language used, chances are good that lots of developers have already picked over those materials, reducing the chances of critical errors. Effective library use can save you lots of time and effort.

And when arranging your development schedule, make sure to build in enough time to do things right, which includes having colleagues review your code. While rushed production timelines sometimes make this sort of “flex time” impossible to bake in, fight for it if possible; you can always tell your manager that time set aside early to hunt for bugs will translate into significant savings in money (and user aggravation) later on.
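The jump-table mismatch described in this section is the kind of error a small automated check can catch before anything runs. What follows is only a sketch in Python, not the original tooling: the routine names, the load address and the helper functions are all hypothetical. The one hard fact it relies on is that a 6502 absolute JMP instruction occupies three bytes, so each table slot sits three bytes after the previous one.

```python
JMP_SIZE = 3  # a 6502 absolute JMP instruction occupies three bytes


def entry_addresses(base, names):
    """Map each routine name to the address of its slot in a jump table at `base`."""
    return {name: base + JMP_SIZE * i for i, name in enumerate(names)}


def find_mismatches(base, main_copy, module_table):
    """Compare the main program's copy of a jump table with the module's real one."""
    expected = entry_addresses(base, main_copy)
    actual = entry_addresses(base, module_table)
    problems = []
    for name, addr in expected.items():
        if name not in actual:
            problems.append(f"{name}: missing from module table")
        elif actual[name] != addr:
            problems.append(
                f"{name}: main calls ${addr:04X}, module has it at ${actual[name]:04X}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical layout: the module's table lacks the "fill" entry that the
    # main program's copy still lists, so every later slot is shifted.
    for line in find_mismatches(0xC000, ["init", "fill", "clr"], ["init", "clr"]):
        print(line)
```

With these made-up tables, the check reports that "fill" is missing and that the main program would call "clr" three bytes past its real slot, which is the same class of mistake that caused the reboot.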

Stack Tracing and Specialized Tools

Most debuggers let you look at the stack trace to see exactly where your code was called from. Depending on the programming language, function call parameters may be passed either through registers or as values on the stack. On 32-bit Intel chips, there are several calling conventions, which differ in the order in which parameters are pushed and in whether the caller or the called function cleans up the stack. I once wrote some libraries in Delphi with functions that could be called from Excel spreadsheets. The Delphi code had to make calls into Excel via an Excel4V() function and, before it exited, had to pop a 32-bit value off the stack with the line asm pop sink; end;. Discovering that without a debugger would have been tricky.

    Eresult := Excel4V(xlfCaller,@xres,0,[nil]);
    asm pop sink; end; // Never Remove

A dynamic tracing tool such as DTrace can help you probe everything from system calls to virtual memory. You can also engage in unit testing, in which you poke and prod at individual units of source code to see if they are working correctly.

Sometimes you hit bugs that seem to elude the debugger. For those types of errors, you can rely on a variety of specialized techniques, including print statements, logging or even silently outputting debug information. For instance, Windows supports OutputDebugString(string msg) for C/C++ and .NET/C#. You can build it into your program; if there's no debugger attached, nothing happens.

Here's the code for a very small program that counts from 0 to 98 in steps of 2, pausing for three-quarters of a second and then writing a period to the console on each loop. It also outputs the value using System.Diagnostics.Debug.WriteLine(i), which sends it to the debugger output. I've run DebugView, an older, specialized Windows program for monitoring debug output, to capture it:

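    using System.Threading;

    namespace debug
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Count the even numbers from 0 to 98.
                for (var i = 0; i < 100; i += 2)
                {
                    if (i % 2 == 0)
                    { // even (always true here, because i advances in steps of 2)
                        // Send the value to the debugger output.
                        System.Diagnostics.Debug.WriteLine(i);
                        Thread.Sleep(750);         // three-quarter second delay
                        System.Console.Write('.'); // one period per loop
                    }
                }
            }
        }
    }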
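The same idea, debug output that stays silent unless something is listening, exists outside Windows and .NET. As a rough sketch only, here is how it might look with Python's standard logging module; the logger name, the count_evens() function and the ListHandler class are hypothetical names made up for this illustration, and the in-memory handler merely stands in for a viewer such as DebugView.

```python
import logging

logger = logging.getLogger("countdemo")  # hypothetical logger name


def count_evens(limit=100, step=2):
    """Count from 0 up to (but not including) limit, logging each value."""
    seen = []
    for i in range(0, limit, step):
        # Dropped unless DEBUG-level logging is enabled for this logger.
        logger.debug("i=%d", i)
        seen.append(i)
    return seen


class ListHandler(logging.Handler):
    """Collects messages in memory, standing in for a debug-output viewer."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


if __name__ == "__main__":
    viewer = ListHandler()
    logger.addHandler(viewer)
    logger.setLevel(logging.DEBUG)  # switch the debug output on
    count_evens(10)
    print(viewer.messages)
```

The logger.debug() calls can stay in the code permanently; they only produce output once the logger's level is lowered to DEBUG and a handler is attached.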

A client reported that the latest version of a program crashed during startup. I wrote a small function (log(int num)), added log(1), log(2) calls (and so on) after each statement in the startup code, and sent that version to the client. The code wrote a number into a text file. After it crashed, I had the client view the file and tell me the number in order to identify the last statement before everything crashed.
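The numbered-checkpoint trick above is easy to reproduce. The sketch below is not the original log(int num) function, which is not shown in this article; it is a Python approximation in which the file name, the run_startup() wrapper and the last_checkpoint() reader are all assumptions made for illustration. The important detail is that each number is forced to disk straight away, so it survives a crash on the very next statement.

```python
import os

LOG_PATH = "startup_checkpoint.txt"  # hypothetical file name


def log(num, path=LOG_PATH):
    """Overwrite the checkpoint file with the latest number and force it to disk."""
    with open(path, "w") as f:
        f.write(f"{num}\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the number survives a crash


def last_checkpoint(path=LOG_PATH):
    """Return the last number written before the program stopped."""
    with open(path) as f:
        return int(f.read().strip())


def run_startup(steps, path=LOG_PATH):
    """Run each startup step in order, recording a checkpoint after each one."""
    log(0, path)  # 0 means no step has completed yet
    for number, step in enumerate(steps, start=1):
        step()
        log(number, path)
```

If the file contains 2 after a crash, the first two steps completed and the third is the one to investigate.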

Don't Assume!

Don't assume you know the nature of a bug without doing some research first. My worst-ever bug occurred in a Delphi 3 program that read financial data from an Access database, passed it to a third-party DLL for calculations, and then stored the results. One day it started crashing in the oddest way. It would fail at either a trunc() operation or a database-accessing instruction and kill the program. Even adding exception-handling code around the crash point did not work. I spent three weeks on and off chasing that bug down, all to no avail. To make things worse, it didn't always crash. As this was a research project, we tolerated it, but it still drove me nuts.

Three months later, I was stepping through the debugger, examining the values returned from the third-party DLL somewhere else in the program, and noticed that a large array it returned (5,000 x 10 doubles) contained several -INF values. Sure enough, a bug in the third-party DLL was sometimes storing -INF (negative infinity) in the array instead of valid numeric values. Most significantly, it was not generating divide-by-zero exceptions. I found out later that the Access Jet Engine driver for Delphi silently switched off FPU exceptions, so an -INF would be produced and somehow affect the processor (in effect, starting a time bomb). When the code later tried to do a SQL operation or a trunc(), the time bomb went off, usually far from where it was created.

It was a very unusual bug that I'd quite reasonably assumed was in the recently executed code, so no wonder I couldn't find it! (I'd also trusted the third-party DLL writer, who was sternly reprimanded.) Always check your inputs, and never make assumptions. That mentality alone can help you more quickly solve random bugs that pop up.
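"Always check your inputs" can be made concrete with a validation pass over whatever a third-party component hands back. The following Python sketch is one possible form of that check, not the fix used in the Delphi program; the function names and the sample data are hypothetical. It scans a two-dimensional array of floats and reports every entry that is infinite or NaN, so the problem surfaces where the bad data arrives rather than far away in unrelated code.

```python
import math


def find_non_finite(rows):
    """Return (row, column, value) for every NaN or infinite entry."""
    bad = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if not math.isfinite(value):
                bad.append((r, c, value))
    return bad


def checked(rows):
    """Return rows unchanged, or raise as soon as non-finite data is seen."""
    bad = find_non_finite(rows)
    if bad:
        r, c, value = bad[0]
        raise ValueError(
            f"{len(bad)} non-finite value(s); first at [{r}][{c}]: {value}"
        )
    return rows


if __name__ == "__main__":
    # Hypothetical results, standing in for the array returned by a library.
    results = [[1.0, 2.5], [float("-inf"), 4.0]]
    print(find_non_finite(results))
```

Wrapping each call into the external library with something like checked() turns a delayed, intermittent crash into an immediate error that names the offending value.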

David Bolton
