2 posts

Archive for June, 2011


The Evolution of Static Code Analysis – Part 3: The Present Day

Posted by Todd Landry   June 8th, 2011

My first 2 posts looked at 2 different eras of Static Code Analysis, the Early Years and the Early 21st Century. The SCA solutions of these times were revolutionary, and helped software development teams a great deal. But they had their warts.

In the final post in this series, I’m going to introduce you to the present day Static Code Analysis technology and how it is impacting developers.

The Present Day

I’m a huge fan of Reece’s Peanut Butter Cups. I love them. I keep active so I don’t feel guilty eating them. In a strange, convoluted way, the 3rd generation of static code analysis tools are like this delicious combination of chocolate and peanut butter. Let me explain.

I’m sure you remember from my previous posts how the 1st generation tools (i.e. Lint) gave questionable results but was still considered by developers as a tool exclusively for them, and the 2nd generation tools gave really good results but moved away from being a developer tool.
The 3rd generation tools recognized that the developer must be an integral part of the process of identifying, fixing and preventing bugs from reaching the code stream and so, they took the proven results from the 2nd gen tools and delivered them right to the developer’s desktop.

Eureka! Now developers are able to perform an analysis locally, using their development environment of choice, while still getting the high accuracy and consistency that was previously only possible by checking in their code and waiting for the integration build to take place.

Think about the ramifications of this:

  • cleaner code is being checked in
  • the ‘rinse-repeat’ vicious cycle of rework is drastically reduced
  • quality teams are now able to focus on testing the product’s functionality rather than spending cycles uncovering something that could easily and quickly be found by automated tools.

Mmmm-mmmm good. Sounds like a win-win-win to me!

I think the best thing about these 3rd generation tools is simply the fact that developers are now able to resume ownership of the quality and security of the code they are producing.

Well, I hope you enjoyed this walk down memory lane. I sure did. Now I’m looking for spare change because I see a trip to the vending machine in my immediate future.

If you want to know more about the 3rd Generation tools, feel free to drop me a line.


To report, or not to report…

Posted by Gwyn Fisher   June 6th, 2011

BalanceCreating a source code analysis (SCA) engine is a balancing act, a decision process of where you believe the most value can be found along the spectrum that is the signal-to-noise ratio of the detection process. At one end lies the realm of massive noise and hopefully complete coverage, whilst at the other is the quiet calm of the theoretically useful but ultimately useless realm of no noise, but ultimately no signal either.

That may sound counter-intuitive. Shouldn’t a zero noise point on the spectrum be accompanied by an infinitely strong signal? Perhaps in the world of DSP this is true, but in the world of SCA reducing noise comes right along with a reduction in detection capability – it’s unfortunately almost a straight-line correlation.

So if we assume that we’re trying to balance a couple of dials on our theoretical tuner, we might start by reducing or dampening noise – it’s the most obvious place to start, after all. Nobody likes to listen to their favorite FM station through the curtain of hissing and popping that accompanies the act of driving through a major city.  Likewise no developer likes sifting through a long list of bogus detection errors in order to find the hidden gems. But to drag out the analogy, assume that the only way of reducing hiss on your FM signal is to turn down the volume… now you’ve got less hiss, but also less Bruce Springsteen goodness to accompany it.

Balance is what we need here, obviously. Enough Boss to make us ignore the hiss, or to put it in a more SCA-like context, enough interesting bugs to make us ignore the incorrect, or the irrelevant (correct detections on the part of the engine that the developer just doesn’t care about, e.g. low memory conditions in a memory-insensitive environment).

Consider the following simple example that clearly lies “on the line”:

    void foo(char* s, int a)
    {
        char* s1 = s;
        if( a > 0 )
            *s1 = 'a';   // potentially use an uninitialized ‘s1’
    }

    void bar(int m)
    {
        char *s;
        foo(s, m);       // s is not initialized prior to calling ‘foo’
    }

So… to report, or not to report?

Lacking any other information, it is obvious that function ‘foo’ interacts under certain situations (when parameter ‘a’ is positive) with parameter ‘s’ (aliased as local variable ‘s1’). As we have no knowledge about the provenance of parameter ‘s’ when analyzing ‘foo’, however, there’s nothing here to cause a report and so we squirrel away the knowledge of what ‘foo’ does for later use.

When analyzing ‘bar’ we know what ‘foo’ does, and we know we’ve got an uninitialized local pointer, ‘s’. But again we’re lacking enough knowledge to know the valid values, or ranges, that parameter ‘m’ may take. There are definitely a set of circumstances here in which we know a problem will occur (if parameter ‘m’ is positive), and a set of circumstances in which we know a problem will not occur (if parameter ‘m’ is zero or negative) – this much is encoded in the functional behavior of ‘foo’. But is it a defect, or should we filter out the report in favor of providing only those situations in which we can be “sure” the bug not only exists, but can be proven to be exercised?

There’s the art of balance in a nut-shell, and it revolves around the phrase “lacking any other information.” In the ideal world, lacking any restrictions in terms of time, memory or computing power (or indeed actual from-the-wall power, as we have to worry about now), we might defer all such decisions until we categorically know that a particular data value is passed down the call graph far enough to get to ‘foo’. But in the real world of multi-million LOC projects, that approach simply can’t scale.

And so, calling on balance as our friend, we can bias a localized decision to report or not, given that we know to at least one order of approximation that bad things could happen here. Different engines pronounce that bias differently, leading to one of the greatest divides between prevalent solutions.

Now ask yourself, as the developer, is it a worthy report if you know that 10 levels up the call graph there’s a check on what eventually becomes parameter ‘m’ to ensure that it’s never positive? Perhaps you’d automatically classify this as a false positive and, annoyed at the tool, move onto the next report. Or perhaps, seeing the size of the gap in the call graph, you might just choose to code defensively, initializing ‘s’ to NULL in ‘bar’ and adding guard code to ‘foo’ because, hey, you never know.

And as we’ve all seen so many times over the years, “you never know” might just as well be written “and so it came to pass…”