Archive for the ‘Static Analysis’ Category


Answering questions about your code base – Part 1

Posted by Patti Murphy   February 8th, 2012

Static analysis captures the current state of your code base and helps you answer key questions about the quality, security and maintainability of your software project.

Think Magic 8 Ball with build omniscience and powerful reporting tools. OK, maybe Magic 8 Ball isn’t a good analogy.

Answers to what questions, you ask? One we often hear from customers is: Where do I start?

A good place to start is a report that captures the distribution of defect types from your current build. For example, we recommend that our customers glance over the Top 10 Issues report in our web-based build reporting tool, Klocwork Review, while indulging in their morning cup of coffee:

Magic 8 Ball can't do this. Here's a defect distribution view of your build.

With this build snapshot and your caffeine jolt, you can quickly identify defects of interest to your organization, such as null pointer dereferences and memory leaks. If you wish, you can set up filters (we call them views) to show only these defect types in your report.

Your next step is to get your developers using static analysis on their desktops to prevent the injection of these high-priority defects into the build in the first place.

Once a policy of pre-checkin static analysis usage is put in place, pay attention to new defects injected into the build from that point on. If you see a spike in new defects, then investigate.

The magnitude of that y-axis is not what matters most; it’s the overall trend that counts.
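As a rough illustration of watching the trend rather than the absolute numbers, the sketch below flags a build whose count of new defects jumps well above the average of the builds before it. The function name `is_spike`, the per-build counts and the comparison factor are all hypothetical, chosen for this example; none of them come from Klocwork Review.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the newest build's count of new
 * defects exceeds `factor` times the average of the earlier builds.
 * new_defects[i] is the number of new defects introduced in build i. */
int is_spike(const int *new_defects, size_t builds, double factor)
{
    if (builds < 2)
        return 0;               /* no history to compare against */

    double sum = 0.0;
    for (size_t i = 0; i + 1 < builds; i++)
        sum += new_defects[i];

    double average = sum / (double)(builds - 1);
    return (double)new_defects[builds - 1] > factor * average;
}
```

A factor of 2.0 is only a starting guess; what counts as a spike worth investigating will depend on your own build history.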

For my next post, I’ll take a look at reports that track your cost of ownership and show you what success looks like.


Golden rules of AST checker development

Posted by Patti Murphy   January 24th, 2012

In my previous post, It’s time to create a custom checker…, we looked at the considerations involved in deciding which checker to create–AST or path?

In this post, we’re going to extend the default set of checkers in our source code analysis tool with a custom checker that enforces an internal coding standard.

To do this, I’ve called upon Steve Howard, our head of Partner Support in Europe, to get us started with an AST checker to accomplish our goal.

Steve has coached many customers through the checker creation process. In his experience, the appeal of custom checkers lies in their ability to enforce naming conventions and code constructions across organizations.

The standard we want to enforce is the use of a compound statement block rather than single statements as the body of a for loop. An AST checker is the way to go because detection depends solely on the syntax of the code itself and not runtime behavior.

See the example below:

Incorrect:

    for( i = 0; i < 10; i++ )
        doSomething();

Correct:

    for( i = 0; i < 10; i++ ) {
        doSomething();
    }

To flag this violation, we need to instruct the checker to find all for loop nodes whose immediate descendant statement (the loop body) is not a compound statement.

A tool that shows you a visual representation of the AST for the test case is quite helpful in the checker creation process. Here at Klocwork, we use Checker Studio to:

  • browse the AST structure of test cases,
  • identify nodes of interest, and
  • test XPath-like expressions that identify node types, qualifiers, conditions and variables to traverse the AST and flag the defect.

Note: If we wanted to enforce the compound statement rule in all loops, then we’d need to have one pattern (created using the XPath-like expression) for each possible kind, such as while loops and do while loops.

Armed with the test case, Checker Studio, and a syntax guide, Steve identified the following expression that flags the infraction:

// ForStmt [not (Stmt::CompoundStmt)]
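To make the pattern concrete, here is a toy version in C of what such a rule checks. The node kinds, the `struct node` layout and the `count_violations` function are invented for this sketch and are not Klocwork's AST or API. The only idea carried over is that a for loop is flagged when its body is not a compound statement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified AST; real ASTs have many more node kinds. */
enum node_kind { FOR_STMT, COMPOUND_STMT, EXPR_STMT };

struct node {
    enum node_kind kind;
    struct node *body;   /* loop body for FOR_STMT, otherwise NULL   */
    struct node *first;  /* first child statement of a COMPOUND_STMT */
    struct node *next;   /* next sibling inside a COMPOUND_STMT      */
};

/* Counts every for loop in the tree whose body is not a compound statement. */
int count_violations(const struct node *n)
{
    int count = 0;
    for (; n != NULL; n = n->next) {
        if (n->kind == FOR_STMT) {
            assert(n->body != NULL);          /* a for loop always has a body */
            if (n->body->kind != COMPOUND_STMT)
                count++;                      /* the rule fires here */
            count += count_violations(n->body);
        } else if (n->kind == COMPOUND_STMT) {
            count += count_violations(n->first);
        }
    }
    return count;
}
```

Handling while and do while loops in this toy model would mean adding a node kind and a matching branch for each, which mirrors the note above about needing one pattern per kind of loop.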

Here’s how the test case and expression appear in Checker Studio:

Golden rules

Based on his experience, Steve has a number of golden rules that get you from idea to defect detection faster:

  • Start simple: Use a simple test case that contains the defect you want to detect and work with one simple pattern at a time. Add more complexity as you go along
  • Start rough and refine later: Don’t worry about false positives at first. In some cases it may even be easier to search for instances that are OK and then negate the rule at the end
  • Divide and conquer: With a more complex checker, work separately on each aspect of the defect you want to detect and then bring it all together at the end for testing in Checker Studio
  • Watch your levels: Make the highlighting as relevant as possible for the issue you’re trying to find. For example, “// ClassType [MemberDecls[*]::MemberDecl]” will highlight classes that match, whereas “// ClassType/MemberDecls[*]::MemberDecl” will highlight class members that match. The rule is the same, but the focus is different
  • Weed out false negatives: Add negative examples (good code) to check for false negatives

For more information about our custom AST checkers, watch our Checker Studio video.


It’s time to create a custom checker, but what kind?

Posted by Patti Murphy   November 15th, 2011

You’ve been using source code analysis on your integration build or your desktop, or (ideally) both. And then there’s “a situation”.

The situation

Either you:

  • Noticed a false negative you want detected, or
  • Need a way to enforce a corporate coding standard, such as the requirement for the use of a compound statement block rather than single statements as the body of a loop.

Now what?

Time to create a custom checker, that’s what. But what kind of checker?

Source code analysis involves two families of checkers, those that involve:

  • Abstract Syntax Tree (AST) validation, and
  • Code path analysis.

An AST provides a tree-based structural representation of the source code. An AST checker allows you to pinpoint problematic syntax using XPath or XPath-derived grammar to define the problem you’re looking for. AST checkers (our version is called Klocwork AST checkers, or KAST for short) don’t require program execution to run; they detect defects right away on source code.

Code path analysis, on the other hand, targets defects related to value tracking at program execution time. Instead of style violations, you’d use a path checker to answer questions such as:

  • Is this newly-created object released before all aliases to it are removed from scope?
  • Is this data object ever range-checked before being passed to an OS function?
  • Is this string checked for special characters before being submitted as an SQL query?

To create a path checker, you don’t need to know how data is tracked by the checker. What you do need to know are the function types and values you want to track for the analysis starting point and the analysis end point where the defect (or event) is recognized and reported.

Which checker when?

Create an AST checker when the problem you want to detect:

  • is a local defect
  • does not involve program execution
  • has to do with the way the program was written
  • does not involve tracking a value
  • does not involve a path

Create a path checker when the problem you want to detect:

  • involves tracking a value
  • has a starting point (where the analysis starts) and end point (where the defect is detected)
  • involves program execution
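A small C example may help show the difference. Both functions below are syntactically unremarkable, so a purely syntactic pattern has little to latch onto. What separates them is whether the value `index` is range-checked on every path before it reaches the array access. The function names and the table are made up for this illustration.

```c
#define TABLE_SIZE 4
static const int table[TABLE_SIZE] = { 10, 20, 30, 40 };

/* Path defect: `index` flows from the caller (the start point) to the
 * array access (the end point) without any range check in between. */
int lookup_unchecked(int index)
{
    return table[index];
}

/* Same flow, but every path to the array access passes a range check. */
int lookup_checked(int index, int fallback)
{
    if (index < 0 || index >= TABLE_SIZE)
        return fallback;
    return table[index];
}
```

Deciding that `lookup_unchecked` is risky means following a value from where it enters to where it is used, which is the kind of question a path checker is meant to answer.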

Stay tuned for the next post in this series on best practices for AST checker creation.

For more information, see Writing custom checkers with Klocwork Extensibility or check out our member discussions in the C/C++ custom checkers forum.

–With files from CTO Gwyn Fisher


Compiler configuration

Posted by Alen Zukich   October 25th, 2011

Compiler configuration is a problem with static analysis tools.  In the past, a static analysis (or source code analysis) tool simply worked by pointing it at the source code and hitting “go”.  Now it is very different.  Without a complete understanding of the software build, including the compiler specifics, you will get inaccurate results.

Under the covers, do you really know what is happening with your compiler?  Not usually.  You make changes to your code, call your compiler or build command to compile your code, and then fix the issues.  Rinse and repeat.

What really matters for static analysis tools is that the compiler holds some crucial information it needs to compile your code successfully: namely, the internal compiler includes and defines.  Static analysis tools must capture this data; otherwise they won’t know where the system includes and defines are coming from for your specific compiler, and the static analysis results will be about as accurate as the weatherman’s weekly prediction.

Luckily most compilers have a way to capture this.  For example to find out the defines and includes from gcc:

gcc -E -dM dummy.c

gcc -E -Wp,-v dummy.c

Here, dummy.c is just an empty file.  These commands give you a dump of all the defines and the include search paths, respectively.  Now, when static analysis tools build their data, they have a mapping of the proper defines and includes for your specific compiler and everyone is happy.
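To see why those built-in defines matter, consider a sketch like the one below. Which branch gets compiled depends entirely on macros the compiler defines for itself, so an analysis tool that doesn't know them would be looking at different code than your build does. `compiler_name` is a made-up function for this example; `__clang__`, `__GNUC__` and `_MSC_VER` are the usual predefined macros for those compilers.

```c
/* Returns a label for the compiler, chosen at preprocessing time from
 * macros the compiler predefines (none of them come from this file). */
const char *compiler_name(void)
{
#if defined(__clang__)
    return "clang";          /* checked first: clang also defines __GNUC__ */
#elif defined(__GNUC__)
    return "gcc-compatible";
#elif defined(_MSC_VER)
    return "msvc";
#else
    return "unknown";
#endif
}
```

With gcc, `__GNUC__` is among the macros that show up in the `-dM` dump, which is how a tool can learn which branch your build actually takes.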

In the past, it seemed like a good idea to make compiler configuration extensible.  This meant that static analysis tools could support any compiler if you didn’t mind taking the time to build that support.  It wasn’t usually very complex but it could be prone to errors.  Instead, it makes more sense to just provide the support right out of the box so that, to borrow the words of the late Steve Jobs, “it just works”.  As long as static analysis tools have an extensible interface, these tools should be able to support new and obscure compilers very quickly.  Make sure your static analysis vendor supports the specific compiler you use, and if they don’t, they had better turn that around in a snap.


Klocwork University consolidates learning resources into a single roster

Posted by Patti Murphy   September 7th, 2011

Klocwork Developer Network presents Klocwork University, which consolidates all our online learning resources onto a single page.

Klocwork University is your one stop for self-paced online learning and how-tos about:

  • Setting up and using our static analysis tools on your desktop or integration build
  • The latest trends in software security
  • Agile coding practices and how they intersect with static analysis
  • Klocwork product overviews

At Klocwork University you’ll see helpful descriptions of:

  • In-house and partner-generated e-learning courses
  • Video how-tos
  • Webinars

After you browse our offerings on the Klocwork University page, click your selection and access your resource. If you’re not already logged in to the Klocwork Developer Network, you’ll be prompted to log in or register to use these free resources.

This change pulls the course content descriptions from behind the login wall to provide a searchable list for members and non-members alike.

At Klocwork University, you get the information up front and you can schedule your pub breaks when and where you want. Join today. There’s no free beer though.


To report, or not to report…

Posted by Gwyn Fisher   June 6th, 2011

Creating a source code analysis (SCA) engine is a balancing act, a decision process of where you believe the most value can be found along the spectrum that is the signal-to-noise ratio of the detection process. At one end lies the realm of massive noise and hopefully complete coverage, whilst at the other is the quiet calm of the theoretically useful but ultimately useless realm of no noise, but no signal either.

That may sound counter-intuitive. Shouldn’t a zero noise point on the spectrum be accompanied by an infinitely strong signal? Perhaps in the world of DSP this is true, but in the world of SCA reducing noise comes right along with a reduction in detection capability – it’s unfortunately almost a straight-line correlation.

So if we assume that we’re trying to balance a couple of dials on our theoretical tuner, we might start by reducing or dampening noise – it’s the most obvious place to start, after all. Nobody likes to listen to their favorite FM station through the curtain of hissing and popping that accompanies the act of driving through a major city.  Likewise no developer likes sifting through a long list of bogus detection errors in order to find the hidden gems. But to drag out the analogy, assume that the only way of reducing hiss on your FM signal is to turn down the volume… now you’ve got less hiss, but also less Bruce Springsteen goodness to accompany it.

Balance is what we need here, obviously. Enough Boss to make us ignore the hiss, or to put it in a more SCA-like context, enough interesting bugs to make us ignore the incorrect, or the irrelevant (correct detections on the part of the engine that the developer just doesn’t care about, e.g. low memory conditions in a memory-insensitive environment).

Consider the following simple example that clearly lies “on the line”:

    void foo(char* s, int a)
    {
        char* s1 = s;
        if( a > 0 )
            *s1 = 'a';   // potentially use an uninitialized ‘s1’
    }

    void bar(int m)
    {
        char *s;
        foo(s, m);       // s is not initialized prior to calling ‘foo’
    }

So… to report, or not to report?

Lacking any other information, it is obvious that function ‘foo’ interacts under certain situations (when parameter ‘a’ is positive) with parameter ‘s’ (aliased as local variable ‘s1’). As we have no knowledge about the provenance of parameter ‘s’ when analyzing ‘foo’, however, there’s nothing here to cause a report and so we squirrel away the knowledge of what ‘foo’ does for later use.

When analyzing ‘bar’ we know what ‘foo’ does, and we know we’ve got an uninitialized local pointer, ‘s’. But again we’re lacking enough knowledge to know the valid values, or ranges, that parameter ‘m’ may take. There is definitely a set of circumstances here in which we know a problem will occur (if parameter ‘m’ is positive), and a set of circumstances in which we know a problem will not occur (if parameter ‘m’ is zero or negative) – this much is encoded in the functional behavior of ‘foo’. But is it a defect, or should we filter out the report in favor of providing only those situations in which we can be “sure” the bug not only exists, but can be proven to be exercised?

There’s the art of balance in a nutshell, and it revolves around the phrase “lacking any other information.” In the ideal world, lacking any restrictions in terms of time, memory or computing power (or indeed actual from-the-wall power, as we have to worry about now), we might defer all such decisions until we categorically know that a particular data value is passed down the call graph far enough to get to ‘foo’. But in the real world of multi-million LOC projects, that approach simply can’t scale.

And so, calling on balance as our friend, we can bias a localized decision to report or not, given that we know to at least one order of approximation that bad things could happen here. Different engines pronounce that bias differently, leading to one of the greatest divides between prevalent solutions.

Now ask yourself, as the developer, is it a worthy report if you know that 10 levels up the call graph there’s a check on what eventually becomes parameter ‘m’ to ensure that it’s never positive? Perhaps you’d automatically classify this as a false positive and, annoyed at the tool, move onto the next report. Or perhaps, seeing the size of the gap in the call graph, you might just choose to code defensively, initializing ‘s’ to NULL in ‘bar’ and adding guard code to ‘foo’ because, hey, you never know.

And as we’ve all seen so many times over the years, “you never know” might just as well be written “and so it came to pass…”
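For what it's worth, the defensive version mentioned above might look something like the sketch below. It is one possible reading of "initialize and guard", not the only one, and it keeps the function names from the original example.

```c
#include <stddef.h>

void foo(char *s, int a)
{
    char *s1 = s;
    if (s1 != NULL && a > 0)   /* guard: never write through a null pointer */
        *s1 = 'a';
}

void bar(int m)
{
    char *s = NULL;            /* initialized, so foo's guard can catch it */
    foo(s, m);
}
```

With these two changes, calling `bar` with a positive `m` no longer writes through an uninitialized pointer; the call simply does nothing.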


Top 10 List: Well Traveled Path to Source Code Analysis Success

Posted by Brendan Harrison   May 31st, 2011

The Code Integrity folks have developed a lot of best practices on deploying static analysis and have compiled many of them in a solid whitepaper. They include a Top 10 list of what they call “The Well Traveled Path to Success”. Below is their (somewhat paraphrased in spots) list.

1. Determine who cares. Who has a vested interest in bugs actually getting fixed? How much do they care?
2. Get an expert to tune the solution for your codebase. Static analysis tuning will maximize defect finding while minimizing false positives.
3. If possible, pilot with a small group to gain early successes.
4. Appoint the proper roles, particularly management sponsor, administrator, defect triagers, fixers and verifiers.
5. Set up the proper process, incentives and consequences. Integrate the SCA tool into your environment. Automate and simplify as much as possible.
6. Get a team to handpick good, high-priority defects for the team rather than have them sift through potential false positives.
7. Set up a central resource website that includes simplified documentation, policies, procedures, roles, reports, etc.
8. Set up various reports like the daily dashboard, top ten list and the “wall of shame”. Make it public. Do a little bit of marketing.
9. Train and mentor the team providing guidance, support and discipline. Either in-person or static analysis e-learning courses work.
10. Determine success criteria and measure against them. Provide status updates often, and work on a source code analysis ROI model that works for your organization.

I agree with the general thrust of most of these, but some might be overkill depending on the size of your deployment. My other quibble is that many of the recommendations presume a centralized defect triage model where you’d have a central group of code reviewers sifting through bug reports.

That’s a common deployment model, but we’re seeing more people choose to just provide the tool to their developers via desktop static analysis. With the possible exception of your backlog, this will eliminate (or greatly reduce) the need for a central code review team that stares at bugs all day long. Regardless, they’re all good considerations to at least, well… consider.

With the launch of the Klocwork Developer Network, we’re making a deliberate and concerted effort to make many of these kinds of deployment resources freely available to our customers. I’ve included links where appropriate.


The Evolution of Source Code Analysis – Part 2: The Early 21st Century

Posted by Todd Landry   May 26th, 2011

In my last post, I took us back in time to an era of bad fashion, questionable music, legendary television shows, and source code analysis tools that were made specifically for software developers. It was the 1970s. In this post, I fast forward to just after the turn of the century to discuss the next evolution of static analysis tools.

The Early 21st Century

Not long after we first viewed hairy-footed Hobbits on the silver screen, and the sham that was affectionately known as Y2K, a new generation of source code analysis tools emerged to cure the errors of the first-generation tools.

These new tools looked beyond the syntactical analysis of previous tools, and instead provided inter-procedural and data-flow analysis. Low hanging fruit was definitely not the target for these tools.

These new techniques were serious–finding complex defects that could impact code quality and security, and they did that while ensuring that the “noise” (i.e. false positive rate) was greatly reduced compared to the first-generation tools. In addition to local defects, they were now identifying resource management issues, security vulnerabilities, concurrency issues, and so on. These were serious defects that, if left undetected and unfixed, had the potential to cause massive problems in the code stream.

In order to perform this much deeper analysis, a fundamental change in the analysis techniques had to occur. These engines needed an unfiltered view of the entire code stream, and so they became tightly integrated with the integration build process.

Umm, Houston, we have a problem. If the analysis takes place at integration build time, then that means the analysis is no longer being initiated by the developers. Source code analysis tools became centralized and moved into a more downstream process such as part of a code audit function.

Developers were now being told they created bugs well after they actually checked in the code. They had already moved onto something entirely different, so now bringing them these day-old, or week-old defects was certainly not the most productive use of their time. It is well documented that the earlier you find defects in your code, the more cost effective it is to fix them, so you can clearly see the problems with these second-generation tools.

If only there was a way to bring these second-generation analysis capabilities to the developer desktop. More about that in my next entry.


The Evolution of Static Code Analysis – Part 1: The Early Years

Posted by Todd Landry   May 17th, 2011

Our marketing people here at Klocwork like to see me racking up frequent flyer miles and expending CO2 at roadshows, conferences and tradeshows. Whenever I’m out speaking, I always like to gauge audience familiarity with Static Code Analysis.

I’m happy to say that SCA knowledge has definitely increased over the years, but it is still not up to levels enjoyed by unit testing or integration testing.

What I plan to do over the next three weeks is to provide you with a history lesson on how Static Code Analysis has evolved over the past few decades (yes, it has been around that long!). The three different eras I will be addressing are The Early Years, The Early 21st Century, and The Present Day.

The Early Years

As I mentioned earlier, Static Code Analysis has actually been around since the time of bell bottoms, disco music, and Space Invaders (check out the Space Invaders link)–yes, the good ole 1970s. Who out there has heard of Lint (and no, I’m not talking about the fluff coming from your old bell bottoms pockets)?

Lint was one of these first-generation SCA tools introduced in the late 70s. These tools targeted low hanging fruit in C code, such as missing or extra semi-colons, missing curlicues, potentially dangerous implicit casts, and so on.

These tools were closely integrated with the compile and link process, and so this seemed like the best time to show their errors and warnings, while the developer was actually “in process” and fixing problems found by the compiler. Since these tools delivered their warnings at compile time, they were quickly adopted and owned by the developers themselves.

Life was good. Well, until the bugs that were being found were deemed to be relatively trivial or completely erroneous (the dreaded false positive). The problem was that these tools were only able to see one file at a time, but for accurate static analysis there is a strong need to know everything that’s going on within the entire stream.

Without that vision of the entire stream, no matter how sophisticated the analysis tools are, they will make incorrect assumptions most of the time. Because of these inaccuracies, these first-generation tools never gained the widespread acceptance of software developers.

Next up will be a look at static analysis tools during a time when “Whassssuuuupp” became a household term.


Building a Software Security Threat Model

Posted by Brendan Harrison   April 20th, 2011

We’ve talked at length before regarding software security assurance and the role static analysis can play in ensuring code is written securely. We’ve got a bunch of great resources for anyone looking to dive into this particular aspect of software security:

To build on this, next month our CTO Gwyn Fisher and the CTO of Security Innovation, Jason Taylor will be hosting a talk that expands the discussion beyond secure coding strategies alone. Jason will be talking at length on how to build a threat model for software, in particular embedded software. Gwyn will then walk through how customers should be building their software with this threat model in mind – everything from code reviews to static analysis and testing strategies. I urge you to register for the webinar and check it out – there will be lots of good information being discussed.