0 post
« Previous 1 / 2 / 3 / 4 / 5 Next »

Posts Tagged ‘Static Analysis’


Golden rules of AST checker development

Posted by Patti Murphy   January 24th, 2012

In my previous post, It’s time to create a custom checker…, we looked at the considerations involved in deciding which checker to create–AST or path?

In this post, we’re going to use a custom checker to enforce an internal coding standard that extends the default set of checkers in our source code analysis tool.

To do this, I’ve called upon Steve Howard, our head of Partner Support in Europe, to get us started with an AST checker to accomplish our goal.

Steve has coached many customers through the checker creation process. In his experience, the appeal of custom checkers lies in their ability to enforce naming conventions and code constructions across organizations.

The standard we want to enforce is the use of a compound statement block rather than single statements as the body of a for loop. An AST checker is the way to go because detection depends solely on the syntax of the code itself and not runtime behavior.

See the example below:

Incorrect: Correct:
for( i – 0; i < 10; i++ )
doSomething( );
for( i – 0; i < 10; i++ ) {
doSomething();
}

To flag this violation, we need to instruct the checker to find all instances of for loop nodes that contain a Statement node as an immediate descendant.

A tool that shows you a visual representation of the AST for the test case is quite helpful in the checker creation process. Here at Klocwork, we use Checker Studio to:

  • browse the AST structure of test cases,
  • identify nodes of interest, and
  • test XPath-like expressions that identify node types, qualifiers, conditions and variables to traverse the AST and flag the defect.

Note: If we wanted to enforce the compound statement rule in all loops, then we’d need to have one pattern (created using the XPath-like expression) for each possible kind, such as while loops and do while loops.

Armed with the test case, Checker Studio, and a syntax guide, Steve identified the following expression that flags the infraction:

// ForStmt [not (Stmt::CompoundStmt)]

Here’s how the test case and expression appear in Checker Studio:

Golden rules

Based on his experience, Steve has a number of golden rules that get you from idea to defect detection faster:

  • Start simple: Use a simple test case that contains the defect you want to detect and work with one simple pattern at a time. Add more complexity as you go along
  • Start rough and refine later: Don’t worry about false positives at first. In some cases it may even be easier to search for  instances that are OK and then negate the rule at the end
  • Divide and conquer: With a more complex checker, work separately on each aspect of the defect you want to detect and then bring it all together at the end for testing in Checker Studio
  • Watch your levels: Make the highlighting as relevant as possible for the issue you’re trying to find. For example, “// ClassType [MemberDecls[*]::MemberDecl]” will highlight classes that match, whereas “// ClassType/MemberDecls[*]::MemberDecl”  will highlight class members that match. The rule is the same, but the focus is different
  • Weed out false negatives: Add negative examples (good code) to check for false negatives

For more information about our custom AST checkers, watch our Checker Studio video.


It’s time to create a custom checker, but what kind?

Posted by Patti Murphy   November 15th, 2011

You’ve been using source code analysis on your integration build or your desktop, or (ideally) both. And then there’s “a situation”.

The situation

Either you:

  • Noticed a false negative you want detected, or
  • Need a way to enforce a corporate coding standard, such as the requirement for the use of  a compound statement block rather than single statements as the body of a loop.

Now what?

Time to create a custom checker, that’s what. But what kind of checker?

Source code analysis involves two families of checkers, those that involve:

  • Abstract Syntax Tree (AST) validation, and
  • Code path analysis.

An AST provides a tree-based structural representation of the source code. An AST checker allows you to pinpoint problematic syntax using XPath or XPath-derived grammar to define the problem you’re looking for. AST checkers (our version is called Klockwork AST checkers, or KAST for short) don’t require program execution to run; they detect defects right away on source code.

Code path analysis, on the other hand, targets defects related to value tracking at program execution time. Instead of style violations, you’d use a path checker to answer questions such as:

  • Is this newly-created object released before all aliases to it are removed from scope?
  • Is this data object ever range-checked before being passed to an OS function?
  • Is this string checked for special characters before being submitted as an SQL query?

To create a path checker, you don’t need to know how data is tracked by the checker. What you do need to know are the function types and values you want to track for the analysis starting point and the analysis end point where the defect (or event) is recognized and reported.

Which checker when?

Create an AST checker when the problem you want to detect:

  • is a local defect
  • does not involve program execution
  • has to do with the way the program was written
  • does not involve tracking a value
  • does not involve a path

Create a path checker when the problem you want to detect:

  • involves tracking a value
  • has a starting point (where the analysis starts) and end point (where the defect is detected)
  • involves program execution

Stay tuned for the next post in this series on best practices for AST checker creation.

For more information, see Writing custom checkers with Klocwork Extensibility or check out our member discussions in the C/C++ custom checkers forum.

–With files from CTO Gwyn Fisher


Compiler configuration

Posted by Alen Zukich   October 25th, 2011

Compiler configuration is a problem with static analysis tools.  In the past, a static analysis (or source code analysis) tool simply worked by pointing it at the source code and hitting “go”.  Now it is very different.  Without a complete understanding of the software build, including the compiler specifics, you will get inaccurate results.

Under the covers, do you really know what is happening with your compiler?  Not usually.  You make changes to your code, call your compiler or build command to compile your code, and then fix the issues.  Rinse and repeat.

But what is really important for static analysis tools, is that the compiler contains some crucial information to successfully compile your code.  Namely, the internal compiler includes and defines.  Static analysis tools must generate this data, otherwise they won’t know where the system includes and defines are coming from for your specific compiler.  Hence, the static analysis results are about as accurate as the weather man’s weekly prediction.

Luckily most compilers have a way to capture this.  For example to find out the defines and includes from gcc:

gcc -E -dM dummy.c

gcc -E -Wp,-v dummy.c

Where dummy.c is just an empty file.  This will give you a dump of all the defines and includes, respectively.  Now, when static analysis tools build their data they have a mapping of the proper defines and includes for your specific compiler and everyone is happy.

In the past, it seemed like a good idea to make compiler configuration extensible.  This meant that static analysis tools could support any compiler if you didn’t mind taking the time to build that support.  It wasn’t usually very complex but it could be prone to errors.  Instead, it makes more sense to just provide the support right out of the box, so taking the words from the late Steve Jobs: “it just works“.  As long as static analysis tools have an extensible interface, these tools should be able to support new and obscure compilers very quickly.  Make sure your static analysis vendor has support for your specific compiler that you use, and if they don’t they better turn that around in a snap.


Electronic imports contain security threats

Posted by Alen Zukich   July 19th, 2011

I read an interesting post on electronic imports that could contain security threats.  I can only speak from the software perspective, but I can say that most customers I’ve dealt with usually integrate some sort of software security audit process with any 3rd-party integrator and from my experience that means adopting static analysis.  How many organizations are there that haven’t jumped on board with static analysis?  Probably more than I can count.

It would be very interesting to hear of some of the Armed Services and Intelligence cyber threats that the government has not publically disclosed.  That might be an eye opener.


The Evolution of Static Code Analysis – Part 3: The Present Day

Posted by Todd Landry   June 8th, 2011

My first 2 posts looked at 2 different eras of Static Code Analysis, the Early Years and the Early 21st Century. The SCA solutions of these times were revolutionary, and helped software development teams a great deal. But they had their warts.

In the final post in this series, I’m going to introduce you to the present day Static Code Analysis technology and how it is impacting developers.

The Present Day

I’m a huge fan of Reece’s Peanut Butter Cups. I love them. I keep active so I don’t feel guilty eating them. In a strange, convoluted way, the 3rd generation of static code analysis tools are like this delicious combination of chocolate and peanut butter. Let me explain.

I’m sure you remember from my previous posts how the 1st generation tools (i.e. Lint) gave questionable results but was still considered by developers as a tool exclusively for them, and the 2nd generation tools gave really good results but moved away from being a developer tool.
The 3rd generation tools recognized that the developer must be an integral part of the process of identifying, fixing and preventing bugs from reaching the code stream and so, they took the proven results from the 2nd gen tools and delivered them right to the developer’s desktop.

Eureka! Now developers are able to perform an analysis locally, using their development environment of choice, while still getting the high accuracy and consistency that was previously only possible by checking in their code and waiting for the integration build to take place.

Think about the ramifications of this:

  • cleaner code is being checked in
  • the ‘rinse-repeat’ vicious cycle of rework is drastically reduced
  • quality teams are now able to focus on testing the product’s functionality rather than spending cycles uncovering something that could easily and quickly be found by automated tools.

Mmmm-mmmm good. Sounds like a win-win-win to me!

I think the best thing about these 3rd generation tools is simply the fact that developers are now able to resume ownership of the quality and security of the code they are producing.

Well, I hope you enjoyed this walk down memory lane. I sure did. Now I’m looking for spare change because I see a trip to the vending machine in my immediate future.

If you want to know more about the 3rd Generation tools, feel free to drop me a line.


The Evolution of Source Code Analysis – Part 2: The Early 21st Century

Posted by Todd Landry   May 26th, 2011

In my last post, I took us back in time to an era of bad fashion, questionable music, legendary television shows, and source code analysis tools that were made specifically for software developers. It was the 1970s. In this post, I fast forward to just after the turn of the century to discuss the next evolution of static analysis tools.

The Early 21st Century

Not long after we first viewed hairy-footed Hobbits on the silver screen, and the sham that was affectionately known as Y2K, a new generation of source code analysis tools emerged to cure the errors of the first-generation tools.

These new tools looked beyond the syntactical analysis of previous tools, and instead provided inter-procedural and data-flow analysis. Low hanging fruit was definitely not the target for these tools.

These new techniques were serious–finding complex defects that could impact code quality and security, and they did that while ensuring that the “noise” (i.e. false positive rate) was greatly reduced compared to the first-generation tools. In addition to local defects, they were now identifying resource management issues, security vulnerabilities, concurrency issues, and so on. These were serious defects that,  if left undetected and unfixed, had the potential for massive problems to the code stream.

In order to perform this much deeper analysis, a fundamental change in the analysis techniques had to occur. These engines needed an unfiltered view of the entire code stream, and so they became tightly integrated with the integration build process.

Umm, Houston, we have a problem. If the analysis takes place at integration build time, then that means the analysis is no longer being initiated by the developers. Source code analysis tools became centralized and moved into a more downstream process such as part of a code audit function.

Developers were now being told they created bugs well after they actually checked in the code. They had already moved onto something entirely different, so now bringing them these day-old, or week-old defects was certainly not the most productive use of their time. It is well documented that the earlier you find defects in your code, the more cost effective it is to fix them, so you can clearly see the problems with these second-generation tools.

If only there was a way to bring these second-generation analysis capabilities to the developer desktop. More about that in my next entry.


The Evolution of Static Code Analysis – Part 1: The Early Years

Posted by Todd Landry   May 17th, 2011

Our marketing people here at Klocwork like to see me racking up frequent flyer miles and expending CO2 at roadshows, conferences and tradeshows. Whenever I’m out speaking, I always like to gauge audience familiarity with Static Code Analysis.

I’m happy to say that SCA knowledge has definitely increased over the years, but it is still not up to levels enjoyed by unit testing or integration testing.

What I plan to do over the next three weeks is to provide you with a history lesson on how Static Code Analysis has evolved over the past few decades (yes, it has been around that long!). The three different eras I will be addressing are The Early Years, The Early 21st Century, and  The Present Day.

The Early Years

As I mentioned earlier, Static Code Analysis has actually been around since the time of bell bottoms, disco music, and Space Invaders (check out the Space Invaders link)–yes, the good ole 1970s. Who out there has heard of Lint (and no, I’m not talking about the fluff coming from your old bell bottoms pockets)?

Lint was one of these first-generation SCA tools introduced in the late 70s. These tools targeted low hanging fruit in C code, such as missing or extra semi-colons, missing curlicues, potentially dangerous implicit casts, and so on.

These tools were closely integrated with the compile and link process, and so this seemed like the best time to show its errors and warnings, while the developer was actually “in process” and fixing problems found by the compiler. Since these tools delivered its warnings at compile time, it quickly became a tool that was adopted and owned by the developers themselves.

Life was good. Well, until the bugs that were being found were deemed to be relatively trivial or completely erroneous (the dreaded false positive). The problem was that these tools were only able to see one file at a time, but for accurate static analysis there is a strong need to know everything that’s going on within the entire stream.

Without that vision of the entire stream, no matter how sophisticated the analysis tools are, they will make incorrect assumptions most of the time. Because of these inaccuracies, these first-generation tools never gained the widespread acceptance of software developers.

Next up will be a look at static analysis tools during a time when “Whassssuuuupp” became a household term.


IDE vs text editor

Posted by Alen Zukich   May 10th, 2011

I’m sure this topic has been discussed a million times, but hey, here we go again.  A recent question came up on whether people liked their experience of Eclipse vs. Visual Studio.  Of course this brought up the advantages of one versus the other.  But is that really a fair comparison? It really depends.  What type of application are you building — a native Windows application?  Surely going with Visual Studio makes sense. But if the goal is cross-platform, then you might look at Eclipse.

Glad to see people are thinking about IDEs, but what really intrigues me about this conversation of one IDE versus another is that someone always has to add their two cents about the ancient text editors of the world.   Something like “real programmers use vi”.  Hold the phone.  Are we talking about the same text editor that requires you to memorize a gazillion key bindings?

I don’t get this.  I understand legacy use, as vi was the only available built-in text editor at the time and still is the only choice of hackers today.  But times have changed.  Anyone I’ve talked to who is using vi (or other known text editors like emacs) always seems very proud of it.  Maybe knowing how to use such a complex tool provides some self-worth.  I just don’t know.  Seems like it would be the same as me bragging about my portable Walkman or the 8-track player in my car.

Don’t the features of Visual Studio or Eclipse make you faster?  With a click of a button you can refactor your code.  With simple auto-completion the IDE eliminates simple typing (or even mistakes).  Plus with built-in tools for static analysis, test generation, etc., what are you waiting for?

So you vi/vim/emacs coders out there — tell me why on earth you are sticking with it. What makes you a better programmer using vi/vim or emacs?


Will source code analysis change developer culture?

Posted by Alen Zukich   April 26th, 2011

Will source code analysis (SCA) or static analysis change developer culture? The answer really depends on the developer’s skill set. In my experience, there are developers who are excellent at what they do (visionaries), and then there are some that just don’t get it (fence posts).

I’m not here to talk about the visionaries — they already get it. They know that SCA techniques help find critical issues early in the development cycle. Sometimes SCA finds great stuff, sometimes it doesn’t. But it’s always worth the time, because it makes developers better at what they do. In fact, it’s the visionaries who demand SCA from the outset.

Nor am I here to talk about the fence posts; they won’t last long at what they’re doing anyway.

I’m mostly concerned with the majority, the ones that fall in between. Do they find value in SCA? Will it change the way they develop code?  These developers are quite valued and extremely important to the organization — so much so that they have a full workload and tons of commitments. It’s because of this workload that you typically won’t see a shift in their work culture. When they’re asked to run SCA as part of their development process, they’ll probably accept this and give their honest opinion. They will wait to see whether it finds the so-called “silver bullet” to decide whether they are getting value out of it. In other words, if SCA finds a super-juicy bug that would have been catastrophic, it’s a winner. If not, then you’re probably not going to convince them that SCA is right for them. On top of that, they have a pretty big workload, and SCA is throwing false positives, making them spend more time.

The reality is that with SCA you may never find the silver bullet. The reality is that SCA tools throw false positives. The reality is that it will take some time up front to be proficient with the tools. Yes, these realities suck. But don’t lose focus on the prize. SCA will always pay off in the end. The value of SCA is quite clear these days (see here, here and here).

Okay, so there’s some value despite the realities, but there’s also a hidden value: training. Even though SCA may show you a defect you don’t care about, it gives you food for thought. Coding best practices are a good example. A memory-constrained shop will tell you they always need to check for NULL with memory allocation:


void *blah = malloc(10);
if (blah != NULL){
   /* Do something here with blah */
}else{
/* Do something here if it fails */
}


Surprisingly, others will tell you that the likelihood of running out of memory is very low. So why place the unnecessary checks? This and many other arguments will go on forever. But these type of questions make you think twice: should I be checking for NULL, or should I rewrite my code? So in the end, SCA gets you thinking.



A Rockin’ Agile Roadshow

Posted by Todd Landry   April 7th, 2011

Last week I toured the West coast with our friends from VersionOne, Perforce, and Electric Cloud on our Agile roadshow hitting the cities of Seattle, Santa Clara, and San Diego. In one of the after meeting discussions, one of the attendees asked me what the differences were between Agile and Lean. Having only been involved with Lean from an outside perspective, I didn’t really think there were huge differences and that they shared many of the same beliefs.

Luckily, it looks like others believe this to be the case too. So rather than me trying to explain this, this timely blog does a great job of explaining Agile and Lean, and why there is a lack of understanding and acceptance of Agile practices in many companies that practice Lean.

Also, as part of this Agile roadshow, we had a bit of a celebrity in our midst--our illustrious keynote speaker David Hussman of DevJam consulting has a past that most of us weekend musicians dream about. He was part of a big-hair metal band! Not only can he play a mean guitar, the dude knows his stuff about Agile and gave one of the best keynotes I’ve ever seen. Check out his website when you get a chance, and see if you can find him in this video.