0 post
« Previous 1 / 2 / 3 / 4 Next »

Posts Tagged ‘source code analysis’


Golden rules of AST checker development

Posted by Patti Murphy   January 24th, 2012

In my previous post, It’s time to create a custom checker…, we looked at the considerations involved in deciding which checker to create–AST or path?

In this post, we’re going to use a custom checker to enforce an internal coding standard that extends the default set of checkers in our source code analysis tool.

To do this, I’ve called upon Steve Howard, our head of Partner Support in Europe, to get us started with an AST checker to accomplish our goal.

Steve has coached many customers through the checker creation process. In his experience, the appeal of custom checkers lies in their ability to enforce naming conventions and code constructions across organizations.

The standard we want to enforce is the use of a compound statement block rather than single statements as the body of a for loop. An AST checker is the way to go because detection depends solely on the syntax of the code itself and not runtime behavior.

See the example below:

Incorrect: Correct:
for( i – 0; i < 10; i++ )
doSomething( );
for( i – 0; i < 10; i++ ) {
doSomething();
}

To flag this violation, we need to instruct the checker to find all instances of for loop nodes that contain a Statement node as an immediate descendant.

A tool that shows you a visual representation of the AST for the test case is quite helpful in the checker creation process. Here at Klocwork, we use Checker Studio to:

  • browse the AST structure of test cases,
  • identify nodes of interest, and
  • test XPath-like expressions that identify node types, qualifiers, conditions and variables to traverse the AST and flag the defect.

Note: If we wanted to enforce the compound statement rule in all loops, then we’d need to have one pattern (created using the XPath-like expression) for each possible kind, such as while loops and do while loops.

Armed with the test case, Checker Studio, and a syntax guide, Steve identified the following expression that flags the infraction:

// ForStmt [not (Stmt::CompoundStmt)]

Here’s how the test case and expression appear in Checker Studio:

Golden rules

Based on his experience, Steve has a number of golden rules that get you from idea to defect detection faster:

  • Start simple: Use a simple test case that contains the defect you want to detect and work with one simple pattern at a time. Add more complexity as you go along
  • Start rough and refine later: Don’t worry about false positives at first. In some cases it may even be easier to search for  instances that are OK and then negate the rule at the end
  • Divide and conquer: With a more complex checker, work separately on each aspect of the defect you want to detect and then bring it all together at the end for testing in Checker Studio
  • Watch your levels: Make the highlighting as relevant as possible for the issue you’re trying to find. For example, “// ClassType [MemberDecls[*]::MemberDecl]” will highlight classes that match, whereas “// ClassType/MemberDecls[*]::MemberDecl”  will highlight class members that match. The rule is the same, but the focus is different
  • Weed out false negatives: Add negative examples (good code) to check for false negatives

For more information about our custom AST checkers, watch our Checker Studio video.


Microsoft banned function list

Posted by Alen Zukich   September 27th, 2011

We have blogged before about software security guidelines, but there is one we haven’t discussed.  Several years ago Microsoft published the “Security Development Lifecycle (SDL) Banned Function Calls” list.  These banned functions can be a good way to remove a significant number of potential code vulnerabilities from C and C++ code.  They provide recommendations on better or safer functions to use with the caveat that even these “safer” function should be used with care.

You can use the banned.h file to identify and obtain deprecation warnings or, even better, use this as part of your source code analysis.  Leveraging these warning as part of your source code analysis solution means you have better ways to filter and manage the solution as opposed to a dump of potentially thousands of warnings.  Add that into your code review tool and you have some good discussion points for your peer code reviews.

Like any security guideline, the question becomes how useful are these?  There is no question that these banned functions are debatable.  The complaint that I hear the most is that “n” functions can be used safely so they should not be part of the list.  But you can still get yourself in a whole heap of trouble with these functions as well.  Take this example from Micheal Howard’s blog:  Buffer Overflow in Apache 1.3.xx fixed on Bugtraq – the evils of strncpy and strncat!.

I believe there is merit in identifying these functions so you can ask yourself if you’re using them securely.  For more information and training on the Microsoft SDL you can look at the course “Intro to the Microsoft Security Development Lifecycle” on our web page.

Is anyone out there using the Microsoft banned function list religiously?


The Evolution of Static Code Analysis – Part 3: The Present Day

Posted by Todd Landry   June 8th, 2011

My first 2 posts looked at 2 different eras of Static Code Analysis, the Early Years and the Early 21st Century. The SCA solutions of these times were revolutionary, and helped software development teams a great deal. But they had their warts.

In the final post in this series, I’m going to introduce you to the present day Static Code Analysis technology and how it is impacting developers.

The Present Day

I’m a huge fan of Reece’s Peanut Butter Cups. I love them. I keep active so I don’t feel guilty eating them. In a strange, convoluted way, the 3rd generation of static code analysis tools are like this delicious combination of chocolate and peanut butter. Let me explain.

I’m sure you remember from my previous posts how the 1st generation tools (i.e. Lint) gave questionable results but was still considered by developers as a tool exclusively for them, and the 2nd generation tools gave really good results but moved away from being a developer tool.
The 3rd generation tools recognized that the developer must be an integral part of the process of identifying, fixing and preventing bugs from reaching the code stream and so, they took the proven results from the 2nd gen tools and delivered them right to the developer’s desktop.

Eureka! Now developers are able to perform an analysis locally, using their development environment of choice, while still getting the high accuracy and consistency that was previously only possible by checking in their code and waiting for the integration build to take place.

Think about the ramifications of this:

  • cleaner code is being checked in
  • the ‘rinse-repeat’ vicious cycle of rework is drastically reduced
  • quality teams are now able to focus on testing the product’s functionality rather than spending cycles uncovering something that could easily and quickly be found by automated tools.

Mmmm-mmmm good. Sounds like a win-win-win to me!

I think the best thing about these 3rd generation tools is simply the fact that developers are now able to resume ownership of the quality and security of the code they are producing.

Well, I hope you enjoyed this walk down memory lane. I sure did. Now I’m looking for spare change because I see a trip to the vending machine in my immediate future.

If you want to know more about the 3rd Generation tools, feel free to drop me a line.


The Evolution of Source Code Analysis – Part 2: The Early 21st Century

Posted by Todd Landry   May 26th, 2011

In my last post, I took us back in time to an era of bad fashion, questionable music, legendary television shows, and source code analysis tools that were made specifically for software developers. It was the 1970s. In this post, I fast forward to just after the turn of the century to discuss the next evolution of static analysis tools.

The Early 21st Century

Not long after we first viewed hairy-footed Hobbits on the silver screen, and the sham that was affectionately known as Y2K, a new generation of source code analysis tools emerged to cure the errors of the first-generation tools.

These new tools looked beyond the syntactical analysis of previous tools, and instead provided inter-procedural and data-flow analysis. Low hanging fruit was definitely not the target for these tools.

These new techniques were serious–finding complex defects that could impact code quality and security, and they did that while ensuring that the “noise” (i.e. false positive rate) was greatly reduced compared to the first-generation tools. In addition to local defects, they were now identifying resource management issues, security vulnerabilities, concurrency issues, and so on. These were serious defects that,  if left undetected and unfixed, had the potential for massive problems to the code stream.

In order to perform this much deeper analysis, a fundamental change in the analysis techniques had to occur. These engines needed an unfiltered view of the entire code stream, and so they became tightly integrated with the integration build process.

Umm, Houston, we have a problem. If the analysis takes place at integration build time, then that means the analysis is no longer being initiated by the developers. Source code analysis tools became centralized and moved into a more downstream process such as part of a code audit function.

Developers were now being told they created bugs well after they actually checked in the code. They had already moved onto something entirely different, so now bringing them these day-old, or week-old defects was certainly not the most productive use of their time. It is well documented that the earlier you find defects in your code, the more cost effective it is to fix them, so you can clearly see the problems with these second-generation tools.

If only there was a way to bring these second-generation analysis capabilities to the developer desktop. More about that in my next entry.


The Evolution of Static Code Analysis – Part 1: The Early Years

Posted by Todd Landry   May 17th, 2011

Our marketing people here at Klocwork like to see me racking up frequent flyer miles and expending CO2 at roadshows, conferences and tradeshows. Whenever I’m out speaking, I always like to gauge audience familiarity with Static Code Analysis.

I’m happy to say that SCA knowledge has definitely increased over the years, but it is still not up to levels enjoyed by unit testing or integration testing.

What I plan to do over the next three weeks is to provide you with a history lesson on how Static Code Analysis has evolved over the past few decades (yes, it has been around that long!). The three different eras I will be addressing are The Early Years, The Early 21st Century, and  The Present Day.

The Early Years

As I mentioned earlier, Static Code Analysis has actually been around since the time of bell bottoms, disco music, and Space Invaders (check out the Space Invaders link)–yes, the good ole 1970s. Who out there has heard of Lint (and no, I’m not talking about the fluff coming from your old bell bottoms pockets)?

Lint was one of these first-generation SCA tools introduced in the late 70s. These tools targeted low hanging fruit in C code, such as missing or extra semi-colons, missing curlicues, potentially dangerous implicit casts, and so on.

These tools were closely integrated with the compile and link process, and so this seemed like the best time to show its errors and warnings, while the developer was actually “in process” and fixing problems found by the compiler. Since these tools delivered its warnings at compile time, it quickly became a tool that was adopted and owned by the developers themselves.

Life was good. Well, until the bugs that were being found were deemed to be relatively trivial or completely erroneous (the dreaded false positive). The problem was that these tools were only able to see one file at a time, but for accurate static analysis there is a strong need to know everything that’s going on within the entire stream.

Without that vision of the entire stream, no matter how sophisticated the analysis tools are, they will make incorrect assumptions most of the time. Because of these inaccuracies, these first-generation tools never gained the widespread acceptance of software developers.

Next up will be a look at static analysis tools during a time when “Whassssuuuupp” became a household term.


Will source code analysis change developer culture?

Posted by Alen Zukich   April 26th, 2011

Will source code analysis (SCA) or static analysis change developer culture? The answer really depends on the developer’s skill set. In my experience, there are developers who are excellent at what they do (visionaries), and then there are some that just don’t get it (fence posts).

I’m not here to talk about the visionaries — they already get it. They know that SCA techniques help find critical issues early in the development cycle. Sometimes SCA finds great stuff, sometimes it doesn’t. But it’s always worth the time, because it makes developers better at what they do. In fact, it’s the visionaries who demand SCA from the outset.

Nor am I here to talk about the fence posts; they won’t last long at what they’re doing anyway.

I’m mostly concerned with the majority, the ones that fall in between. Do they find value in SCA? Will it change the way they develop code?  These developers are quite valued and extremely important to the organization — so much so that they have a full workload and tons of commitments. It’s because of this workload that you typically won’t see a shift in their work culture. When they’re asked to run SCA as part of their development process, they’ll probably accept this and give their honest opinion. They will wait to see whether it finds the so-called “silver bullet” to decide whether they are getting value out of it. In other words, if SCA finds a super-juicy bug that would have been catastrophic, it’s a winner. If not, then you’re probably not going to convince them that SCA is right for them. On top of that, they have a pretty big workload, and SCA is throwing false positives, making them spend more time.

The reality is that with SCA you may never find the silver bullet. The reality is that SCA tools throw false positives. The reality is that it will take some time up front to be proficient with the tools. Yes, these realities suck. But don’t lose focus on the prize. SCA will always pay off in the end. The value of SCA is quite clear these days (see here, here and here).

Okay, so there’s some value despite the realities, but there’s also a hidden value: training. Even though SCA may show you a defect you don’t care about, it gives you food for thought. Coding best practices are a good example. A memory-constrained shop will tell you they always need to check for NULL with memory allocation:


void *blah = malloc(10);
if (blah != NULL){
   /* Do something here with blah */
}else{
/* Do something here if it fails */
}


Surprisingly, others will tell you that the likelihood of running out of memory is very low. So why place the unnecessary checks? This and many other arguments will go on forever. But these type of questions make you think twice: should I be checking for NULL, or should I rewrite my code? So in the end, SCA gets you thinking.



Memory overflows

Posted by Alen Zukich   April 12th, 2011

A few years back a customer said they had all kinds of trouble with bugs corrupting their stack.  Even though they asked if source code analysis tools could help find stack corruption, once we got an example, we found that they were really looking for was memory overflows.  So what on earth is a memory overflow?  Does that even exist?

Yes, except it is probably not what you’re thinking, it’s not the same as a memory leak;  a memory overflow is quite different.  A memory overflow is really just a form of a buffer overflow.  The impact of memory overflow is unexpected behavior or program failure.   Take this example:

#include <stdio.h>
typedef struct s1_ {
   int i;
   int j;
   char arr[10];
}s1;

typedef struct s2_ {
   char b[20];
   char c[40];
}s2;

main()
{
   s1 block1;
   memset(&block1, 0, sizeof(s2));
   block1.i =1;
}

Here we have the case of incorrectly using ‘memset’ at line 16 where ‘sizeof(s2)’ is bigger than ‘block1′.  In fact, going back to this customer revealed that the issue was due to memset clearing much more area than intended.  If you’re using static analysis or source code analysis tools then you are probably covered by this.  You will find this type of issue usually in the “buffer overflow” category.

So, are you free of your memory overflows?


Dealing with a different type of backlog…your bug backlog

Posted by Todd Landry   February 3rd, 2011

As a product manager, the only backlog I typically care about is my product backlog. Do I have the right stories in there? Do the stories have enough detail? Are they properly prioritized? You know, that kind of stuff. Today, however, I’m going to write about a very different backlog, that is the static analysis defect backlog.

A static analysis backlog is created when you run a static analysis product on your code base for the very first time. Chances are pretty good that the first analysis is going to list a large number of defects, some that are without question real, and some that perhaps are not. Do not freak out! This is the first time that analysis engine has ‘laid eyes’ upon your code and it is going to flex its muscles and show you any weaknesses it believes exist. So how does one deal with this? Here are a few strategies to help you:

1) Don’t boil the ocean. Before you even run that first analysis, don’t have a “wouldn’t it be cool” moment, where you decide to turn on every single rule the analysis engine has. There is a reason why static analysis tools haven’t turned on everything.  They are showing the most accurate and critical issues first.  So unless you have unlimited time and resources, your best bet is to start with a core set of rules and run the analysis based on that set. This core set of rules should include things such as memory/resource leaks, buffer overruns, null pointer dereferences, uninitialized variables, and so on. Add other rules once you have this core set under control.

Is your issue backlog making you cross eyed? Try these coping strategies.


2) Baseline your defects. Consider that first analysis your baseline and choose to ‘park’ them for the time being. Chances are the product that the analysis was run on is one that has already been released to the public, and in good working order. Zero out these defects for now, and start to triage them, which leads into strategy #3.

3) This is going to sound pretty obvious, but when it comes to managing your issue backlog start looking at the most critical issues first. These are the ones that are most likely to cause a failure of some sort, so determine if these issues are real, and if so, fix them immediately. Once you’re done with the most critical issues, move to the next level of severity, and continue on that way.

4) Finally, tune your analysis. Any good vendor will allow you to tune your analysis. The benefits of tuning are twofold; 1) you can find code issues that would otherwise go undetected and, 2) reduce the number of issues that the engine reports incorrectly in the context of your source code. You should think of ways to give the tool more context about your code base to increase accuracy.

If you follow these suggestions, you’ll definitely have a better grasp of your bug backlog, and you’ll be able to execute on reducing that backlog quickly and efficiently. If you don’t, then at some point, you may feel a little like the critter pictured here.

If there are any other strategies you’ve tried to deal with your bug backlog, leave a comment or two. I’d love to hear about them.


Software Development for Medical Devices Seminar

Posted by Todd Landry   December 17th, 2010

Yesterday we kicked off our first Medical Devices specific seminar in Boston, with our friends from SterlingTech and Vector Software. The day was all about software development for medical devices and more specifically, about managing software risk to help ensure you are compliant with all of the FDA regulations specific to software code. We had a great turnout, with over 75% of registrants showing up for this session. A couple of observations I found interesting from this event:

  • The medical devices community seems to be fairly tight-knit. Everyone at the event seemed to know everyone else, and there was definitely a strong support system in place among the companies in attendance.
  • And secondly, there appears to be fairly high adoption of Agile practices, particularly Scrum (albeit somewhat modified), with these Medical device companies, and the healthcare industry in general for that matter. From everything I heard, this adoption curve shows no signs of slowing down.

If IEC 62304 is important to your organization, you might want to check out this half-day seminar. The next one will be in Minneapolis in January. I’ll be packing my woolies for that one!


0010 0000 or 0000 0010 which one are you?

Posted by Eric Hollebone   August 10th, 2010

I love this quote by Carl Ek from  Code Integrity solutions:

There are 0010 0000 kinds of people in the world: Those that understand the difference between Big Endian and Little Endian, and those that do not.

Issues with Endianism and processor architecture ports are becoming more and more common these days as more desktop source code moves into different arenas.  Gone are the days when the 32-bit memory model or little-endian format dominate. Software changes are required to support the growth occurring not at the desktop, but in the server  and mobile platforms.

Mobile devices especially have opened a Pandora’s box of Endian and memory problems, with variety of processor architectures with ARM[1] leading the way.  Add to this mix,  end-consumers are demanding desktop features like Adobe Flash or Office apps on mobile devices, many a stable codebase will fall apart when ported to either mobile or server.

For developers porting to different platforms, there are some significant challenges.  Just to list a few:

  • CPU optimizations need to be reviewed
  • inline assembly calls require rewriting or removal
  • machine word (WORD) allocations may require refactoring
  • any binary data exchanged over the network stacks require verification

None of these are  new, they’re just not a common skillset for most developers.

Source code analysis can be a boon in two ways. Firstly, in the planning phase by helping you determine the breadth of the effort,  and secondly by identifying any existing  issues, particularly of the memory allocation and Endian varieties.

For more in depth information, there are two recent articles available from Dr. Dobbs:

[1] Note: Some ARM processors support both big and little Endian formats.