37 posts
« Previous 1 / 2 / 3 / 4 Next »

Archive for the ‘Static Analysis’ Category


Compiler warnings, Coding standards, Code quality…oh my! (Part 1)

Posted by Alen Zukich   October 7th, 2009

In this 3 part blog series I want to cover general misconceptions with static analysis coverage.  This will include a discussion about:

  • compiler warnings available,
  • different types of style issues including coding standards, and
  • your available options to fit them into your formal process.

Very often customers ask why we don’t cover specific checkers.  We always get great feedback on high value checkers that they would like to see.  But occasionally we get the request to find simple compiler warnings or code style issues.

For the first part of this series I want to focus on compiler warnings.  These are not the compiler errors such as syntax/parse errors you get with a compiler.  Instead, I want to concentrate on those pesky compiler warnings that still let you build your application, when you really know you shouldn’t.  We have all experienced compiler warnings that are just plain confused with your code.  But on the whole, you are guaranteed to find some pretty big blunders.

Most modern compilers focus on providing more details about compiler warnings.  These can be very valuable as it helps find many of those plain dumb mistakes.  It varies by compilers, but many find things such as constant expressions from conditionals, returning from a void function, assignment in condition (use = instead of ==), suspiciously-placed semi-colon and many, many more.

To find some of the issues, you usually need to provide a compiler flag -Wall.  For example:

    gcc -c -Wall foo.c 

Make sure you read your compiler documentation for available warnings.  Here is the gnu gcc compiler and Microsoft cl compiler docs.

Given that every compiler on the market provides its own “checkers” for compiler issues, does it really make sense for static analysis to get in there and detect these issues again?  I strongly believe that every developer should ALWAYS clean up their compiler warnings before going onto static analysis.

But you will still find static analysis tools providing these capabilities.  Why?  Well, first and foremost, not every compiler has the capability to find simple coding issues.  The other reality is that not everyone checks the compiler warnings…(we all know who we are).  Or sometimes you just want to run one tool.  In other words get the more complex bugs with static analysis along with the compiler warnings.  It is for these reasons that static analysis tools have introduced many of these low-level issues.

For the next part of this blog series, I want to go into the details of compiler warnings and some of the things that coding standards are doing as well.


Static analysis for Ruby/Python

Posted by Denis Sidorov   June 29th, 2009

As a developer of static analysis tool for mainstream statically-typed languages, like C++ and Java, I was wondering for quite a while about how well static analysis applies to dynamically-typed languages, like Ruby and Python. And recently, I have come across this interesting project on GitHub: Reek – Code smell detector for Ruby. Well, I suppose that’s just a fancy way to name a static analysis tool.

What can Reek detect? It does not do heavyweight data/control flow analysis, so the list is not very exciting:

Interestingly, despite the poor set of features, compared to modern C++/Java static analysis solutions, Reek gets positive feedback from Ruby community. Some take it to the extreme – integrate Reek into the build and handle code smells as failing tests, and that of course breaks the build every time a new code smell is detected. So, is Ruby community starving for real static-analysis tool?..

This question forced me to run a quick research on what’s the current state of static analysis tools for dynamic languages. I was able to find the following tools for Ruby and Python:

Python tools appear to be pretty conservative about program analysis and try to “compensate” the dynamic nature of the language.

  • PyChecker and pylint – These two tools aim to provide the same kind of warnings/errors, a compiler for statically-typed language would report automatically. For example: too few/many arguments in a method call, unused variables, using return value of a method that does not actually return any value, etc. One of the checkers for pylint, in fact tries to “simulate” static type system by using type inference, and detect missing members and functions, and this, in turn, might involve some elements of data flow analysis. Beside that, there is a bunch of metrics-based rules that can be checked automatically – you can put a limit on number of methods in a class, number of variables in a function, size/complexity of a function etc.

Tools for Ruby introduce the same kind of checks, but from slightly different angle. Rather than adopting the lint metaphor (a complementary tool, finding bugs/errors that compiler does not detect), the tools are positioned to find design flaws.

  • Reek – See above.
  • Flay – Checks code bases for duplicates. Aims to enforce the DRY design principle.
  • Flog – A tool to spot the most complex functions in your code. Metric-based.
  • Roodi – Supports a bunch of simple syntax-based and metric-bases checks. For example: assignment in condition, case missing else, max method/class/module line count, method/class/module name check.

Having looked through these projects, I’ve come to find two things:

First – There are some tools that do static analysis for Ruby/Python, but they are quite simple and don’t do (or don’t try to do?) any advanced heavyweight analysis.

Second – What Ruby/Python developers want to be automatically found in their code, is quite different from what their C/C++ fellows expect from a static analysis tool. And that is because the languages and programming cultures are quite different. For example, in Ruby there is no such thing as:

  • null pointer dereference – nil is a first class object
  • array bounds violation – array would return nil when index is out of bounds
  • uninitialized variables – all variables are automatically initialized with nil
  • memory leaks – garbage collection takes care of that

But they do have errors in their programs, don’t they? Yes, but Ruby/Python developers rely on tests pretty heavily. In fact, some claim, that tests are the only right way to deal with bugs in your program. This way a tool for automatic error detection might even be considered harmful – just because it can be used as an excuse for not writing tests.

So, what kind of job is left for a static analysis tool? Well, detect design flaws, a.k.a. “code smells”. In other words – automatically find subjects for refactoring (a change that does not affect program functionality). This way static analysis tool fits naturally into Red/Green/Refactor cycle.

Another possible area where static analysis based error detection can prove useful is security vulnerabilities – it is pretty hard to spot all corner cases up-front (or through exploratory testing), just because security is a complex domain and requires good amount of knowledge and expertise.


Parallel Lint

Posted by Alen Zukich   June 22nd, 2009

Interesting article on static analysis tools to help find concurrency issues.  These so called “Parallel Lint” tools are specific to finding these types of issues.  Overall there are some great discussions on certain tools, and it is always nice when Klocwork gets mentioned.  But my problem is with the categorization of these tools.  It always makes me feel sick every time someone puts Klocwork in the same category of “powerful static analysis” with JLint, C++Test, FXCop and my favorite PC-Lint.

This article goes deeper into PC-Lint and what they are doing with deadlocks.  The author highlights a very important point here:

“Like compilers, static analyzers operate each .cpp file separately. And that’s why if f() function is called in parallel mode in file A from file B, we cannot know this when analyzing file B. Of course there are static analyzers which analyze the whole set of files at once but it is a very difficult task. At least, PC-Lint operates each file separately.”

This is a point I feel keeps getting lost with modern static analysis tools today.  Forget the Lint of the past or these other tools, their focus is on file by file analysis.  These old tools are doing simple grep type analysis.  Sometimes where you’re lucky you get a little bit of control flow with a dash of data flow analysis.  But plainly they are missing the deep inter-procedural analysis and techniques that are used with modern static analysis tools today.  I’m hoping the message is getting out there that static source code analysis is far far beyond Lint and is providing the context you never had before.


False positives in modern static analyzers

Posted by Alen Zukich   May 22nd, 2009

In response to Jason’s post about false positives.  First of all there is a general misconception of false positives.  Modern static source code analysis tools have changed the game.  It is not the Lint tool of the past, a focus with deep inter-procedural technology has placed the requirement that static tools today produce more real issues than false reports.

With that said, Jason is right, large code bases never running static analysis will produce a large number of issues no matter how accurate it is.  Even though static analysis tools do provide a number of ways to manage this (and Jason talks about one) it does make sense to put this in your code reviews. You are looking at legacy code but if you are doing code reviews then you must have changed something with that legacy code.  Therefore having those bugs visible to you during the code review could suddenly now apply.


Static analysis and code reviews

Posted by Alen Zukich   May 19th, 2009

Jason certainly hits the nail on the head.  Automation, specifically using static analysis, is key and it should be tightly integrated with your code review. Although we need to be careful where we label source code analysis.  Static source code analysis certainly can find those low level issues such as labeling your local variables correctly, but it goes beyond simple code style issues.

Where static source code analysis can really help is with the deep inter-procedural context that it can provide.  For example, during a code review you go through some code with a number of function calls.  Hopefully you know what each and every function is doing…but do you really?  This is where the deep analysis of static source code tools can help.  It can help you identify that there may be an issue in the code review and that issue happens to show that a function is returning NULL.  Uh oh, potential null pointer dereference on our hands.

Now add code reviews with other static source code technology, such as full source cross reference information, flowcharting, impact analysis for any function/methods and architectural representation to show you the full context of the system.  Now you’re talking powerful.


Findbugs not recognizing exceptions? Java static analysis

Posted by Alen Zukich   May 4th, 2009

We’ve posted previously on some of the differences between Findbugs’ open source Java analysis and commercial Java static analysis tools, specifically on the JSR-305 specification and source code versus byte code analysis topics. Due to these differences, many Java shops will use a commercial Java static analysis tool in conjunction with Findbugs to make sure they’re getting as complete issue detection as possible.

One area that’s been discussed previously is the ability to identify situations of possible null pointer dereference. This peaked my interest and led me to do some benchmarking against a few open source projects to assess the Findbugs analysis on intra- and inter-procedural possible null pointer dereference issues.

The normal assumption is that Findbugs is strong with intra-procedural analysis but unable to provide inter-procedural Java analysis. Here is an inter-procedural example from an open source project called hsqldb:

    protected void directRefreshTree() {

         int[]                  rowCounts;

        ...

            try {

                rowCounts = getRowCounts(tables, schemas);

            } catch (Exception e) {

                 //  Added: (weconsultants)@users

               CommonSwing.errorMessage(e);

            }

            ResultSet col;

            // For each table, build a tree node with interesting info

            for (int i = 0; i < tables.size(); i++) {

                col = null;

                String name;

                try {

                    name   = (String) tables.elementAt(i);

                    if (isOracle && name.startsWith("BIN$")) {

                        continue;

                        // Oracle Recyle Bin tables.

                        // Contains metacharacters which screw up metadata

                        // queries below.

                    }

                    schema = (String) schemas.elementAt(i);

                    String schemaname = "";

                    if (schema != null && showSchemas) {

                        schemaname = schema + '.';

                    }

                    String rowcount = displayRowCounts

                                      ? (" " + DECFMT.format(rowCounts[i]))

                                      : "";

    ...

    }

In the first try block getRowCounts() can have a null value.  How you ask?

    protected int[] getRowCounts(Vector inTable,

                                 Vector inSchema) throws Exception {

        if (!displayRowCounts) {

            return (null);

        }

      ...

    }

Okay, so automatically finding these issues is not going to happen with Findbugs.  I understand that, and that is why there are commercial tools to help with that.

But what really surprised me are the intra-procedural examples that are missed.  Again using examples from hsqldb and a specific class called TransferDb, Findbugs finds one issue. But there are clearly more intra-procedural issues that you would think Findbugs would have found.  Let’s take a look:

    TransferResultSet getData(String statement)

    throws DataAccessPointException {

        ResultSet rsData = null;

        try {

            if (srcStatement != null) {

                srcStatement.close();

            }

            srcStatement = conn.createStatement();

            rsData       = srcStatement.executeQuery(statement);

        } catch (SQLException e) {

            try {

                srcStatement.close();

            } catch (Exception e1) {}

            srcStatement = null;

            rsData       = null;

            throw new DataAccessPointException(e.getMessage());

        }

        return new TransferResultSet(rsData);

    }

This is a conditional situation where “srcStatement” could have a null value (the conditional is false), then you throw an exception with “createStatement()” and dereference later with “srcStatement.close()”.  Why does Findbugs miss this issue?  There are several issues of this type.

Or another situation where you throw an exception with “createStatement()” again.  This example spans multiple lines so you will find it in an attachment.  Here “select_rs” has a value of null and is clearly dereferenced much later on.

       ResultSet         col            = null;

        int               colnum         = 1;

  Statement         stmt           = null;

Source: null here ->  ResultSet         select_rs      = null;

        ResultSetMetaData select_rsmdata = null;

         try {

Exception throw here -> stmt           = conn.createStatement();

            select_rs      = stmt.executeQuery(TTable.Stmts.sSourceSelect);

            select_rsmdata = select_rs.getMetaData();

            col = meta.getColumns(TTable.Stmts.sDatabaseToConvert,

                                  TTable.Stmts.sSchema,

                                  TTable.Stmts.sSourceTable, null);

        } catch (SQLException eSchema) {

            // fredt - second try with null schema

            if (TTable.Stmts.sSchema.equals("")) {

                try {

                    col = meta.getColumns(TTable.Stmts.sDatabaseToConvert,

                                          null, TTable.Stmts.sSourceTable,

                                          null);

                } catch (SQLException eSchema1) {}

            }

        }

        try {

            while (col.next()) {

                String name = Dest.helper.formatIdentifier(col.getString(4));

                int    type        = col.getShort(5);

                String source      = col.getString(6);

                int    column_size = col.getInt(7);

                String DefaultVal  = col.getString(13);

                boolean rsmdata_NoNulls =

                    (select_rsmdata.isNullable(colnum)

                     == java.sql.DatabaseMetaData.columnNoNulls);

                boolean rsmdata_isAutoIncrement = false;

                try {

                    rsmdata_isAutoIncrement =

                        select_rsmdata.isAutoIncrement(colnum);

                } catch (SQLException e) {

                    rsmdata_isAutoIncrement = false;

                }

                int rsmdata_precision = select_rsmdata.getPrecision(colnum);

                int rsmdata_scale     = select_rsmdata.getScale(colnum);

                type = helper.convertFromType(type);

                type = Dest.helper.convertToType(type);

                Integer inttype  = new Integer(type);

                String  datatype = (String) TTable.hTypes.get(inttype);

                if (datatype == null) {

                    datatype = source;

                    tracer.trace("No mapping for type: " + name + " type: "

                                 + type + " source: " + source);

                }

                if (type == Types.NUMERIC) {

                    datatype += "(" + Integer.toString(rsmdata_precision);

                    if (rsmdata_scale > 0) {

                        datatype += "," + Integer.toString(rsmdata_scale);

                    }

                    datatype += ")";

                } else if (type == Types.CHAR) {

                    datatype += "(" + Integer.toString(column_size) + ")";

                } else if (rsmdata_isAutoIncrement) {

                    datatype = "SERIAL";

                }

                if (DefaultVal != null) {

                    if (type == Types.CHAR || type == Types.VARCHAR

                            || type == Types.LONGVARCHAR

                            || type == Types.BINARY || type == Types.DATE

                            || type == Types.TIME

                            || type == Types.TIMESTAMP) {

                        DefaultVal = "\'" + DefaultVal + "\'";

                    }

                    datatype += " DEFAULT " + DefaultVal;

                }

                if (rsmdata_NoNulls) {

                    datatype += " NOT NULL ";

                }

                v.addElement(inttype);

                datatype = helper.fixupColumnDefRead(TTable, select_rsmdata,

                                                     datatype, col, colnum);

                datatype = Dest.helper.fixupColumnDefWrite(TTable,

                        select_rsmdata, datatype, col, colnum);

                create += name + " " + datatype + ",";

                insert += "?,";

                colnum++;

            }

Sink: dereference here-> select_rs.close();

            stmt.close();

            col.close();

        } catch (SQLException e) {

            throw new DataAccessPointException(e.getMessage());

        }

So I’m a little confused as to why these issues are not highlighted.  It seems that this is the exact type of analysis Findbugs does.  Maybe it has to do with not recognizing the possible exceptions?  Does anyone have any ideas?


JSR 305: a silver bullet or not a bullet at all?

Posted by Mikhail Ksenzov   March 30th, 2009

JSR-305 is a Java Specification Request intended to improve the effectiveness of static analysis tools operating in Java 5+ environments. The idea here is that one can use special purpose annotations in order to provide static analysis tools with hints regarding the behaviour and side effects of methods.

An example of such annotations can be found in the presentation ‘Annotations for Software Defect Detection’ by William Pugh, who is masterminding the whole spec. Here we go:

 1: void test() {
 2:    if (spec != null) fFragments.add(spec);
 3:    if (isComplete(spec)) fPreferences.add(spec);
 4: }
 6:
 5: boolean isComplete(AnnotationPreferences spec) {
 6:    return spec.getColorPreferenceKey() != null
 7:        && spec.getColorPreferenceValue() != null
 8:        && spec.getTextPreferenceKey() != null
 9:        && spec.getOverviewRulerPreferenceKey() != null;
10: }

What’s wrong with the snippet above? Well, the check for null on line 2 shows that the developer expects that the value of ‘spec’ can potentially be null, but it is still passed to method ‘isComplete’. Later, ‘isComplete’ attempts to dereference the value, which causes a NullPointerException.

According to Dr. Pugh, the best way to detect this issue statically is to force a developer to add the annotation @Nonnull to the method signature like this:

5: boolean isComplete(@Nonnull AnnotationPreferences spec) {

In this way, a basic static analysis tool that can minimally track ‘spec’ as a potential suspect for ‘null’ can issue a warning when the @Nonnull annotation is contradicted (that is, when ‘spec’ is passed to ‘isComplete’ as a parameter).

There are two problems with this approach:

  • it forces the developer to do work that rightly should be performed by a static analysis engine
  • it takes time to write the annotation for static analysis, but it takes even more effort to maintain the annotations and the actual code base in a consistent state.

In reality, the proposals behind JSR-305 exist to enable a tool intended for single function analysis (so-called intra-procedural analysis) to act as if it were performing whole program analysis by requiring the developer to state expected behaviour up front (whether or not that behaviour is actually expressed correctly in the developer’s code).

In contrast, this same scenario is supported by a whole program static analysis tool (so-called inter-procedural analysis) without developer intervention:

  1. First, a complete call-graph of the system is built, and then all the methods are ordered so that called methods are processed prior to callers — such an ordering allows the tool to generate all the necessary information about, in this instance, the method ‘isComplete’ by the time the analysis of the method ‘test’ begins.
  2. During the analysis of ‘isComplete’, the tool records the fact that the incoming argument ‘spec’ is dereferenced.
  3. Next, the method ‘test’ is analyzed. In this method, the variable ‘spec’ is checked for null, so it is tracked as a potential suspect for an exception. Using the information generated about ‘isComplete’ the tool can reliably issue a warning on line 3, since it already knows that ‘isComplete’ dereferences the incoming argument.

So that example applies to a simple unconditional dereference scenario. In more complicated cases, Dr. Pugh proposes to use the annotation parameter ‘when’, with one of the following values:

  • ALWAYS
  • NEVER
  • MAYBE
  • UNKNOWN

For example: ‘@Nonnull(when=When.NEVER)’ means that a value is always null in the given context.

This specification seems to be a compromise between the amount of information provided by a developer to a static analysis tool and the amount of effort a developer has to put into it, a compromise that does not seem to be a particularly good solution here. First of all, the amount of information provided in such a manner is insufficient to provide accurate analysis, and secondly this seems to be too much work for the developer, especially when this work can be avoided.

Let’s examine how conditional value dereferencing is supported by whole program static analysis tools:

 1: void test() {
 2:     entity.qualifiedName = null;
 3:     saveName(entity, false);
 4: }
 5:
 6: boolean    saveName(Entity entity, boolean qualified) {
 7:    String name;
 8:    if (qualified)
 9:        name = entity.qualifiedName.trim();
10:    else
11:        name = entity.name.trim()
12:
13:     save(name);
14: }

In this example, an inter-procedural static analysis tool would first analyze the method ‘saveName’. A good analysis engine should be able to record the fact that this method only dereferences ‘entity.qualifiedName’ if the second parameter, ‘qualified’, is set to ‘true’. This, it would appear, is a deal more detailed than one can practically achieve by adding @Nonnull(when=When.XXX) annotations, even with all the work the annotation implies for the developer.

Next, the method ‘test’ would be analyzed. A good static analysis tool will naturally keep track of ‘entity.qualifiedName’ because of the assignment to ‘null’ on line 2 and its therefore potential use in an exception causing context. However, given that the actual call to ‘saveName’ on line 3 uses ‘false’ as its second argument, such a tool will not issue a warning that would in reality be a false positive, since the knowledge gained from analyzing ‘saveName’ disqualifies any potential warning due to the conditional relationship between arguments.

In summary, JSR-305 proposes a whole roster of interesting ideas for using annotations to enhance static analysis of Java code, and NPE detection seems to be only one aspect of this specification request. In upcoming blog posts, we shall continue the discussion of proposed annotations as well as offering our own ideas about how and when annotations should be used in static analysis.