
 Denis Sidorov

Klocwork Development Lead


Static analysis for Ruby/Python

Posted by Denis Sidorov   June 29th, 2009

As a developer of static analysis tools for mainstream statically-typed languages like C++ and Java, I have wondered for quite a while how well static analysis applies to dynamically-typed languages like Ruby and Python. Recently I came across an interesting project on GitHub: Reek – Code smell detector for Ruby. Well, I suppose that’s just a fancy name for a static analysis tool.

What can Reek detect? It does not do heavyweight data/control flow analysis, so the list is not very exciting:

Interestingly, despite a feature set that is poor compared to modern C++/Java static analysis solutions, Reek gets positive feedback from the Ruby community. Some take it to the extreme: they integrate Reek into the build and treat code smells as failing tests, which of course breaks the build every time a new code smell is detected. So, is the Ruby community starving for a real static analysis tool?

This question prompted me to do some quick research into the current state of static analysis tools for dynamic languages. I was able to find the following tools for Ruby and Python:

Python tools appear to be pretty conservative about program analysis and try to “compensate” for the dynamic nature of the language.

  • PyChecker and pylint – These two tools aim to provide the same kind of warnings/errors that a compiler for a statically-typed language would report automatically. For example: too few/many arguments in a method call, unused variables, using the return value of a method that does not actually return any value, etc. One of the pylint checkers in fact tries to “simulate” a static type system by using type inference to detect missing members and functions, and this, in turn, might involve some elements of data flow analysis. Besides that, there is a bunch of metrics-based rules that can be checked automatically: you can put a limit on the number of methods in a class, the number of variables in a function, the size/complexity of a function, etc.

Tools for Ruby introduce the same kind of checks, but from a slightly different angle. Rather than adopting the lint metaphor (a complementary tool that finds bugs/errors the compiler does not detect), these tools are positioned to find design flaws.

  • Reek – See above.
  • Flay – Checks code bases for duplicates. Aims to enforce the DRY design principle.
  • Flog – A tool to spot the most complex functions in your code. Metric-based.
  • Roodi – Supports a bunch of simple syntax-based and metric-based checks. For example: assignment in condition, case missing else, max method/class/module line count, method/class/module name check.

Having looked through these projects, I’ve come to two conclusions:

First – There are some tools that do static analysis for Ruby/Python, but they are quite simple and don’t do (or don’t try to do?) any advanced heavyweight analysis.

Second – What Ruby/Python developers want to have found automatically in their code is quite different from what their C/C++ fellows expect from a static analysis tool. That is because the languages and programming cultures are quite different. For example, in Ruby there is no such thing as:

  • null pointer dereference – nil is a first class object
  • array bounds violation – an array returns nil when the index is out of bounds
  • uninitialized variables – all variables are automatically initialized with nil
  • memory leaks – garbage collection takes care of that

But they do have errors in their programs, don’t they? Yes, but Ruby/Python developers rely on tests pretty heavily. In fact, some claim that tests are the only right way to deal with bugs in your program. Seen this way, a tool for automatic error detection might even be considered harmful, just because it can be used as an excuse for not writing tests.

So, what kind of job is left for a static analysis tool? Well, detecting design flaws, a.k.a. “code smells”. In other words, automatically finding candidates for refactoring (a change that does not affect program functionality). This way a static analysis tool fits naturally into the Red/Green/Refactor cycle.

Another possible area where static-analysis-based error detection can prove useful is security vulnerabilities: it is pretty hard to spot all corner cases up front (or through exploratory testing), just because security is a complex domain that requires a good amount of knowledge and expertise.

Lambda expressions in C++

Posted by Denis Sidorov   February 11th, 2009

I have just stumbled across the lambda module in Boost (a popular general-purpose C++ library known for its extensive use of templates and its influence on the C++ standards committee).

A quote:

The primary motivation for the BLL (Boost Lambda Library) is to provide flexible and convenient means to define unnamed function objects for STL algorithms …

for_each(a.begin(), a.end(), std::cout << _1 << ' ');

My first thought was: "Hmm ... a macro?" It appears it is not. The _1 object is a lambda placeholder and should be read as the first parameter of the lambda expression (a.k.a. unnamed function). In fact, the std::cout << _1 << ' ' expression is automatically converted into a function-like object that can be used with most STL algorithms (like for_each, find_if, etc.). So, instead of writing this (the traditional STL way):

template <typename T>
struct my_printer : public std::unary_function<T, void>
{
    void operator()(const T &x)
    { std::cout << x << ' '; }
};
// …
for_each(a.begin(), a.end(), my_printer<my_element_type>());

You can use the one-line for_each expression quoted earlier, and the C++ compiler automatically creates an anonymous class that represents an unnamed function. The trick is that all this "behind-the-scenes" work is expressed in terms of C++ templates and relies heavily on automatic type inference and operator overloading. Essentially, the module has to overload every possible C++ operator that can be applied to the lambda placeholder, "delay" the evaluation of the expression, and "wrap" the computation into a callable object.
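To make the "delay and wrap" idea more concrete, here is a minimal sketch of the technique with no Boost involved. It is not how the library is actually implemented: the names placeholder1, arg1, stream_arg, stream_const and join_with_space are all made up for illustration, and only the << operator with a single-character constant is supported.

```cpp
#include <algorithm>
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a lambda placeholder (not Boost's real _1).
struct placeholder1 {};
static const placeholder1 arg1 = {};

// Callable built from "stream << arg1": remembers the stream and
// prints its argument only when it is finally called.
struct stream_arg {
    std::ostream &os;
    explicit stream_arg(std::ostream &s) : os(s) {}
    template <typename T>
    std::ostream &operator()(const T &x) const { return os << x; }
};

// Callable built from "<callable> << constant": runs the inner
// callable first, then prints the stored constant.
template <typename Inner, typename C>
struct stream_const {
    Inner inner;
    C c;
    stream_const(const Inner &i, const C &value) : inner(i), c(value) {}
    template <typename T>
    std::ostream &operator()(const T &x) const { return inner(x) << c; }
};

// Overload 1: "stream << placeholder" prints nothing, it just builds a callable.
inline stream_arg operator<<(std::ostream &os, const placeholder1 &) {
    return stream_arg(os);
}

// Overload 2: "callable << constant" wraps the callable once more.
template <typename C>
stream_const<stream_arg, C> operator<<(const stream_arg &f, const C &c) {
    return stream_const<stream_arg, C>(f, c);
}

// Usage: the third argument looks like a print statement but is a function object.
std::string join_with_space(const std::vector<int> &v) {
    std::ostringstream buffer;
    std::ostream &os = buffer;
    std::for_each(v.begin(), v.end(), os << arg1 << ' ');
    return buffer.str();
}
```

Calling join_with_space on a vector holding 1, 2, 3 yields "1 2 3 ". Every additional operator or operand type would need another overload and another wrapper class, which gives a feeling for how much machinery the real library has to provide.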

This approach apparently simulates Lisp lambda expressions (hence the name of the module) and Smalltalk/Ruby code blocks:

Smalltalk

a do: [ :x | Transcript show: x; show: ' ' ]

Ruby

a.each { |x| print x, " " }

Lisp (Scheme)

(for-each (lambda (x) (display x) (display " ")) a)

Lambda expressions and code blocks are an essential part of these languages and one of the things that make them consistent and fun to use. Now this library is trying to bring the concept to C++.

Pretty cool, but the following limitations lead me to think that this C++ implementation of lambda expressions is far from complete:

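  • Boost lambda expressions are not true closures (by default, variables from the enclosing scope are copied into the function object, so working with the original "non-local" data requires explicit wrappers)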
  • Some operators (most notably "." and assignment) cannot be overloaded this way, so i = _1 and _1.my_field do not work and require some extra C++ tricks
  • You cannot use statements in this kind of code block; it works only with expressions
  • If you make a simple mistake in this kind of lambda expression (or change some related interface), the C++ compiler will gladly present you with a pile of unreadable error messages involving about a dozen nested template instantiations whose names do not fit on a line of text (no matter how wide your LCD panel is) ... that's a common problem with all non-trivial template libraries
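The assignment limitation is easy to reproduce in a toy setting. C++ requires operator= to be a member function, so nothing can be overloaded when a plain int sits on the left-hand side; the variable has to be wrapped in a class first. The sketch below is an assumption-laden illustration, not Boost code: placeholder1, arg1, assign_functor, var_wrapper, var and last_element are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a lambda placeholder (not Boost's real _1).
struct placeholder1 {};
static const placeholder1 arg1 = {};

// Function object meaning "target = <argument>".
template <typename T>
struct assign_functor {
    T *target;
    explicit assign_functor(T *t) : target(t) {}
    void operator()(const T &x) const { *target = x; }
};

// Wrapper whose only job is to provide a member operator=.
// Instead of assigning anything, it returns a delayed assignment.
template <typename T>
struct var_wrapper {
    T *target;
    explicit var_wrapper(T *t) : target(t) {}
    assign_functor<T> operator=(const placeholder1 &) const {
        return assign_functor<T>(target);
    }
};

// Helper that wraps an existing variable by address.
template <typename T>
var_wrapper<T> var(T &x) { return var_wrapper<T>(&x); }

// Usage: remember the last element visited by for_each.
int last_element(const std::vector<int> &v) {
    assert(!v.empty());
    int last = 0;
    // "last = arg1" would not compile: int has no operator= taking a placeholder.
    std::for_each(v.begin(), v.end(), var(last) = arg1);
    return last;
}
```

The extra var(...) call is exactly the kind of "C++ trick" that leaks into user code: the expression no longer reads like an ordinary assignment, and the reader has to know why the wrapper is there.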

After all, this demonstrates that C++ templates and type inference are indeed powerful concepts, but still not powerful enough to redefine/extend the language in a consistent way.

Another problem with this kind of smart C++ trick is that it creates an illusion of simplicity. As always with C++, you cannot rely on this simplicity unless you know the details of what's under the hood. If you dare to ignore implementation details, you risk losing control over your own code one day.