As a developer of a static analysis tool for mainstream statically-typed languages, like C++ and Java, I have wondered for quite a while how well static analysis applies to dynamically-typed languages, like Ruby and Python. And recently, I came across this interesting project on GitHub: Reek – Code smell detector for Ruby. Well, I suppose that’s just a fancy way to name a static analysis tool.
What can Reek detect? It does not do heavyweight data/control flow analysis, so the list is not very exciting:
- Code Duplication – AFAIU, it’s not very accurate, because Reek only tracks duplicated method calls (sic!)
- Some design anti-patterns, like Feature Envy, Utility Function, Nested Iterators and Control Couple
- Some metrics-based stuff, like Large Class, Long Method and Long Parameter List
- (Arguably) bad coding practices, like Uncommunicative Name
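To get a feel for what these findings look like, here is a contrived method of my own (not taken from Reek’s docs) that exhibits several of the smells above at once – an uncommunicative name, a long parameter list, and a control couple:

```ruby
# Contrived example of smelly Ruby code that a detector like Reek
# would typically flag (hypothetical names, my own example).
class Order
  # "Uncommunicative Name" and "Long Parameter List" smells:
  # single-letter parameters, and five of them.
  def calc(a, b, c, d, f)
    # "Control Couple" smell: the boolean `f` switches behavior,
    # coupling every caller to the method's internals.
    if f
      (a + b) * c - d
    else
      (a + b) * c
    end
  end
end
```

A tool like Reek would report each of these as a separate smell with a line number; fixing them usually means renaming, introducing a parameter object, and splitting the method in two.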
Interestingly, despite this poor set of features compared to modern C++/Java static analysis solutions, Reek gets positive feedback from the Ruby community. Some take it to the extreme: they integrate Reek into the build and treat code smells as failing tests, which of course breaks the build every time a new smell is detected. So, is the Ruby community starving for a real static analysis tool?..
This question prompted me to do some quick research on the current state of static analysis tools for dynamic languages. I was able to find the following tools for Ruby and Python:
Python tools appear to be pretty conservative about program analysis and try to “compensate” for the dynamic nature of the language.
- PyChecker and pylint – These two tools aim to provide the same kind of warnings/errors a compiler for a statically-typed language would report automatically. For example: too few/many arguments in a method call, unused variables, using the return value of a method that does not actually return any value, etc. One of pylint’s checkers, in fact, tries to “simulate” a static type system by using type inference to detect missing members and functions, and this, in turn, might involve some elements of data flow analysis. Besides that, there is a bunch of metrics-based rules that can be checked automatically – you can put a limit on the number of methods in a class, the number of variables in a function, the size/complexity of a function, etc.
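For contrast, it is worth remembering why these checks matter in a dynamic language at all: a mistake like a wrong argument count is not reported until the offending line actually runs. A minimal Ruby demonstration (my own example):

```ruby
# In a dynamic language, an arity mistake is only caught at runtime,
# when the call is actually executed -- exactly the gap lint-style
# tools try to close statically.
def greet(name)
  "Hello, #{name}!"
end

begin
  greet("Alice", "Bob")   # too many arguments
rescue ArgumentError => e
  puts "Caught only at runtime: #{e.message}"
end
```

If this call sits on a rarely executed branch, nothing short of a test (or a static check) will ever expose it.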
Tools for Ruby introduce the same kind of checks, but from a slightly different angle. Rather than adopting the lint metaphor (a complementary tool that finds bugs/errors the compiler does not detect), the tools are positioned to find design flaws.
- Reek – See above.
- Flay – Checks code bases for duplicates. Aims to enforce the DRY design principle.
- Flog – A tool to spot the most complex functions in your code. Metric-based.
- Roodi – Supports a bunch of simple syntax-based and metrics-based checks. For example: assignment in a condition, a case missing an else, maximum method/class/module line count, method/class/module name checks.
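Roodi’s “assignment in condition” check deserves a quick illustration, because in Ruby (as in C) it is a classic typo that parses fine and silently changes behavior. A contrived sketch, my own example:

```ruby
# The classic bug that an "assignment in condition" check targets:
# `=` instead of `==` parses fine, and the condition is always truthy
# (a String is never nil or false).
def admin_message(role)
  if role = "admin"        # BUG: assignment, always truthy
    "Welcome, administrator"
  else
    "Access denied"
  end
end

def admin_message_fixed(role)
  if role == "admin"       # comparison, as intended
    "Welcome, administrator"
  else
    "Access denied"
  end
end
```

The buggy version greets every caller as an administrator, which no test of the happy path would ever notice – a purely syntactic check catches it for free.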
Having looked through these projects, I’ve realized two things:
First – There are some tools that do static analysis for Ruby/Python, but they are quite simple and don’t do (or don’t try to do?) any advanced heavyweight analysis.
Second – What Ruby/Python developers want automatically found in their code is quite different from what their C/C++ fellows expect from a static analysis tool. And that is because the languages and programming cultures are quite different. For example, in Ruby there is no such thing as:
- null pointer dereference – nil is a first-class object
- array bounds violation – an array returns nil when the index is out of bounds
- uninitialized variables – variables are automatically initialized to nil
- memory leaks – garbage collection takes care of that
But they do have errors in their programs, don’t they? Yes, but Ruby/Python developers rely pretty heavily on tests. In fact, some claim that tests are the only right way to deal with bugs in your program. Seen this way, a tool for automatic error detection might even be considered harmful – simply because it can be used as an excuse for not writing tests.
So, what kind of job is left for a static analysis tool? Well, detecting design flaws, a.k.a. “code smells” – in other words, automatically finding subjects for refactoring (a change that does not affect program functionality). This way, a static analysis tool fits naturally into the Red/Green/Refactor cycle.
Another possible area where static-analysis-based error detection can prove useful is security vulnerabilities – it is pretty hard to spot all the corner cases up front (or through exploratory testing), simply because security is a complex domain that requires a good amount of knowledge and expertise.