8 posts

Archive for June, 2009


Static analysis for Ruby/Python

Posted by Denis Sidorov   June 29th, 2009

As a developer of a static analysis tool for mainstream statically-typed languages like C++ and Java, I have wondered for quite a while how well static analysis applies to dynamically-typed languages like Ruby and Python. Recently I came across an interesting project on GitHub: Reek – Code smell detector for Ruby. Well, I suppose that’s just a fancy name for a static analysis tool.

What can Reek detect? It does not do heavyweight data/control flow analysis, so the list is not very exciting:

Interestingly, despite a feature set that is poor compared to modern C++/Java static analysis solutions, Reek gets positive feedback from the Ruby community. Some take it to the extreme: they integrate Reek into the build and treat code smells as failing tests, which of course breaks the build every time a new code smell is detected. So, is the Ruby community starving for a real static analysis tool?

This question prompted me to do some quick research into the current state of static analysis tools for dynamic languages. I was able to find the following tools for Ruby and Python:

Python tools appear to be pretty conservative about program analysis and try to “compensate” for the dynamic nature of the language.

  • PyChecker and pylint – These two tools aim to provide the same kind of warnings/errors that a compiler for a statically-typed language would report automatically. For example: too few/many arguments in a method call, unused variables, using the return value of a method that does not actually return a value, etc. One of the pylint checkers in fact tries to “simulate” a static type system by using type inference to detect missing members and functions, and this, in turn, might involve some elements of data flow analysis. Besides that, there is a bunch of metrics-based rules that can be checked automatically – you can put a limit on the number of methods in a class, the number of variables in a function, the size/complexity of a function, etc.
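To make one of those checks concrete, here is a minimal sketch of the "using the return value of a method that does not return a value" warning, written with Python's standard `ast` module. It is not how PyChecker or pylint implement it; the names `functions_without_return_value`, `find_used_none_results` and `SOURCE` are made up for illustration, and the check only looks at module-level functions called by their plain name.

```python
import ast

# Hypothetical sample program to analyze.
SOURCE = '''
def log(message):
    print(message)

def double(x):
    return x * 2

result = log("hello")
value = double(21)
'''


def functions_without_return_value(tree):
    """Names of module-level functions with no `return <expr>` in their body.

    Simplification: nested functions are walked too, and generators are ignored.
    """
    names = set()
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            returns_value = any(
                isinstance(child, ast.Return) and child.value is not None
                for child in ast.walk(node)
            )
            if not returns_value:
                names.add(node.name)
    return names


def find_used_none_results(source):
    """Return (line, function name) for assignments that use such a call's result."""
    tree = ast.parse(source)
    no_value = functions_without_return_value(tree)
    warnings = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Call)
                and isinstance(node.value.func, ast.Name)
                and node.value.func.id in no_value):
            warnings.append((node.lineno, node.value.func.id))
    return sorted(warnings)


for lineno, name in find_used_none_results(SOURCE):
    print(f"line {lineno}: result of {name}() is used, but {name}() returns nothing")
```

On the sample, only the `result = log("hello")` assignment is flagged. A real checker also has to deal with methods, aliases and conditional returns, which is where type inference and data flow start to matter.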

Tools for Ruby introduce the same kind of checks, but from a slightly different angle. Rather than adopting the lint metaphor (a complementary tool that finds bugs/errors the compiler does not detect), the tools are positioned to find design flaws.

  • Reek – See above.
  • Flay – Checks code bases for duplicates. Aims to enforce the DRY design principle.
  • Flog – A tool to spot the most complex functions in your code. Metric-based.
  • Roodi – Supports a bunch of simple syntax-based and metric-based checks. For example: assignment in condition, case missing else, max method/class/module line count, method/class/module name check.
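Metric-based checks like these are simple to sketch. The example below is Python rather than Ruby, and it is not taken from Roodi or Flog; it only illustrates the "max method line count" idea using the standard `ast` module (Python 3.8 or later, for `end_lineno`). The function name `long_functions`, the `max_lines` threshold and the `SAMPLE` source are assumptions made for the example.

```python
import ast

# Hypothetical sample source: one short function, one long one.
SAMPLE = '''\
def short(a):
    return a + 1

def long_one(a):
    b = a + 1
    c = b + 1
    d = c + 1
    e = d + 1
    f = e + 1
    return f
'''


def long_functions(source, max_lines=5):
    """Return (name, line_count) for every function longer than max_lines."""
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count lines from the `def` line to the last line of the body.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return sorted(offenders)


for name, length in long_functions(SAMPLE):
    print(f"{name}: {length} lines (limit 5)")
```

Putting an assertion such as `assert long_functions(source) == []` into a test suite is one way to get the "code smells as failing tests" behavior described above for Reek.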

Having looked through these projects, I’ve come to two conclusions:

First – There are some tools that do static analysis for Ruby/Python, but they are quite simple and don’t do (or don’t try to do?) any advanced heavyweight analysis.

Second – What Ruby/Python developers want to have automatically found in their code is quite different from what their C/C++ fellows expect from a static analysis tool. That is because the languages and programming cultures are quite different. For example, in Ruby there is no such thing as:

  • null pointer dereference – nil is a first-class object
  • array bounds violation – an array returns nil when the index is out of bounds
  • uninitialized variables – all variables are automatically initialized with nil
  • memory leaks – garbage collection takes care of that

But they do have errors in their programs, don’t they? Yes, but Ruby/Python developers rely on tests pretty heavily. In fact, some claim that tests are the only right way to deal with bugs in your program. Seen this way, a tool for automatic error detection might even be considered harmful, just because it can be used as an excuse for not writing tests.

So, what kind of job is left for a static analysis tool? Well, detecting design flaws, a.k.a. “code smells”. In other words, automatically finding candidates for refactoring (a change that does not affect program functionality). This way a static analysis tool fits naturally into the Red/Green/Refactor cycle.

Another possible area where static-analysis-based error detection can prove useful is security vulnerabilities. It is pretty hard to spot all corner cases up front (or through exploratory testing), just because security is a complex domain that requires a good amount of knowledge and expertise.


Top Reasons To Not Go Scrum/Agile

Posted by Todd Landry   June 25th, 2009

There was a recent blog post on the top 10 good reasons for Scrum, so in the spirit of equality, I thought I would do one on the top 10 reasons not to go Scrum. Now, before I get started, let it be known that I am a huge fan of Scrum and agile (so much so, in fact, that I am certified as a Product Owner), but there are definitely situations where it just might not make sense to go that route.

1. Your development team is geographically dispersed. In my opinion, this is the main reason why it would not make a lot of sense to go Scrum. Scrum (and agile) are all about communication, and even though technology has made it easier to communicate across the globe, a winky-face over MSN cannot get the same message across as a face to face conversation.

2. If you are currently meeting deadlines and release dates. I’m a big believer in the adage, if it ain’t broke, don’t fix it. Why would you want to mess up something that is working for you? Short answer…you don’t…

3. You cannot get complete buy-in or 100% commitment from management, development, PM, etc. If a PM cannot actively attend meetings, or management wants to make rash decisions outside of the team, then it just will not work…don’t even bother trying.

4. If people need complete clarity about the solution before even starting the project. The very nature of agile/scrum lends itself to this just not happening.

5. You have a fixed deadline, with a fixed set of requirements. This happens all the time…perhaps you have specific functionality planned for a big event, or a big customer. If this is the case, then it might make sense to manage this using more traditional project management methods.

While the original blog that inspired this had 10 reasons, my time this iteration is up, so I will only provide 5. If you have others you’d like to share, feel free to leave a comment.


Parallel Lint

Posted by Alen Zukich   June 22nd, 2009

Interesting article on static analysis tools to help find concurrency issues.  These so-called “Parallel Lint” tools specialize in finding these types of issues.  Overall there are some great discussions of certain tools, and it is always nice when Klocwork gets mentioned.  But my problem is with the categorization of these tools.  It makes me feel sick every time someone puts Klocwork in the same category of “powerful static analysis” as JLint, C++Test, FXCop and my favorite, PC-Lint.

This article goes deeper into PC-Lint and what they are doing with deadlocks.  The author highlights a very important point here:

“Like compilers, static analyzers operate each .cpp file separately. And that’s why if f() function is called in parallel mode in file A from file B, we cannot know this when analyzing file B. Of course there are static analyzers which analyze the whole set of files at once but it is a very difficult task. At least, PC-Lint operates each file separately.”

This is a point I feel keeps getting lost in discussions of modern static analysis tools.  Forget the Lint of the past and these other tools; their focus is on file-by-file analysis.  These old tools are doing simple grep-type analysis.  Sometimes, when you’re lucky, you get a little bit of control flow with a dash of data flow analysis.  But plainly they are missing the deep inter-procedural analysis and techniques used by modern static analysis tools.  I’m hoping the message is getting out there that static source code analysis is far, far beyond Lint and is providing context you never had before.
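To illustrate the file-by-file limitation from the quote, here is a toy sketch in Python.  It is not how PC-Lint, Klocwork or any real tool works.  Two hypothetical "files" are held as strings: `a.py` defines `f()`, which writes a global, and `b.py` runs `f` in several threads.  The helper names (`thread_targets`, `functions_writing_globals`, `per_file_findings`, `whole_program_findings`) are invented for the example, and functions are matched by simple name only, ignoring imports and aliases.

```python
import ast

# Two hypothetical source files, keyed by file name.
FILES = {
    "a.py": '''
counter = 0

def f():
    global counter
    counter += 1
''',
    "b.py": '''
import threading
from a import f

workers = [threading.Thread(target=f) for _ in range(4)]
''',
}


def thread_targets(tree):
    """Names passed as target=... to any call whose callee is named Thread."""
    targets = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            callee = node.func
            # Handles both `threading.Thread(...)` and a bare `Thread(...)`.
            callee_name = getattr(callee, "attr", getattr(callee, "id", None))
            if callee_name == "Thread":
                for kw in node.keywords:
                    if kw.arg == "target" and isinstance(kw.value, ast.Name):
                        targets.add(kw.value.id)
    return targets


def functions_writing_globals(tree):
    """Names of module-level functions containing a `global` declaration."""
    found = set()
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            if any(isinstance(child, ast.Global) for child in ast.walk(node)):
                found.add(node.name)
    return found


def per_file_findings(files):
    """Each file in isolation: both facts must appear in the same file."""
    findings = []
    for name, source in sorted(files.items()):
        tree = ast.parse(source)
        for func in sorted(thread_targets(tree) & functions_writing_globals(tree)):
            findings.append((name, func))
    return findings


def whole_program_findings(files):
    """All files at once: thread targets are collected before definitions are checked."""
    trees = {name: ast.parse(source) for name, source in files.items()}
    all_targets = set()
    for tree in trees.values():
        all_targets |= thread_targets(tree)
    findings = []
    for name, tree in sorted(trees.items()):
        for func in sorted(functions_writing_globals(tree) & all_targets):
            findings.append((name, func))
    return findings


print("per file:     ", per_file_findings(FILES))
print("whole program:", whole_program_findings(FILES))
```

The per-file pass finds nothing, because neither file on its own shows that `f()` both touches shared state and runs in parallel.  Only the pass that sees both files connects the two facts.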


Get the red out…

Posted by Todd Landry   June 17th, 2009

When I first started at Klocwork, I didn’t really know a lot about source code analysis. I understood the basic concept of how it finds bugs in software, but that was essentially it. Sure, I knew about memory leaks, but I truly believed that they were only found a day or two before the GA date…at least, that was when our testing team always found them.

In one of my teams prior to joining Klocwork, we used Scrum. We were hard core, with daily 15 minute scrums, retrospective meetings, sprint planning sessions, defining “done”, secret handshakes, the whole 9 yards. We also broke our features down into small tasks, and those tasks were written on cue cards and then stuck to a big wall for all to see. What a great way to see the progress of a sprint. We had green cards for development tasks, blue cards for testing, yellow cards for documentation, and red cards for bugs. I remember how, 2 or 3 days into a sprint, the red cards would start showing up, and developers would then start addressing them. Since one of our team ‘rules’ was that each person could only have a single task checked out at one time, our developers had to check in the green card they were working on in order to tackle a red card. By the end of a sprint there were always a number of red cards left, which, by definition, needed to be addressed first in the next sprint. I’m sure you can imagine the enthusiasm of heading into the next sprint knowing there was a wall of red cards to address first.

Anyways, my first few weeks at Klocwork consisted of talking with a lot of people: customers, prospects, etc. These people knew source code analysis, but they only knew the traditional way of doing source code analysis (SCA), and not the new generation of SCA where developers check their code before they check in their code. I remember thinking I must be missing something…why is this so hard for these people to understand?  Source code analysis turns a lot of those red cards into green cards.  For more info on how SCA and agile can work together, check out this webinar I recently did…


Agile compatible with safety-critical development?

Posted by Brendan Harrison   June 15th, 2009

Interesting paper and presentation (pdf) from Emmanuel Chenu at Thales Avionics that describes how they’re using several Agile concepts as part of their safety-critical avionics software projects. With the exception of pair programming, my read is that much of this is mapping activities that have been done in a safety-critical environment (e.g. test driven development) to several Agile principles, rather than the introduction of concepts that are foreign to safety-critical development. The other one that probably hasn’t been done in most safety-critical shops is continuous integration, but I’d argue that CI (or at least a “build early and often” philosophy), has transcended Agile and is just becoming “the way things are done”, regardless of whether you’re a “Big A Agile”, agile, or iterative development shop.

Either way, it’s interesting how even the most heavy, formal, process-driven development teams are looking at aspects of Agile they can embrace to make their development more flexible and responsive while still producing highly reliable software. Of course, as he notes, there’s obviously a limit to how “Agile” an avionics development team can really become given the level of formal documentation required through all aspects of a DO-178B project. I’m pretty sure that if you ever submitted this kind of documentation to a certification authority, they’d probably not accept it:

[Image: Agile Documentation]


“Oh, if only it were open source…”

Posted by Gwyn Fisher   June 8th, 2009

Don’t get me wrong, I’m a big fan of open source, but why does everything have to be black and white? If it’s closed it must be evil and by association probably not written well, whereas if it’s open, it’s awesome and godly in its unnatural power to cure world hunger?

I’m referring, in this particular instance, to the righteous indignation that surfaced as a result of the castigation served up for the manufacturers of that ever-popular device, the breathalyzer. And yes, I’ve been stood at the side of the road looking stupidly at the officer whilst trying to remember just why I thought that 15th shot of Jaeger was such a good idea, but I digress…

The manufacturer of this particular device, the Draeger Alcotest 7110 MKIII-C, had claimed vociferously that their device worked correctly, that their code was a part of their device, therefore proprietary, and not available to opposing counsel for analysis. Unfortunately for them, the courts disagreed and ordered the code handed over for analysis by Base One Technologies (who appear to be nothing more or less than your typical minority-owned GSA hand-out specialists – your taxes at work, people…).

And what did they find? That far from being the highly skilled work of a bunch of Ph.D.s that might warrant being labeled proprietary and top secret, it was instead a bunch of off-the-shelf engineering that had obviously been through many different iterations of development, through several different iterations of design, and wasn’t, bottom line, particularly smart. Nor was it particularly accurate, of course, which was the real bummer.

But come on, how many of us work on code (proprietary or open) that we can claim hand-on-heart hasn’t strayed from initial design goals?

Lest I now be pilloried for standing up for sub-par, closed (evil! evil!) source code, let me quickly segue onto the meme that is most aggravating me in relation to this story. And let me also quickly say for the record that anybody producing a breathalyzer that isn’t accurate needs stoning and feeding to the wolves, that much goes without saying. Back to the topic at hand…

So, Base One found a whole host of noxious practices and poorly executed designs in this particular code base. Not least, of course, being the aforementioned inaccuracies. But it did so in a very dry, engineering-centric sort of way that obviously wasn’t intended to pander to the pitchfork-waving bigots, and so the ever helpful popular press took it upon themselves to take the one big number (ooo, shiny!) from the report, take it out of context (but of course), and then to label all closed source as bad by association.

  • 19,400 potential errors in the code!!!

That’s obviously easier to get your editor interested in than a bunch of boring technical detail, like what was actually wrong with the device. 19. Thousand. Errors. Come on people, that’s a big number, amirite? Three out of every five lines of code contain a potential error. Sky. Falling. Must. Grab. Pitchfork!

But let’s read the small print here (or actually, not small at all, in fact it was right in the original report, but again wasn’t exciting enough to repeat): that number comes from an analysis performed using lint. You know, the tool that emits 400 errors for every 200 characters of input? You know you miss the days of 2,400 baud terminals that actually couldn’t keep up with the rate of error emission from this thing and just turned the whole screen into a weird whoosh of green CRT rays, don’t you?

Oh, but if only it was open source, goes the meme, the world (or at least that part of it which finds itself staring stupidly at the officer by the side of the road) would be a much better place. At least, that’s what we’re encouraged to believe.

Anybody tried lint on an open source project of any renown recently? I have (I won’t name them, not because it’d be embarrassing, but because it’s kind of irrelevant). Frankly it’s almost impossible to find a project that doesn’t emit thousands of lint warnings. Let’s face it, if you can write code that doesn’t emit lint warnings, you’re spending time in that happy place I like to call Hello World.

Come on people, wake up. There are very good reasons to hate bad software, whether it’s closed or open. Don’t be a schmuck and jump on the religious bandwagon just because it’s there. Think for yourself. There are very good reasons why this device was castigated as a piece of junk, and they had nothing whatsoever to do with that big shiny number. If you’re going to report on something technical, do your readers the favor of at least trying to understand what you’re talking about before you go balls out into meme land.


Agile…for non-software development

Posted by Todd Landry   June 4th, 2009

Ever had to work on a “special” project and really didn’t know where to start? A team I worked with was faced with this not too long ago…we had to put together a complete business plan for our products. This complete business plan included understanding everything about your business, and I mean everything…average deal size, average discount per deal, regional breakdown of deals, deals with multiple products included, SWOT, positioning, and so on. There was a ton of information we needed to pull together in a relatively short time, and we really didn’t know the best approach to take to address this. Having been working on a scrum team for about 8 months, I suggested we try to tackle this using some of the principles of scrum. So we proceeded to break down the effort into a series of tasks, and then prioritized these tasks, thus creating our backlog. Tasks would be assigned, and daily meetings were then used to report our progress on them. If a new task popped up, which usually happened, we added that to our prioritized backlog, and continued on. Everyone on the team knew exactly what had been completed, what was being worked on, and what was still outstanding.

There are probably a number of other approaches you could take to deal with this type of project, but I thought applying some Agile principles worked out very well. Anyone have any other applications of Agile not related to software development that they can share?


Developer productivity thrown out the door

Posted by Alen Zukich   June 2nd, 2009

I deal with many organizations that deploy the Klocwork software to the desktop so that developers can use our tools to help them find and fix bugs in their code.  The message is simple: fix your bugs before you check in your code.  Many of the organizations I deal with have a mix of environments and tools.  In the world of writing code it is not uncommon to find developers using Emacs, Vim, Visual Studio, Eclipse or any number of IDEs/text editors.  Nothing wrong with this: it doesn’t offer a clean, repeatable environment, but it does work.

Recently I keep running into situations where productivity seems to be thrown out the door.  Not only were the developers using a mix of many (and I mean many) development environments, but they had made the decision to code on a platform that they do not compile on.  They would write code in Windows or Linux, then store their code in a central repository of some sort (in one case it was just NFS), then ssh to a different Linux machine and run the compiler on the code.  If the code fails to compile, they look at the syntax error, go back to the other machine, navigate to the line of code and figure out the error.  Rinse and repeat.  Wow…