
Archive for April, 2009


Build Analysis and Source code analysis must work together

Posted by Alen Zukich   April 25th, 2009

There’s been some recent discussion around using source code analysis (SCA) technology for build clean-up and optimization. I thought it might be useful to try to separate the spin from reality and outline where and how static source code analysis can be used for build optimization.

First, every SCA tool worth its salt does build analysis. Automated discovery of a customer’s build system is a required capability for deep static code analysis. Most users of SCA attempt to discover bugs, security vulnerabilities, and other maintainability problems. Some customers will also leverage the build analysis itself to conduct targeted clean-up of the build. Three common ways that this can be done are:

  1. Trace file analysis – provides visibility into the entire build process to help find issues and inefficiencies that can impact build times and, ultimately, developer productivity.
  2. Header file analysis – identifies inefficient and overly complex include structures that can lead to long build times and bloated system size.
  3. Interface analysis – finds low-level issues that can cause build failures due to improper API usage.

Trace file analysis is the process of analyzing, understanding, and mapping every process executed during your build, including all compiler and linker invocations. This kind of analysis is mandatory for a good SCA tool, because the tool needs the full details of how you compile and link your code in order to generate detailed models of your system. The benefit from a build optimization standpoint is that it maps the entire build process, not just the compile and link steps, which gives development leads and build managers visibility into the build, its inefficiencies, and the places where it may simply be broken.
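To make the idea concrete, here is a minimal sketch of the kind of summary a build trace allows. The record layout (tool, arguments, elapsed seconds) and the sample data are assumptions made for illustration; real trace formats differ from tool to tool.

```python
from collections import defaultdict

# Hypothetical trace records: (tool, argument string, elapsed seconds).
# This layout is an assumption, not the format of any particular SCA tool.
TRACE = [
    ("gcc", "-c File1.c -o File1.o", 1.5),
    ("gcc", "-c File2.c -o File2.o", 2.0),
    ("gcc", "-c File1.c -o File1.o", 1.5),  # the same compile, run twice
    ("ld", "File1.o File2.o -o app", 0.5),
    ("cp", "app dist/app", 0.25),
]


def time_per_tool(trace):
    """Total elapsed seconds per tool, covering every traced process."""
    totals = defaultdict(float)
    for tool, _args, seconds in trace:
        totals[tool] += seconds
    return dict(totals)


def repeated_commands(trace):
    """Commands that appear more than once in the trace (possible wasted work)."""
    counts = defaultdict(int)
    for tool, args, _seconds in trace:
        counts[(tool, args)] += 1
    return sorted(cmd for cmd, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(time_per_tool(TRACE))
    print(repeated_commands(TRACE))
```

Even a summary this crude covers the non-compiler steps (the `cp` here) and flags work that is done twice, which is the sort of visibility a build manager is after.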

The other two types of analysis, header file and interface analysis, focus on the source code directly but are also important to build improvement. Header file analysis looks at the header files themselves and the optimizations you can perform on them. A simple example is an include file that is never used. Why include it? It adds to the build time and the size of the system, not to mention the complexity of the system and the effort of maintaining it. Finding extra includes is not groundbreaking technology, but there are various other types of header file issues to look for. The more complex ones, which require deep analysis, involve transitive or context-dependent includes. For example, a missing include with a transitive dependency is a relationship between three or more files. Suppose the first file, File1.c, includes the second file, header1.h, which in turn includes the third file, header2.h. File1.c uses some symbols from header2.h but does not include it directly.
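The File1.c example can be modeled as a small include graph. The sketch below is only an illustration: the `INCLUDES` and `USES` tables are written by hand here, whereas a real tool would derive them from the build trace and from symbol resolution, and the function names are hypothetical.

```python
# Direct #include edges for the three files in the example.
INCLUDES = {
    "File1.c": {"header1.h"},
    "header1.h": {"header2.h"},
    "header2.h": set(),
}

# Headers whose symbols each file actually uses (assumed to be known already).
USES = {
    "File1.c": {"header2.h"},
}


def reachable(includes, start):
    """All headers pulled in by `start`, directly or transitively."""
    seen = set()
    stack = list(includes.get(start, ()))
    while stack:
        header = stack.pop()
        if header not in seen:
            seen.add(header)
            stack.extend(includes.get(header, ()))
    return seen


def missing_direct_includes(includes, uses, source):
    """Headers `source` uses but only gets through some other header."""
    direct = includes.get(source, set())
    indirect = reachable(includes, source) - direct
    return sorted(uses.get(source, set()) & indirect)


def unused_direct_includes(includes, uses, source):
    """Headers `source` includes directly but uses nothing from."""
    return sorted(includes.get(source, set()) - uses.get(source, set()))


if __name__ == "__main__":
    print(missing_direct_includes(INCLUDES, USES, "File1.c"))
    print(unused_direct_includes(INCLUDES, USES, "File1.c"))
```

For this graph the first check reports header2.h for File1.c and the second reports header1.h, matching the situation described above.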

Good practice would have you include header2.h directly, but developers often include header1.h simply as a means to get the build to work. By eliminating includes like this one, where File1.c doesn’t even use anything in header1.h, real reductions in build time, and potentially system size, can be realized. Interface analysis also centers on header files, looking for cyclical header files, duplicate header files, and a whole slew of other in-code issues that can be a nightmare, such as multiple definitions or declarations.
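Cyclical header files are one of the easier interface issues to picture. As a rough sketch, assuming the include graph is already available as a dictionary of direct includes, a depth-first search is enough to report one cycle. The header names and the `find_include_cycle` helper are made up for illustration.

```python
# Hypothetical include graph with a cycle: a.h -> b.h -> c.h -> a.h
CYCLIC = {
    "a.h": {"b.h"},
    "b.h": {"c.h"},
    "c.h": {"a.h"},
    "d.h": set(),
}


def find_include_cycle(includes):
    """Return one include cycle as a list of files, or None if there is none."""
    unvisited, in_progress, done = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = in_progress
        path.append(node)
        for nxt in sorted(includes.get(node, ())):
            status = state.get(nxt, unvisited)
            if status == in_progress:
                # nxt is already on the current path, so we have closed a loop.
                return path[path.index(nxt):] + [nxt]
            if status == unvisited:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = done
        return None

    for node in sorted(includes):
        if state.get(node, unvisited) == unvisited:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_include_cycle(CYCLIC))
```

The search stops at the first cycle it finds; a real tool would likely report all of them and point at the specific #include lines involved.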

Frankly, there are limits to what can be done in this area with SCA. Vendors in the build space, such as Electric Cloud or IBM Buildforge, do this for a living: they specialize in setting up production build environments that scale, and they also have tools that complement the kind of optimization described above.


Languages and the theocracy of programming

Posted by Gwyn Fisher   April 7th, 2009

Just returned from ESC San Jose, where I spent a very enjoyable few days surrounded by the “real men” of the programming world. Forget your managed language environments, forget abstractions or object oriented fantasies of design, forget processes like Agile; these guys spend their days down at board level working in assembler and occasionally sticking their heads up into the rarefied world of C (but only, you know, for stuff that doesn’t really matter…).

Hell, most of the time the hardware they’re programming is custom built just for that project, sans O/S because, you know, why would you want that crap to get in your way. One guy that stands out in the procession of awesomeness was describing his ASIC to me, asking if we could help with stack overflow problems, and launched into a necessarily abstracted description of the fact that their (classified) device didn’t really have a stack, actually, but it was kind of a well known address range that they normally treated like a stack, although not always because, you know, stuff sometimes gets in the way, so if he, like, stuck something in there that was too big, could we tell him?

God, I love these guys…

Now I began programming on boards in Z80 assembler, so trust me that I do actually know what the heck they’re doing and why, but over time I’ve followed what has felt like a fairly natural migration away from the kind of “if I want to light up that LED I have to store 0x2d in address ‘x’” programming to C, then C++, and more recently to Java and C#. Of them all, I think C++ is probably my favorite, simply because it’s low-level enough to be useful, and high-level enough to let me express myself without having to think too much about it. Frankly, in my opinion while good developers are good in any language, really good developers find their own way and then excel at it. So yes, I can probably program in any language given a few days of spin-up time, but frankly I’m too old and too cranky to get all fired up over the latest innovation of hiding-the-useful-stuff-from-me just so I can do it a bit faster.

Note that I’m not espousing any kind of intelligentsia-sponsored BS argument over the “value” of OO languages over procedural or vice versa (for more vitriol than typically fits on one page, check out this beauty from Torvalds…). And no, I don’t use STL so just shoot me, but it sucks so get over it. And if you get your rocks off over Python or Ruby or Haskell or whatever new lambda calculus-based micro language you’ve just stumbled over, have fun and get your job done, there’s enough of them for everybody, after all.

But don’t try to convert me. Proselytizing is always ugly, so just step away from the bong and let’s all be friends here. The Urban Dictionary nicely summarizes things:

The men who program in C++ are Real Men. The women who program in C++ are Real Men too.

Substitute the name of your favorite language in there in place of C++ and you’ve got the way most developers think about their own language of choice. How about you? Any favorites out there you’re willing to get boiled in the pot for?