## Thursday, June 20, 2013

### McCabe Cyclomatic Code Complexity Measure

I recently installed the Metrics plugin in MyEclipse at work.  It provides a great big stack of code complexity statistics, of which the easiest to understand for non-computer geeks is the McCabe Cyclomatic Code Complexity metric.  This is effectively a measure of how many independent paths there are through a particular piece of code.

Imagine that you live in a big city, and have to find your way to another spot in the same city.  Every place where you could decide to turn or go straight ahead represents complexity, in the same way that the decision points in a program (each `if`, loop, or `case`) represent complexity.  Pretty obviously, the more decision points there are, the more opportunities there are to make mistakes.  A cyclomatic code complexity greater than ten is supposed to be a sign that you need to refactor the code.  So what happens when you see lots of code with complexity measures above 30?  Oh dear.
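To make the counting concrete, here is a rough sketch in Java of how such a number can be arrived at: start at one and add one for every decision point. The class name `CyclomaticSketch`, the `estimate` method, and the little `fare` sample are all made up for illustration, and the list of tokens counted is an assumption on my part. A real tool such as the Metrics plugin parses the code properly and may count slightly differently; this sketch just scans text, so it would be fooled by keywords inside comments or string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a crude, text-based estimate of
// cyclomatic complexity (1 + number of decision points).
class CyclomaticSketch {

    // Assumed set of decision points: branching keywords, the
    // short-circuit operators && and ||, and the ternary operator.
    private static final Pattern DECISION = Pattern.compile(
            "\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    // A made-up method to measure, held as text.
    static final String SAMPLE =
            "int fare(int miles, boolean night) {\n"
          + "  if (miles < 0) return 0;\n"
          + "  int total = 3;\n"
          + "  for (int i = 0; i < miles; i++) {\n"
          + "    total += (night && i > 5) ? 3 : 2;\n"
          + "  }\n"
          + "  return total;\n"
          + "}";

    // Straight-line code has one path, so start at 1 and add one
    // for each decision point found in the source text.
    static int estimate(String source) {
        Matcher m = DECISION.matcher(source);
        int complexity = 1;
        while (m.find()) {
            complexity++;
        }
        return complexity;
    }

    public static void main(String[] args) {
        // if, for, && and ?: are four decision points, so this prints 5.
        System.out.println(estimate(SAMPLE));
    }
}
```

In city terms, the sample has four intersections. A function scoring above 30 has at least thirty of them crammed into one trip.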

And when I actually look at some of the functions with high code complexity measures, what do I find?  The equivalent of dropping a mouse at Santa Monica Blvd. and Ocean Avenue in Santa Monica, and telling it to find its way to Boyle Heights in East Los Angeles.

There is a lifetime (perhaps several lifetimes) of work to clean up this pile.

1. Yep. Been there myself. There's a decent book on the subject: "Working Effectively with Legacy Code" by Michael C. Feathers.

2. The first page isn't a hangman's noose?

3. It's easy to take a large function and to break it into smaller pieces, but making the result easier to understand doesn't necessarily follow.

A function containing a huge mess of tangled code can be hard to follow, but having a huge mess of small functions can be even more difficult to follow.

Finding the right abstractions that will cleave a complex problem along lines that make the result easy to understand and to maintain is not easy.

Which is why it happens so rarely.

4. I have yet to see a huge mess factored into multiple functions that is harder to understand than the original. At a minimum, factoring means that you have some idea what different blocks were supposed to do, and often lets you get a high level definition of what the function is supposed to do.

I suppose if the refactoring was done badly, it could make it worse.

5. Why is it that there is never enough time or money to do it right the first time, but there is always time and money to do it two or three more times when the stuff hits the fan and the product doesn't work? I've worked in embedded automotive software for 25 years, and the software is as bad as I've ever seen it. Not helped by loads of offshore tadpoles laying turds in the code, and cycling in a new batch of tadpoles every 3 months. If I see another goto in C code I'm going postal...