Agile Pointing: Fibonacci or Fibo-yes-chi?

My natural inclination is to be skeptical of anything billed as a "software development methodology". I still remember my early years as a professional developer in France in the early 90's when the "Merise" methodology was ubiquitous. To me it seemed like an excuse to spend tons of time on process documentation and other bureaucracy and very little actually writing code (although I avoided it like the plague so I have no idea if this characterization has any merit). By extension, I'm tempted to see the latest craze over "agile" as just a way for slick consultants to make piles of money from gullible corporate customers.

Nonetheless, we've been applying various agile techniques over the past year, and I have to admit that many of them have undeniable value. Organizing tasks in a linear list, ordering them by priority and tackling them one by one is clearly superior to spending weeks on design docs and Gantt charts (on one extreme) or just working willy-nilly on whatever you feel like (on the other). Even corny stuff like user-facing "stories" with descriptions like "As a snarky blogger I want people to agree with my posts without reading them so that I can get an undeserved sense of affirmation" turns out to be useful. This formulation, though clumsy, forces you to think about the feature and express it in a way that others can understand, instead of just pounding out a terse technical description that even the developer often can't understand when they go to work on the task later.

One of my favorite agile practices is story pointing. After years of programming in the real world, it's tempting to conclude that all time estimates are useless. Nonetheless, clients tend to demand some level of visibility into schedules. The agile solution is pointing, where you assign points to stories representing relative complexity rather than absolute timescales. In theory, developers are much better at estimating the former (a notion backed up by both my intuition and experience).

We've been using a three-point scale, with the points corresponding to "easy", "medium" and "hard". This has the virtue of simplicity, but after months of using Pivotal Tracker, I've noticed that very few of our projects have converged on a consistent and useful velocity (the average number of points accepted per sprint, used to predict rough future output). There are various reasons for this: not enough time spent on story writing and pointing, clients not accepting stories on time (so points aren't credited when they should be), using too many (pointless) chores and bugs instead of (pointy) features, etc. But one reason, we decided, is that our point scale is insufficient.
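To make the velocity idea concrete, here is a minimal sketch in Python. It assumes the simplest possible definition (a plain average over all completed sprints) and is not a description of how Pivotal Tracker actually computes the figure. The function names and the sample numbers are made up for illustration.

```python
import math


def velocity(accepted_points_per_sprint):
    """Average number of points accepted per sprint (plain mean, an assumption)."""
    if not accepted_points_per_sprint:
        raise ValueError("need at least one completed sprint")
    return sum(accepted_points_per_sprint) / len(accepted_points_per_sprint)


def sprints_remaining(backlog_points, accepted_points_per_sprint):
    """Rough forecast: how many sprints the pointed backlog should take."""
    v = velocity(accepted_points_per_sprint)
    if v <= 0:
        raise ValueError("velocity must be positive to forecast")
    return math.ceil(backlog_points / v)


# Hypothetical history: points accepted in the last three sprints.
history = [8, 12, 10]
print(velocity(history))               # 10.0
print(sprints_remaining(45, history))  # 5
```

Note how fragile this is: if a client accepts a sprint's stories a week late, that sprint records fewer points and the next one records more, and the average only settles down once there is enough history to smooth out the swings.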

A popular alternative is the so-called Fibonacci scale, typically practiced as 1, 2, 3, 5, 8, 13, 20, 40, 100. I found this scale gimmicky, but some of my colleagues have been arguing passionately for it, even going as far as to link to a mathematical proof that Fibonacci is best (as well as a fascinating piece on naturally occurring instances of the series).

I had to read this "proof" a couple of times before the logic underlying the mathematical gobbledygook and handwaving sank in. Basically we get progressively less accurate as estimates grow larger, so it makes sense to chunk them more coarsely on the high end of the scale. The difference between 20 and 21 is entirely insignificant, but we might still have a feeling for whether something is more of a 20 or a 40 pointer.
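The coarser chunking is easy to see in a few lines of Python. This is only a sketch: the scale is the one quoted above, while `snap_to_scale` and the idea of starting from a "raw" gut-feel number are my own illustrative assumptions, not part of any agile tool.

```python
# The modified Fibonacci scale as commonly practiced.
POINT_SCALE = [1, 2, 3, 5, 8, 13, 20, 40, 100]


def gaps(scale):
    """Distance between each pair of neighboring values on the scale."""
    return [b - a for a, b in zip(scale, scale[1:])]


def snap_to_scale(raw_estimate, scale=POINT_SCALE):
    """Return the scale value closest to a raw estimate.

    On an exact tie, the smaller value wins because min() keeps the
    first candidate it sees.
    """
    return min(scale, key=lambda value: abs(value - raw_estimate))


print(gaps(POINT_SCALE))  # [1, 1, 2, 3, 5, 7, 20, 60]
print(snap_to_scale(21))  # 20
print(snap_to_scale(35))  # 40
```

The gaps widen as the values grow, so a 20-versus-21 argument simply can't happen, while a 20-versus-40 argument still can.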

But still, those larger estimates are clearly very inaccurate, raising the question of whether we should simply mandate that stories be split up if they exceed a certain upper threshold. The problem with this is that it often isn't feasible to split up stories until work has started on them. As a result, large stories are likely to languish unpointed for some time before we get to them, depriving us of much of the predictive power of velocity. But the comfort we get from all those 20 and 40 point stories in the backlog is probably a chimera, so perhaps we're better off knowing what we don't know, to paraphrase Donald Rumsfeld.

One final consideration is that pointing has value beyond predicting productivity. Perhaps equally important is the discussion that it fosters in the team, often revealing dramatic differences in our understanding of what a particular story actually means. A heated discussion about whether something should get 13, 20 or 40 points might yield a figure without much grounding in reality, but at least it forces us to think and talk about the story and the work involved.

I'm leaning towards adopting a Fibonacci or other progressive scale (the justification for Fibonacci specifically is that it approximates an exponential series based on the golden ratio, roughly 1.618). In that case, however, we should require that larger stories be rediscussed and split before they are started or, if necessary, after some initial work has been done (since details often crystallize after a couple of days of coding). Whatever we choose, I doubt we'll stick with it forever. This is an area still ripe for experimentation and innovation, and I expect it will take a while before we find a system we consider optimal.
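For the curious, the golden ratio claim is easy to check. The sketch below compares the ratio between neighboring values in the true Fibonacci sequence with the same ratio in the rounded scale used for pointing; the helper name `consecutive_ratios` is just something I made up for the illustration.

```python
import math

# The golden ratio, (1 + sqrt(5)) / 2, approximately 1.618.
PHI = (1 + math.sqrt(5)) / 2

TRUE_FIBONACCI = [1, 2, 3, 5, 8, 13, 21, 34, 55]
POINT_SCALE = [1, 2, 3, 5, 8, 13, 20, 40, 100]


def consecutive_ratios(sequence):
    """Ratio of each value to the one before it."""
    return [b / a for a, b in zip(sequence, sequence[1:])]


print(round(PHI, 3))  # 1.618
print([round(r, 3) for r in consecutive_ratios(TRUE_FIBONACCI)])
# [2.0, 1.5, 1.667, 1.6, 1.625, 1.615, 1.619, 1.618]
print([round(r, 3) for r in consecutive_ratios(POINT_SCALE)])
# [2.0, 1.5, 1.667, 1.6, 1.625, 1.538, 2.0, 2.5]
```

The true sequence settles on the golden ratio quickly. The pointing scale follows it up to 13 and then jumps by factors of 2 and 2.5, so at the top end it is even coarser than a strict Fibonacci scale would be.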