Question: Re Estimating with Story Points…

Dolores asks: Hi Joe – …I have always been taught that story points should never be compared to time measurements. However, we were using 1 story point = a day or a day and a half of work to estimate initial team velocity. One of the questions we often get about scrum is how do points relate to hours (or days). So, what’s the best answer we should give to our users who are new to scrum and trying to get their heads around story points? (seemed like a good blog topic to me)

Let’s start even earlier.  First, Scrum is not trying to track inputs, but rather results or outputs.  We want to maximize the results.

Next: When the Doers in the Team are using story points to estimate stories, they should not be thinking of time.  You are correct.  Try to get them (despite prior experiences) to think simply ‘relatively’.  If the Reference Story is 1 SP, then how big is Story X?  And they use the Fibonacci cards to indicate the relative size: 2x the Reference Story, or 5x the RS, or 21x the RS.

A bit later, all of the Product Backlog (for the next x months, let’s say 6 months..some shortish period) is estimated in story points.

Later still, we want to have a decent guess at when the top Y set of stories (eg, the first release) will be done.  And we have a brand new team that does not have a velocity.  In that case, we have to use that formula that we showed you.  And that formula includes a ratio: 1 Story Point = Z ideal person days.  I do not know a way to make the initial guess without that formula.

You are right to be concerned.  Yes, we do not want the estimators (the Doers) to know that ratio when they are voting.  If they do, they tend to go back to the ‘old ways’ of estimating the time to do work (usually in ideal hours or days) and then converting that back to story points.  Not good.

BUT…you (let us say you, as the ScrumMaster) need that ratio to make a decent guess (now) at what the team velocity will be.  So, I recommend that you and an outside person look at the RS (Reference Story) and maybe a reasonable sample set of the Product Backlog, and determine the typical ratio (1 SP = Z ideal person days).  Do not tell the estimators (Doers), but use that ratio in the calculated velocity.
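The formula itself is not reproduced in this post.  As a rough sketch only, here is one common shape such a calculation can take: ideal person days available in a sprint, divided by Z.  The function names, the parameters, and the ‘focus factor’ are assumptions made for illustration, not the formula from the course, and the numbers are made up.

```python
import math


def estimated_initial_velocity(team_size, sprint_working_days,
                               focus_factor, z_days_per_point):
    """Rough first guess at velocity, in story points per sprint.

    All inputs are hypothetical:
      team_size           - number of Doers in the Team
      sprint_working_days - working days in one sprint
      focus_factor        - assumed share of each day that is 'ideal' time (0..1)
      z_days_per_point    - Z, the ideal person days per story point
    """
    ideal_person_days = team_size * sprint_working_days * focus_factor
    return ideal_person_days / z_days_per_point


def sprints_needed(release_points, velocity):
    """Whole sprints needed to finish a release of the given size."""
    return math.ceil(release_points / velocity)


# Made-up example: 6 Doers, 10-day sprints, half of each day 'ideal', Z = 1.5
velocity = estimated_initial_velocity(6, 10, 0.5, 1.5)
print(velocity)                       # 20.0
print(sprints_needed(120, velocity))  # 6
```

Only the ScrumMaster and the outside person would run a calculation like this; the Doers keep voting in relative size.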

Soon, this problem goes away (we hope) because you can use the real velocity after a few sprints, rather than the calculation.

Note: It is normally better if Z is in the range of 1 to 2 ideal person days.  I won’t explain why here, just trust me on this.  It helps some.

One further note.  As we also mentioned, once you know certain information (including the Team’s real velocity), you can start to infer a relationship between 1 SP and ‘real’ time.  At least for a given team.  This is fine for managers and others to do if there are no ‘bad behaviors’.  I do not recommend talking about this with the estimators.  The research shows that people estimate time badly.  But people are fairly good at estimating relative size/complexity compared to a Reference Story.

Everything clear now?

4 thoughts on “Question: Re Estimating with Story Points…”

1. Joe Little Post author

Hi Cliff. Good question.
The usual advice for story points is, pick the smallest (effort) story in the Product Backlog.
However, it needs to be ‘not too small’ and ‘not too big’, but ‘just right’.
Usually the best advice on this (the Fibonacci numbers work out more usefully if…): the RS (reference story) should be about 1 to 2 ideal person days of effort.
BUT: you do not want the Team to know that info (the time), because that makes them (often) think in terms of time as they do the estimating.
So, the SM and maybe one or two (outside) advisors should look at the proposed RS and decide whether or not it is ‘good enough’. In terms of size.
Another aspect of good enough is that it is also a reasonably ‘normal’ user story. If it is a weird story (in some way) it may not help the team as an RS.
So, if the proposed RS is too big, then break it down. If the proposed RS is too small, pick the next biggest one (usually it is about the right size).
Clear enough? Thx, Joe

2. Gary K Evans

I might add a slight nuance. First, equating a point to time is the classic example of trying to hold onto what is familiar from waterfall and twist agile estimation to fit that nostalgic impulse. Second, it is an empirical fact that some 3 point stories take longer to implement than a 5 point story, and some 8 point stories take less time than 5 point stories. This is because a story point estimate is a function of multiple parameters: story complexity, analysis and design effort, uncertainty about criteria for satisfaction, calendar time, number of implementors or expertise needed, test effort, and so forth. To attempt to reverse engineer story points to time is a “lossy” conversion and quite inappropriate. Joe is absolutely correct that agile (should) focus on outcomes. Estimates are inputs. I strongly urge you to help your PMO and other management to understand they are asking the wrong question while searching for a comfortable answer. Help them understand that if sprint 6 delivers 28 story points but sprint 7 delivers only 19 – what possible consistency can they derive? However, the change in SPs delivered can lead to asking the legitimate question of “why the difference? what causes led to this difference?”

3. Joe Little Post author

Hi others (and Gary),
I just want to emphasize Gary’s point about the variability of estimates. “To predict is difficult, particularly of the future.”
Now, with story points (rather than time) the distributions (statistically speaking) of actuals against the expected form a normal (Gaussian) distribution.
This is good.
This means that the Sigma (variation) can be used, for each estimate. For practical purposes, the Sigmas for the individual stories are quite large, as you said. (Although we can live and learn and over time make them usually somewhat smaller).

What is useful is: If you have a fairly large set of stories (say 8+) you then get an effect that the Sigma on the stories may be large, but the Sigma on the expected velocity is much smaller.

So, if the average velocity is 20 story points, and they ‘take in’ 20 story points, then the likelihood of hitting 20 story points becomes higher, provided there are 8+ stories or so. (If there are only 2 or 4 stories, this effect is much less powerful; if the 8 stories are highly variable in size, the effect also is lessened. You want 8 stories that you think are all about the same size.)
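To make the Sigma effect concrete, here is a small sketch.  It assumes (my simplification, for illustration) that the errors on the stories are independent and that every story has the same relative Sigma, so the variances simply add up.  The function name and the 50% figure are made up.

```python
import math


def relative_sigma_of_total(story_points, story_relative_sigma):
    """Relative Sigma of the sprint total.

    Assumes independent story errors, each with a Sigma equal to
    story_relative_sigma times that story's size (a simplification).
    """
    total = sum(story_points)
    variance = sum((p * story_relative_sigma) ** 2 for p in story_points)
    return math.sqrt(variance) / total


sigma = 0.5  # assumed: each story is off by about 50% of its size

eight_equal = [5] * 8                       # 8 stories of about the same size
two_equal = [5] * 2                         # only 2 stories
eight_uneven = [1, 1, 1, 1, 1, 1, 1, 13]    # 8 stories, one dominates

print(round(relative_sigma_of_total(eight_equal, sigma), 2))   # 0.18
print(round(relative_sigma_of_total(two_equal, sigma), 2))     # 0.35
print(round(relative_sigma_of_total(eight_uneven, sigma), 2))  # 0.33
```

Under these assumptions, eight similar stories cut the relative Sigma on the sprint total to roughly a third of the per-story Sigma, while two stories, or eight very uneven ones, leave it much closer to the per-story value.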

So, the Team can/should start to have a fairly high reliability with 8+ stories per sprint. I would say being 70-80% reliable should be fairly normal pretty soon for most teams (there are many factors involved). By this we mean: If they had an average velocity over the last 3 sprints of 20 SPs, and they ‘commit’ to another 20 SPs, then in that sprint they should get at least 20 SPs 70 to 80% of the time.
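If you want to track this, reliability in the sense above is just the share of sprints in which the Team delivered at least what it took in.  A minimal sketch, with a hypothetical function name and made-up sprint data:

```python
def sprint_reliability(committed, delivered):
    """Fraction of sprints where delivered SPs met or beat the committed SPs."""
    if not committed or len(committed) != len(delivered):
        raise ValueError("need one delivered value per committed value")
    hits = sum(1 for c, d in zip(committed, delivered) if d >= c)
    return hits / len(committed)


# Made-up history of 10 sprints, each with 20 SPs taken in
committed = [20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
delivered = [21, 19, 20, 24, 18, 22, 20, 23, 17, 20]

print(sprint_reliability(committed, delivered))  # 0.7
```

A result of 0.7 here would sit at the low end of the 70-80% range.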

And this relative predictability is useful in business. It is by no means absolute predictability….that is, each sprint is not ‘in the bank’ in terms of ‘commitment.’ So, do not blame a team if they do not ‘hit’ certain sprints. But do ask some hard questions if they cannot become ‘highly reliable’ in the range of 70-80%.

Now, for relatively new teams (especially), there can be a bunch of ‘normal’ reasons why they are not reliable (yet). And there are a bunch of special cases. But in general, scrum teams should be able to become ‘highly reliable’ fairly quickly. The next question becomes: given the nature of their work, how high is highly reliable? I suggested the 70-80% range is fairly typical, but might not be right for your type of work.

I also want to emphasize another thing Gary said. Estimates are based on many things (many factors affect their accuracy). So, you may have to work on the factors he mentioned (and others) for a while, if you want your team to become ‘highly reliable’.

I guess we must mention Mr Deming. Do not try to fix the wrong kind of variation the wrong way. As one example, do not worry very much at all, as a manager, about the variation on the individual story estimates. And, do not blame the team if they are not initially reliable at the sprint level 70-80% of the time. Look at the systemic things that are leading to ‘bad’ variation at the sprint level. Improve on those.

Finally, Gary reminds me to say this. If Team 1 is dependent on Team 2 and Team 3 (ie, what I call scaling) in delivering successfully in a Sprint, then estimates at the individual team/sprint level will tend to be more variable. Again, do not blame the Team. Look for other solutions.

Hope that helps…