Story Points Explained
TL;DR — Key takeaways
- Story points are a relative measure of the effort, complexity, and uncertainty in a piece of work — deliberately not hours or days.
- Relative estimation works because humans are poor at absolute estimation but good at comparison ("this is about twice that").
- You assign points by comparing each story to a reference story the whole team knows, usually via planning poker.
- Velocity — points completed per sprint — turns fuzzy relative estimates into usable forecasts after three to five sprints.
- The fastest way to ruin story points is to convert them back into hours or compare velocity between teams.
Story points are a unit agile teams use to estimate work by its relative size rather than its duration: a measure of effort, complexity, and uncertainty combined. A story worth 4 points is roughly twice the work of a 2-point story — and deliberately says nothing about how many hours either will take.
That indirection confuses everyone at first. This guide explains what points measure, why the indirection is worth it, how to assign them in practice, and the mistakes that quietly break them.
What are story points?
A story point is a relative unit: it expresses how big a piece of work is compared to other work the same team has done, folding together three things that make work big:
- Effort — the sheer volume of work involved.
- Complexity — how intellectually hard it is; how many moving parts interact.
- Uncertainty — how much is unknown: undocumented systems, new technology, vague requirements.
A useful analogy: you probably can't say how many hours it takes to move house — it depends on packing speed, stairs, how the day unfolds. But you can say with confidence that moving a three-bedroom house is about four times the job of moving a studio apartment. That comparison is a story point estimate: the studio is a 2, the house is an 8, and no hours were harmed in the making of either number.
Story points vs hours
Story points beat hour estimates for planning because they measure something teams can actually judge consistently — relative size — instead of something they demonstrably can't, which is absolute duration. Three reasons the indirection wins:
Humans fail at absolute estimation. Decades of software estimation research document chronic, resistant optimism in time estimates — Douglas Hofstadter enshrined it as Hofstadter's Law ("it always takes longer than you expect, even when you take into account Hofstadter's Law"), and Steve McConnell's Software Estimation: Demystifying the Black Art catalogs the evidence. Comparison judgments ("A is about twice B") are far more stable, and points only ask for comparisons.
Points absorb individual speed differences. A senior and a junior developer will never agree on how many hours a task takes, because it genuinely differs. They can agree it's "about a 5" — the same relative size regardless of who picks it up. That's what makes a team-level estimate possible at all.
Hours get treated as promises. An "8 hours" estimate becomes a deadline the moment a manager hears it; when reality intrudes, the team gets a variance conversation. "5 points" resists that conversion — duration emerges statistically from velocity instead of being promised story by story.
How to assign story points
Assigning points is a comparison exercise: pick a reference story, then size everything against it. In practice:
- Choose a reference story. Pick a small, recently completed story the whole team understands — say, "add validation to the signup form" — and call it a 2.
- Compare, don't measure. For each new story ask one question: is this bigger, smaller, or about the same as the reference? Roughly double? A Fibonacci scale has no 4, so choose between its neighbors, 3 and 5; rounding up to 5 is the cautious choice. Four times? Call it an 8.
- Estimate as a team with planning poker. Everyone votes privately and reveals simultaneously, so the loudest voice doesn't set the number — the mechanics and rationale are covered in our planning poker guide. Our free tool handles the hiding and revealing.
- Re-anchor occasionally. Every few months, check that a "2" still means what it meant. Teams drift; a shared reference story is cheap recalibration.
A worked example: your reference "signup form validation" story is a 2. The next story is "add OAuth login with Google and GitHub." It's clearly more work (two providers, token handling), more complex (redirect flows), and less certain (nobody has touched the auth code in a year). Consensus lands on 8 — four times the reference — and nobody had to pretend to know how many hours the auth code will fight back.
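The comparison step can be sketched in a few lines of Python. This is only an illustration: the function name `snap_to_scale` and the rule that ties round up are assumptions made for the example, not part of any standard.

```python
FIBONACCI = [1, 2, 3, 5, 8, 13, 21]

def snap_to_scale(reference_points, ratio, scale=FIBONACCI):
    """Return the scale value closest to reference_points * ratio.

    Ties go to the larger value (an assumed, cautious tie-break).
    """
    target = reference_points * ratio
    # Sort key: smallest distance first, then prefer the larger value on a tie.
    return min(scale, key=lambda value: (abs(value - target), -value))

print(snap_to_scale(2, 2))  # target 4 sits between 3 and 5; the tie rounds up to 5
print(snap_to_scale(2, 4))  # target 8 is on the scale, so the result is 8
```

In a real session the ratio comes from team discussion, not from a function call; the sketch only shows how a rough "twice" or "four times" maps onto a fixed scale.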
Common story point scales
Most teams don't allow arbitrary numbers — they estimate on a fixed scale whose gaps widen as sizes grow, because uncertainty grows with size:
| Scale | Values | Character |
|---|---|---|
| Fibonacci | 1, 2, 3, 5, 8, 13, 21 | The classic; gaps mirror uncertainty |
| Modified Fibonacci | 0, ½, 1, 2, 3, 5, 8, 13, 20, 40, 100 | The de facto standard deck |
| T-shirt sizes | XS, S, M, L, XL | Non-numeric; roadmap-level sizing |
| Powers of two | 1, 2, 4, 8, 16 | Strict doubling semantics |
The trade-offs between them, along with a decision guide, are covered in our Fibonacci vs t-shirt sizing guide.
Velocity: where points pay off
Velocity is the number of story points a team completes per sprint, and it's the mechanism that turns relative estimates into calendar forecasts. If the last five sprints delivered 28, 31, 25, 30, and 29 points, the team's velocity is roughly 29 (the mean is 28.6), and a 120-point epic is honestly "a little over four sprints," so plan for five.
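The arithmetic behind that forecast, as a minimal Python sketch. The sprint totals and the epic size come from the example above; the helper `forecast_range`, which brackets the forecast with the fastest and slowest sprint on record, is an assumption added for illustration.

```python
import math
from statistics import mean

recent_sprints = [28, 31, 25, 30, 29]  # points completed in the last five sprints
epic_points = 120

velocity = mean(recent_sprints)            # 28.6
sprints_needed = epic_points / velocity    # roughly 4.2
whole_sprints = math.ceil(sprints_needed)  # 5, since work finishes inside a whole sprint

def forecast_range(points, history):
    """Return (best, worst) whole sprints, using the fastest and slowest sprint seen."""
    return math.ceil(points / max(history)), math.ceil(points / min(history))

print(velocity, whole_sprints)
print(forecast_range(epic_points, recent_sprints))  # (4, 5)
```

Reporting the range rather than a single number keeps the forecast honest about sprint-to-sprint variation.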
Three properties make velocity work:
- It's empirical. Velocity is measured from what actually shipped, not promised from what was hoped. Optimism in individual estimates cancels out as long as it's consistent optimism.
- It stabilizes in 3–5 sprints. New teams' velocity bounces around at first; with two-week sprints, give it roughly six to ten weeks before forecasting with it.
- It's self-correcting. If the team starts under-estimating, it completes fewer points per sprint, so velocity falls to compensate and forecasts stay honest. The estimate-to-reality exchange rate floats.
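The self-correcting property can be checked with a toy model. The sketch below assumes every estimate is the "true" size multiplied by a constant `bias`; the true sizes, the capacity figure, and the function name are all hypothetical values chosen for the example.

```python
def sprints_to_finish(backlog_true_sizes, capacity_per_sprint, bias):
    """Forecast sprints when every estimate equals the true size times `bias`."""
    estimated_backlog = sum(size * bias for size in backlog_true_sizes)
    # Velocity is measured in the same biased points the team estimates in,
    # so it moves by the same factor as the backlog total.
    velocity = capacity_per_sprint * bias
    return estimated_backlog / velocity

backlog = [5, 8, 13, 3, 8, 21, 2]  # hypothetical true sizes, totalling 60
for bias in (0.5, 1.0, 1.5):       # under-estimating, accurate, over-estimating
    print(bias, sprints_to_finish(backlog, capacity_per_sprint=30, bias=bias))
```

Each run prints the same forecast of 2.0 sprints: a consistent bias scales the backlog and the velocity together and drops out of the ratio. An inconsistent bias would not cancel this way, which is why the text stresses consistency.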
Common mistakes with story points
Every failure mode of story points is some version of the same error: treating them as absolute rather than relative.
- Converting points to hours. Publishing "1 point = 6 hours" reintroduces every problem points were designed to avoid, one spreadsheet at a time. If stakeholders need dates, give them velocity-based forecasts.
- Comparing velocity across teams. Points are team-local by construction — Team A's 5 and Team B's 5 share a numeral and nothing else. Cross-team velocity comparison measures who inflates estimates, not who works faster.
- Rewarding high velocity. Goodhart's Law in action: when velocity becomes a target, estimates inflate to meet it, and the measure stops measuring anything. Velocity is a planning input, never a KPI.
- Re-estimating finished stories. "That 5 turned out to be an 8" — resist. Estimates encode what the team knew at estimation time, and retroactive edits corrupt the velocity data that makes forecasting work. Learn the lesson; leave the number.
- Pointing everything to death. Estimating a two-line fix with a full poker round costs more than the fix. Many teams batch trivial work as ½- or 1-point items and save the ceremony for real stories.
Frequently asked questions
Are story points required in Scrum?
No. The Scrum Guide does not mention story points at all — it requires only that the team sizes its work somehow. Story points are a popular convention layered on top of Scrum, and teams increasingly experiment with alternatives like counting stories or #NoEstimates. Points persist because velocity-based forecasting is genuinely useful.
What's a good velocity?
There is no good or bad velocity — the number is meaningless outside the team that produced it. A team with a velocity of 25 is not slower than one at 60; they just calibrated their points differently. The only healthy velocity question is whether your velocity is stable enough sprint-to-sprint to forecast with.
Should we estimate bugs and tech debt?
Estimate them if your team needs the capacity they consume to be visible in planning — that's the pragmatic answer. Some teams point everything so that velocity reflects total work; others leave bugs unpointed and accept a lower story velocity as the cost of quality. Both work; what fails is switching approaches mid-quarter and comparing the numbers.
Can one person just assign the points alone?
A tech lead pointing stories solo is fast, but it discards the main benefit of estimation: surfacing the disagreements. When one developer says 2 and another says 8, the conversation that follows catches the missing dependency before the sprint starts. Group estimation via planning poker is slower per story and much cheaper per surprise.
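A rough way to decide when a revealed round needs that conversation is to measure how far apart the votes sit on the scale. The sketch below is one possible rule, not a standard one: the name `needs_discussion` and the one-step threshold are assumptions.

```python
SCALE = [1, 2, 3, 5, 8, 13, 21]

def needs_discussion(votes, scale=SCALE, max_steps=1):
    """True when the lowest and highest vote are more than max_steps apart on the scale."""
    positions = [scale.index(vote) for vote in votes]
    return max(positions) - min(positions) > max_steps

print(needs_discussion([2, 8, 3, 5]))  # True: 2 and 8 are three steps apart
print(needs_discussion([3, 5, 5, 3]))  # False: adjacent values, close enough to settle
```

Counting steps on the scale rather than raw point differences treats a 2-versus-3 split and an 8-versus-13 split as the same kind of disagreement, which matches the way the gaps widen with size.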