How to be happier and faster after a scale down

Elli Engineering
9 min read · Feb 5, 2024


Illustration by Lukas Hanke

The core message of this blog post is about slicing your work into small increments. Why write a blog post about probably the oldest topic in lean development? There is plenty of published literature proving the benefits of smaller batches, and I have never heard anyone actively argue for large batch sizes. Given this common knowledge, one would expect working in small increments to happen automatically, because everybody believes in it and should strive towards it. However, this was not always the case for our team.

When reading this blog post I hope you can take something away on:

  1. Why we had large stories in the first place and what we did to change that.
  2. Which benefits we got out of slicing our user stories small, apart from the classic “small increments, fast feedback, less waste”.
  3. What happens when you overdo it with the slicing, which happened during our transition from big to small slices.

For this, I compare a period of several months in two subsequent years, so that both periods include the same seasonal capacity fluctuations due to vacations. Between these periods our team scaled down by 25% and at the same time took on more responsibilities. The data sample for the “slicing small” period contains over 73 stories, while the “slicing big” sample contains 43; the difference is a result of looking at the same amount of time but scoping the stories differently.

The change to smaller user stories

Our team grappled with large user stories for a long time. The main reason was that we wanted to keep the scope of a user story at a level where its completion would deliver value to the user. Therefore, our efforts to slice stories into what we deemed the smallest units often resulted in larger-than-ideal stories. This led to planning difficulties, high work in progress (WIP) and spillage across sprints, which in turn caused dissatisfaction and persistent cognitive strain. Despite attempts to address these issues in retrospectives, the situation did not improve significantly. It needed a radical change.

That change was a redefinition of our increments: from now on, features are the high-level definitions of our product that should deliver obvious user value, while stories represent smaller pieces in the process of delivering this user value.

This made it possible to slice stories to a tiny level, a level that we previously rejected because it had no immediate user value. It is very important to note that putting the extra effort into slicing our batches small did not result in additional meeting slots for refinement and planning. We just asked ourselves more often if we could slice the parts even smaller.

Smaller team, same throughput, less mental strain

Illustration by Lukas Hanke

Let’s look at a couple of flow metrics and connect them to the perceptions that team members reported in retros and 1on1s. The following bar chart compares the big-batch phase (green baseline) with the phase after we started slicing smaller (purple bars). In the next paragraphs I describe the meaning of the bars from left to right.

The average cycle time of user stories went down by roughly 60%, from 27 to 11 working days. The change in cycle time is mainly driven by us doing less on a story, so we do less in less time. That is not impressive at all; it simply shows that we actually do what we intended to do: slice smaller.
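Cycle time here simply means the number of working days between a story entering “in progress” and being done. A minimal sketch of how such a figure can be computed, using hypothetical story dates rather than our real board data:

```python
from datetime import date
import numpy as np

# Hypothetical story records: (started, done) dates as they would come from the board.
stories = [
    (date(2023, 9, 4), date(2023, 9, 19)),
    (date(2023, 9, 11), date(2023, 9, 26)),
    (date(2023, 10, 2), date(2023, 10, 10)),
]

def cycle_time(start: date, done: date) -> int:
    """Working days (Mon-Fri) between start and completion."""
    return int(np.busday_count(start, done))

cycle_times = [cycle_time(s, d) for s, d in stories]
print(f"average cycle time: {np.mean(cycle_times):.1f} working days")
```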

Measuring code changes per developer

For throughput and WIP I use the code changes that we made per developer. “Per developer” means that the data is normalised by team size, which is important to make the data comparable before and after the scale down.

Code changes are a difficult metric; I am well aware of that. Of course, we did not make exactly the same changes, so we have to take the comparison with a grain of salt. In this case I believe it is fine to use it, as we worked in similar code bases on similar topics. It would be best to compare delivered business value, but we have no single metric that directly compares our “big batch” and our “small batch” phase.

Higher throughput

Our average code change throughput per week, relative to team size, increased by over 27%. This does not even account for the additional benefit of avoiding waste and rework, a classic result of fast feedback cycles and adapting what is actually built. I attribute this gain in throughput to more focus and clarity through smaller stories. We also delivered an important feature on time, which boosted morale, as we were initially unsure if we could make it.
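As a rough sketch of the normalisation, assuming weekly code-change totals from version control and a known team size for each phase (all numbers below are purely illustrative):

```python
# Illustrative weekly code-change totals and team sizes for both phases.
big_batch   = {"weekly_changes": [1250, 1180, 1220, 1150], "developers": 8}
small_batch = {"weekly_changes": [1150, 1100, 1180, 1140], "developers": 6}

def weekly_throughput_per_dev(phase: dict) -> float:
    """Average code changes per week, normalised by team size."""
    weeks = len(phase["weekly_changes"])
    return sum(phase["weekly_changes"]) / weeks / phase["developers"]

before = weekly_throughput_per_dev(big_batch)
after = weekly_throughput_per_dev(small_batch)
print(f"normalised throughput change: {after / before - 1:+.0%}")
```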

Reduction of WIP

Being faster is great, but being faster with less strain is even better. Therefore, the two rightmost bars show the average code changes that are “active” on any given day, relative to team size. On the story level this means that all the changes necessary to complete a story are counted for every day that the story was in progress. This illustrates the size and complexity of the story that we have to keep in our heads while working on it.
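A minimal sketch of how this story-level WIP figure can be derived, again with hypothetical stories (start date, done date, total code changes for the story):

```python
from datetime import date, timedelta

# Hypothetical stories: (started, done, total code changes needed to complete the story).
stories = [
    (date(2023, 9, 4), date(2023, 9, 19), 400),
    (date(2023, 9, 11), date(2023, 9, 26), 250),
    (date(2023, 9, 18), date(2023, 9, 22), 120),
]
team_size = 6

first_day = min(start for start, _, _ in stories)
last_day = max(done for _, done, _ in stories)

daily_wip = []
day = first_day
while day <= last_day:
    # A story's full change volume counts as "active" on every day the story is in progress.
    active_changes = sum(changes for start, done, changes in stories if start <= day <= done)
    daily_wip.append(active_changes)
    day += timedelta(days=1)

average_wip_per_dev = sum(daily_wip) / len(daily_wip) / team_size
print(f"average active code changes per day and developer: {average_wip_per_dev:.1f}")
```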

At the story level, the average WIP went down by over 60%. The story level is especially important because it was one of the main sources of high cognitive load and confusion in daily work. With smaller stories, we complete them faster and get them out of our heads, moving on to the next one. Smaller stories are usually also better defined, as there is less uncertainty in a smaller scope.

Looking at the task level, the WIP dropped by over 20%. There are two factors here:

  1. Slicing the stories smaller also led to an effort of slicing tasks smaller.
  2. We put an additional focus on the “stop starting, start finishing” principle in our everyday work, which is not directly related to slicing our work smaller. Therefore, 20% could be largely unrelated to “slicing small” but rather a “finish first” result.

Something that I learned from Donald Reinertsen’s “The Principles of Product Development Flow” and from the resistance of the developers was to not have a fixed WIP limit. Limiting WIP to a certain number of active stories was something I had firmly believed in. However, I saw that with story cycle time and throughput as additional metrics, we could actually control WIP. At times we could increase it while keeping story cycle time constant, gaining throughput compared to a fixed WIP limit. If cycle time started increasing, it was a sign to reduce WIP again.
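The relationship behind this check is essentially Little’s Law (average WIP = throughput × average cycle time), which Reinertsen also builds on. A tiny sketch with purely illustrative numbers: if WIP goes up while throughput barely moves, the implied cycle time climbs, which is the signal to pull WIP back down.

```python
def implied_cycle_time(average_wip_stories: float, throughput_stories_per_day: float) -> float:
    """Little's Law: average cycle time = average WIP / throughput."""
    return average_wip_stories / throughput_stories_per_day

# Illustrative numbers: raising WIP from 4 to 6 stories while throughput barely moves
# pushes the implied cycle time up -- the signal to reduce WIP again.
print(implied_cycle_time(4, 0.36))  # ~11 days
print(implied_cycle_time(6, 0.40))  # ~15 days -> reduce WIP
```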

Less risk of outliers means better plannability

We have already seen the increased throughput in the previous section, but working in smaller batches also reduces the overall uncertainty in the work, something that Donald Reinertsen points out very well in “The Principles of Product Development Flow” (p. 95 ff.).

Cutting down the unexpected story explosions

The following graph shows that the distribution of story cycle times changed drastically after our change in slicing. Green is the initial big-batch phase and purple the current phase of slicing smaller. You can clearly see the shift of the distribution towards smaller cycle times and the reduction in stories that explode into cycle times of several weeks or even months. You can also see that we are still far from perfect, with the occasional explosion still happening.

But the most important part is that we reduce the tail of the distribution and the risk of outliers.

The effect of cutting the distribution tail on a quarter of work

Now what happens when you need to do a certain amount of changes in your system and you do them either in small stories or in large stories?

To illustrate this, I took the cycle time and the amount of changes for each story in the two phases. As the baseline I chose the amount of changes that we delivered in one quarter of our “slicing big” phase. Then I created 5000 scenarios by taking random samples of stories whose combined changes matched this baseline. There are three cases to compare:

  • Big Stories, with 27 days per user story on average.
  • Small Stories, with 11 days per user story on average.
  • Our new target of having less than 6 days per user story on average.

The target case is created by taking our current slicing, the “Small Stories” case, and reducing the cycle time and the amount of changes for each story by half.

The bar plot shows the 95th percentile of working days required for one quarter’s worth of changes. The purple line marks 90 days, the time we would expect these changes to take.
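A minimal sketch of this kind of resampling, with hypothetical story samples and simply summing story cycle times as a stand-in for the required working days:

```python
import random

def simulate_quarter_durations(stories, baseline_changes, n_scenarios=5000, seed=1):
    """Draw stories (with replacement) until their combined code changes reach the
    baseline, summing their cycle times; repeat for many scenarios."""
    rng = random.Random(seed)
    durations = []
    for _ in range(n_scenarios):
        total_changes, total_days = 0, 0
        while total_changes < baseline_changes:
            cycle_time, changes = rng.choice(stories)
            total_changes += changes
            total_days += cycle_time
        durations.append(total_days)
    return durations

def percentile_95(values):
    ordered = sorted(values)
    return ordered[int(0.95 * (len(ordered) - 1))]

# Hypothetical (cycle time in working days, code changes) samples per phase.
big_stories   = [(27, 900), (35, 1150), (15, 500), (40, 1350)]
small_stories = [(11, 360), (8, 270), (14, 470), (6, 200)]
baseline_changes = 6000  # changes delivered in one quarter of the "slicing big" phase

for name, sample in [("big stories", big_stories), ("small stories", small_stories)]:
    p95 = percentile_95(simulate_quarter_durations(sample, baseline_changes))
    print(f"{name}: 95th percentile = {p95} working days")
```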

What happens when we slice the same work into smaller increments? We reduce the upward outliers (exploding stories) and thereby increase our predictability significantly. This is important for several reasons:

  • As a team we gain confidence in our ability to deliver value to our customers and stakeholders on time, reducing the frustration of stories blowing up.
  • Our stakeholders see us as more reliable partners and we can manage their expectations better.
  • Often uncertainty is covered by buffers. This is a problem because it increases cost estimates and feedback cycles unnecessarily. Reducing the uncertainty reduces the need for buffers.

These benefits are not as quickly visible as a reduction of WIP or a reduction in cycle time. They are a result of working in smaller increments over a sustained period of time, and staying in this working mode takes dedication.

Can we slice too small? — Yes we can!

This transformation did not happen without hiccups. In particular, I wanted to really swing our way of working from one extreme to the other and pushed towards such small stories that we ended up with the following issues:

  • Interdependent Micro-Stories: The overly small slices resulted in stories that were highly interdependent, which meant extra effort to track numerous tiny stories and their progress.
  • Transactional Inefficiency: Some stories were so tiny that the time spent on refining, discussing dependencies, and planning exceeded the actual work involved (Essentially the same issue as with async pull-requests described here).
  • Unintended Duplication: Working on a small story sometimes went so smoothly that additional work was done as a drive-by. This resulted in confusion as completed work overlapped with items still present in the backlog, causing unnecessary redundancy.

However, these initial problems lasted only two weeks. We addressed them together by re-planning and merging some of the stories.

Conclusion

Going back to what I wanted you to take away from this:

  • Why we had large stories in the first place and what we did to change that.

We did not improve further because we stuck to a convention of what our smallest increment had to deliver, which was actually limiting us more than helping us. Questioning that convention made our improvements possible.

  • Which benefits we got out of slicing our user stories small apart from the classic “small increments, fast feedback, less waste”.

We managed to compensate for the 25% scale down by increasing throughput per developer by over 25%. Even more importantly, we massively reduced what we need to keep in our heads every day, as small stories close faster, reducing story-level WIP. Our discussions are more focused and there is more alignment in the team on what we do. To me, this is by far the most important gain. Further, we reduced uncertainty in our delivery, which increases our confidence to deliver what we plan and the confidence of our stakeholders in us.

  • What happens when you overdo it with the slicing, which happened during our transition.

Early on we ran into unwanted dependencies between tiny stories. We also fell into the transaction cost vs. batch size trap by letting discussion time in refinement and planning exceed the actual work to be done. Fortunately, this was only a short transition phase.

Overall, this is not the perfect experiment. We just cannot perfectly control the team situation, fix all variables and only adjust the story size. However, I feel this is as close as I could get for a significant period of time. Let me know what you think and if you have done similar experiments.

About the author

Matthias Förth is a Product Owner and former Software Engineer focused on backend development. His current interests are streamlining development and pushing data-driven decision making at Elli.
