What is the method to identify bottlenecks in a software engineering programme?

Question

I'm on a software development programme in financial services - with 100 developers, plus testers, BAs, PMs and other support staff.

We've read through Implementing Lean Software Development and The Phoenix Project, which both talk about identifying the bottleneck in your flow and optimising it away. (There are some similarities to the critical path in a project.)

Intuitively, we can name candidate bottlenecks: the number of testing environments, the amount of time and effort required for regression testing, the size of the monolith, the number of developers and so on. What we're trying to do is boil it down to the one bottleneck that holds everything else up (as in a manufacturing process flow).

Applying Lean Software Development talks about value stream analysis - but doesn't quite go far enough to identify the one blocker that is critical to the whole system.

My question is: What is the method to identify the key bottleneck in a software engineering programme?

EDIT: Additional Assumptions:

In my environment, funding is allocated for large chunks of scope to be delivered at a specific date. In essence, the quality, scope and time are locked in up front (with some variance in scope and time if absolutely required).
This means there is no concept of 'small pieces moving through the system'. There are only large projects with lots of stories (60+ stories, each with 10 days of work in them).
This is a somewhat waterfall-like environment (as much as Sarbanes-Oxley dictates) with separate System Integration Test and User Acceptance Test phases.

Why do you assume there is only one piece of this very large system that is a bigger bottleneck than all the rest? If there is, isn't it possible the solution may involve how several other parts interact with it? Worse yet, it could be the piece that is the most difficult to fix, and you could make just as many gains in the overall system with small fixes in several other areas. — JeffO, Nov 26 '16 at 11:34
I'm using that in the context of 'The Goal' or 'The Phoenix Project' - which both look at Software Delivery in terms of a manufacturing pipeline flow. In this everything passes through the one set of steps, and one bottleneck impacts the whole flow. — hawkeye, Nov 26 '16 at 11:37
The Phoenix Project had an obvious bottleneck because one individual was doing too much, which was a somewhat easy fix. The solution still required fixing several other parts of the workflow, such as automating their build process, requiring more documentation, and sharing and communicating with other teams. Focusing too much on fixing the one person would have limited their results. — JeffO, Dec 06 '16 at 00:42

score 7 · Accepted Answer · answered Nov 26 '16 at 10:09

One method to identify the most important bottlenecks is to make visible the stages that work items go through.

As a start, try to follow a couple of work items (new features, bugs, improvements, etc.) through the complete cycle from the item becoming known to the team until the point where it has been successfully deployed into production. Write down what steps need to be taken to go through the complete path to production and where the ticket might get placed on a pile waiting for someone else to continue work on it or waiting for some other reason.

This can all be made visible by using a kanban board. In its most simple form, a kanban board consists of a number of columns representing the work-stages and wait-states in the development process and sticky notes for the work items.
Each sticky note gets moved across the board according to where it is in the development process or what it is waiting for.

Using a kanban board, you can identify bottlenecks by seeing tickets pile up in a column or by seeing that tickets get pulled out of a column faster than new tickets come in.

If a waiting column fills up faster than tickets get removed, that is an indicator that the resources that the tickets are waiting for are overloaded.
If a "doing work" column contains significantly more tickets than team members who can work on them, that is an indication that the team is working on too many things at the same time (leading to inefficiency due to context switching) or that a waiting state was missed.
If a waiting column regularly runs completely empty (tickets go out faster than they come in), that is an indication that the team pulling those tickets is under-utilized and/or over-staffed.

The key bottleneck is the column where these effects are most strongly visible.
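If the board lives in a tool that can export counts per column, the three checks above can be roughed out in a few lines of Python. This is only a sketch under my own assumptions: the column names, the numbers, the `ColumnStats` fields and the `wip_factor` threshold are all made up for illustration, and "key bottleneck" is simplified here to "the waiting column that grew the most in the period".

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    name: str
    kind: str          # "wait" or "work"
    arrived: int       # tickets that entered the column during the period
    departed: int      # tickets that left the column during the period
    current: int       # tickets sitting in the column right now
    workers: int = 0   # people able to work the column (work columns only)


def diagnose(col, wip_factor=2.0):
    """Return a short diagnosis for one column, or None if nothing stands out."""
    if col.kind == "wait":
        if col.arrived > col.departed:
            return "fills faster than it drains: downstream resource overloaded"
        if col.current == 0 and col.departed >= col.arrived:
            return "runs empty: downstream team under-utilised or over-staffed"
    elif col.kind == "work" and col.workers and col.current > wip_factor * col.workers:
        return "too much work in progress, or a hidden waiting state"
    return None


def key_bottleneck(columns):
    """Pick the waiting column with the largest net growth (arrived - departed)."""
    growing = [c for c in columns if c.kind == "wait" and c.arrived > c.departed]
    return max(growing, key=lambda c: c.arrived - c.departed, default=None)


# Hypothetical counts for one reporting period.
board = [
    ColumnStats("Ready for dev", "wait", arrived=12, departed=12, current=0),
    ColumnStats("In development", "work", arrived=12, departed=10, current=9, workers=4),
    ColumnStats("Waiting for test environment", "wait", arrived=10, departed=4, current=11),
    ColumnStats("In SIT", "work", arrived=4, departed=4, current=2, workers=3),
    ColumnStats("Waiting for UAT", "wait", arrived=4, departed=3, current=3),
]

for col in board:
    finding = diagnose(col)
    if finding:
        print(f"{col.name}: {finding}")

worst = key_bottleneck(board)
print("Key bottleneck candidate:", worst.name if worst else "none")
```

With these made-up numbers the candidate is the "Waiting for test environment" column. Treat the output as a prompt for a conversation at the board, not as a verdict.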

Thanks Bart - that is vintage Lean - and an awesome answer. It does assume a large number of small pieces of work - rather than large chunks of functionality locked in up front. — hawkeye, Nov 26 '16 at 10:11
@hawkeye if you only deliver once during the life of the project, you may have a lot of trouble implementing lean concepts. It's a very interesting question because of that though. — RubberDuck, Nov 26 '16 at 11:31
@hawkeye: If you have larger pieces of work, then the principle is exactly the same, but the work stages take longer to complete. And that might show up as a bottleneck in its own right. — Bart van Ingen Schenau, Nov 26 '16 at 13:52
Thanks Bart - I think that's my issue. I'll find a way to ask that as a separate question. — hawkeye, Nov 26 '16 at 21:17

score 1 · Answer 2 · edited Apr 13 '17 at 12:45

I think that Bart van Ingen Schenau's answer is very good. It's essentially a real-time value stream map. However, I do have some other suggestions that may help you on top of that answer.

First, consider tracking the time each task spends in each state. Tools should be able to provide this. If your tool doesn't, or you are using a physical board, you can write the date of each transition on the card. This will allow you to calculate the average time per state and identify the phases that take a long time. However, you also need to capture time waiting versus time active. Again, making notes on the card can help with this, and at the end of a cycle you can put the times from the cards into a spreadsheet or similar tool to analyze.
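As a rough illustration of that analysis, here is a small Python sketch that turns the transition dates written on each card into average days per state and a waiting-versus-active split. The card IDs, state names, dates and the choice of which states count as waiting are all invented for the example; a real export from your tool will look different.

```python
from collections import defaultdict
from datetime import date

# Assumption: these are the states in which nobody is actively working on the card.
WAIT_STATES = {"Ready for dev", "Waiting for test"}

# Hypothetical cards: each entry is (state entered, date of transition).
cards = {
    "STORY-1": [
        ("Ready for dev", date(2016, 11, 1)),
        ("In development", date(2016, 11, 3)),
        ("Waiting for test", date(2016, 11, 13)),
        ("In test", date(2016, 11, 21)),
        ("Done", date(2016, 11, 24)),
    ],
    "STORY-2": [
        ("Ready for dev", date(2016, 11, 2)),
        ("In development", date(2016, 11, 6)),
        ("Waiting for test", date(2016, 11, 14)),
        ("In test", date(2016, 11, 26)),
        ("Done", date(2016, 11, 29)),
    ],
}


def days_per_state(transitions):
    """Calendar days spent in each state, from consecutive transition dates."""
    spent = defaultdict(int)
    for (state, entered), (_, left) in zip(transitions, transitions[1:]):
        spent[state] += (left - entered).days
    return dict(spent)


def average_days(cards):
    """Average days per state across all cards."""
    totals, counts = defaultdict(int), defaultdict(int)
    for transitions in cards.values():
        for state, days in days_per_state(transitions).items():
            totals[state] += days
            counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


def wait_vs_active(cards, wait_states=WAIT_STATES):
    """Total (waiting days, active days) across all cards."""
    wait = active = 0
    for transitions in cards.values():
        for state, days in days_per_state(transitions).items():
            if state in wait_states:
                wait += days
            else:
                active += days
    return wait, active


print(average_days(cards))
print("waiting vs active days:", wait_vs_active(cards))
```

The sketch counts calendar days, so weekends are included. If that distorts your picture, count working days instead.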

Second, consider the size of the activities. If you are using a Kanban board as Bart suggested, you may want to consider having finer-grained columns or creating value stream maps for what happens inside a column.

Once you know your times, optimize the longest times first. Start by reducing "waste" time, that is, the time a task is not in an active state. Then look at reducing the time in process.
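To make that ordering concrete, here is a minimal Python sketch that ranks states for attention: waiting states first, longest average first, then active states, longest first. The state names and averages are hypothetical, and `improvement_order` is just a helper name I made up.

```python
def improvement_order(avg_days, wait_states):
    """Waiting states by descending average time, then active states likewise."""
    waste = sorted((s for s in avg_days if s in wait_states),
                   key=avg_days.get, reverse=True)
    process = sorted((s for s in avg_days if s not in wait_states),
                     key=avg_days.get, reverse=True)
    return waste + process


# Hypothetical average days per state.
avg_days = {
    "Ready for dev": 3.0,
    "In development": 9.0,
    "Waiting for test": 10.0,
    "In test": 3.0,
}
wait_states = {"Ready for dev", "Waiting for test"}

for rank, state in enumerate(improvement_order(avg_days, wait_states), start=1):
    print(rank, state, avg_days[state])
```

In practice you may still decide to tackle a long active state before a short waiting state; the strict "waste first" ordering is just the simplest reading of the rule.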
