A big software development project collapses just weeks before it is expected to be released to QA. According to the project leadership team, they're dogged by integration problems and as a result, the software is nowhere close to being done. It's going to take far more time and far more money than expected, but until these integration problems get sorted out, it isn't clear how much more time and how much more money.
The executive responsible for it is going to get directly involved. He has a background in development and he knows many of the people and suppliers on the team, but he doesn't really know what they're doing or how they're going about doing it.
The team situation is complicated. There are the consultants from a product company, consultants from two other outsourcing firms, several independent contractors, plus a few of our own employees. QA is outsourced to a specialist firm, and used as a shared service. All these people are spread out across the globe, and even where people are in the same city they may work on different floors, or in different buildings, or simply work from home. Teams are organized by technology (e.g., services developers) or activity (analysts, developers, QA). Project status data is fragmented: we have a project plan, organized by use cases we want to assemble and release for staged QA. We have the developer tasks that we're tracking. We have the QA scripts that need to be written and executed. We have test data that we need to source for both developers and QA. And we have a defect list. Lots of people, lots of places, lots of work, lots of tracking, but not a lot to show for it.
The executive's first action will be to ask each sub-team to provide more status reports more frequently, and to report on the finest details of what people are doing and how long it will be before they're done. New task tracking, daily (perhaps twice-daily) status updates, and weekly reports to stakeholders to show progress.
The common reaction to every failure in financial markets has been to demand more disclosure and greater transparency. And, viewed in the abstract, who could dispute the merits of disclosure and transparency? You can never have too much information.
But you can.
So wrote John Kay in the Financial Times recently. His words are applicable to IT as well.
Gathering more data more frequently about an existing system merely serves to tell us what we already know. A lot of people are coding, but nothing is getting done because there are many dependencies in the code and people are working on inter-dependent parts at different times. A lot of use cases are waiting to be tested but lack data to test them. A lot of functional tests have been executed but they're neither passed nor failed because the people executing the tests have questions about them. The defect backlog is steadily growing in all directions, reflecting problems with the software, with the environments, with the data, with the requirements, or just mysterious outcomes nobody has the time to fully research. When a project collapses, it isn't because of a project data problem: all of these things are in plain sight.
If getting more data more frequently isn't going to do any good, why do the arriving rescuers always ask for it? Because they hope that the breakthrough lies in adjusting and fine-tuning the existing team and organization. Tuning an existing system is a lot less work - not to mention a lot more politically palatable and a lot less risky - than overhauling it.
There are costs to providing information, which is why these obligations have proved unpopular with companies. There are also costs entailed in processing information – even if the only processing you undertake is to recognise that you want to discard it.
More reporting more often adds burden to our project managers, who must now spend more time in meetings and cobbling together more reports. Instead of having the people close to the situation look for ways to make things better, the people close to the situation are generating reports in the hope that people removed from the situation will make it better. It yields no constructive insight into the problems at hand. It simply reinforces the obvious and leads to management edicts that we need to "reduce the number of defects" and "get more tests to pass."
This reduces line leaders to messengers between the team (status) and the executive (demands). As decision-making authority becomes concentrated in fewer hands, project leadership relies less on feedback than on brute force.
[M]ost of the rustles in the undergrowth were the wind rather than the meat.
Excessive data can lead to misguided action, false optimizations and unintended outcomes. Suppose the executive bangs the drum about too many defects being unresolved for too long a period of time. The easiest thing for people to do isn't to try to fix the defects, but to deflect responsibility for them, which they can do by reassigning those in their queue to somebody else. Some years ago, I was asked by a client to assess a distressed project that among other things had over 1,000 critical and high priority defects. It came as no surprise to learn that every last one of them was assigned to a person outside the core project team. The public hand wringing about defects resulted in behaviours that deferred, rather than accelerated, things getting fixed.
The underlying profitability of most financial activities can be judged only over a complete business cycle – or longer. The damage done by presenting spurious profit figures, derived by marking assets to delusionary market values or computed from hypothetical model-based valuations, has been literally incalculable.
Traditional IT project plans are models. Unfortunately, we put tremendous faith in our models. Models are limited, and frequently flawed. Financial institutions placed faith in Value at Risk models, which intentionally excluded low-probability but high-impact events, to their (and to the world's) detriment. Our IT models are similarly limited. Most IT project plans don't actively include the impact analysis of reasonably probable events like the loss of key people, mistakes in requirements, or changes in business priority.
In traditional IT, work is separated into different phases of activity: everything gets analyzed, then everything gets coded, then everything gets tested, then it gets deployed. And only then do we find out if everything worked or not. It takes us a long, long time - and no small amount of effort - to get any kind of results across the finish line. That, in turn, increases the likelihood of disaster. Because effort makes a poor proxy for results, interim progress reports are the equivalent of marking to model. Asking project managers for more status data more frequently burns the candle from the wrong end: reducing the project status data cycle does nothing if we don't shorten the results cycle.
It is time companies and their investors got together to identify information [...] relevant to their joint needs.
We put tremendous faith in our project plans. Conventional thinking is that if every resource on this project performs their tasks, the project will be delivered on time. The going-in assumption of a rescue is that we have a deficiency in execution, not in organization. But if we are faced with the need to rescue a troubled project, we must see things through the lens of results and not effort. Nobody sets out to buy effort from business analysts, programmers and QA analysts. They do set out to buy software that requires participation by those people. This is ultimately how any investment is measured. This and this alone - not effort, not tasks - must be the coin of the realm.
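The distinction between buying effort and buying results can be made concrete in a few lines. The sketch below measures progress as the share of features that actually run and pass acceptance tests, alongside the hours logged; the feature names and figures are invented for illustration, not taken from the project described.

```python
# Illustrative only: feature names, hours, and test outcomes are invented.
features = [
    {"name": "login",    "hours_spent": 120, "passes_acceptance": True},
    {"name": "search",   "hours_spent": 300, "passes_acceptance": False},
    {"name": "checkout", "hours_spent": 80,  "passes_acceptance": True},
]

# Effort: easy to report, but it says nothing about whether the investment pays off.
effort_hours = sum(f["hours_spent"] for f in features)

# Results: the share of features that actually run and pass acceptance tests.
results = sum(f["passes_acceptance"] for f in features) / len(features)

print(f"Hours logged: {effort_hours}")
print(f"Features working: {results:.0%}")
```

However many hours get logged, only the second number tells us whether the investment is producing the software somebody set out to buy.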
Because projects fail for systemic reasons more than for failures of execution, project rescues call for systemic change. In the case of rescuing a traditionally managed IT project, this means reorganizing away from skill silos into teams concentrated on delivering specific business needs, working on small business requirements as opposed to technology tasks, triggering a build with each code commit, and immediately deploying that build to an environment where it is tested. If we do these things, we don't need loads of data or hyper-frequent updates to tell us whether we're getting stuff done. Day in and day out, either we get new software with new features that we can run, or we don't.
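The commit-triggered cadence described above can be sketched in a few lines. The step functions below are placeholders standing in for whatever build, deploy, and test tools a project actually uses; none of them are real commands.

```python
# Placeholder steps: a real pipeline would invoke the project's actual
# build system, deployment scripts, and acceptance test suite.
def build(commit):
    return f"artifact-{commit}"

def deploy(artifact):
    return f"{artifact}@test-env"

def run_acceptance_tests(deployment):
    return True  # stand-in: either the features run, or they don't

def on_commit(commit):
    """Every commit either yields a running, tested build or fails in plain sight."""
    artifact = build(commit)                 # build triggered by the commit itself
    deployment = deploy(artifact)            # deployed immediately, not at phase-end
    return run_acceptance_tests(deployment)  # the only status that matters
```

The point of the structure is that status reporting becomes a by-product: the answer to "are we done?" is whatever the last commit's pipeline says, not a document assembled in a meeting.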
It is irrational to organize a software business such that it optimizes effort but produces results infrequently. Sadly, this happens in IT all the time. If we believe we can't deliver meaningful software in a short time frame - measured in days - we've incorrectly defined the problem space. Anything longer than that is a leap of faith. All the status reports in the world won't make it anything but.