You create a new software application. It grows, rapidly. And it keeps growing. You add tools. You add people. You add roles and structure. You split the codebase into different technical components. You divide teams. You add environments. You make rules for merging and deploying.
One day, you look round and realize you have 20 times the staff but deliver only a fraction of what you used to when you had only a handful of people. Demand is still growing, but you can't keep up.
On top of it, everybody is quarreling.
There are the old-timers, the people who were there at the beginning, who know the code backwards & forwards, who were there for the early triumphs. They're still the people who get called in when a deployment is in trouble, when there's a mystery problem, when it's already well past the 11th hour. Which, of course, means they're called in for every deployment. They want to carry on with the "trust me" ways. "Trust me" when I tell you that I'll get it done, just leave me to it. "Trust me" that if I have to, I'll go to heroic lengths to make my deadline. "Trust me" that we'll pull it off - we always have. This is how we've always done it. This is what works.
Then there are the people hired in the past year to run QA and create a PMO. They want control. "I want estimates, and task orders, and a project plan. I want specifications and approvals. I want to maximize the time programmers spend programming and testers spend testing. I want the cheapest resources I can get. I want test scripts. I want documentation. I want process." This is how we did it at my last firm. This is what works.
Then it happens again. Another botched deployment. Several days of downtime. Angry customers. Day after day of crisis calls and coordinated recovery efforts. Too much drama makes management nervous. We can't go on like this. This doesn't work.
But even management is divided. Some of the managers have been around since the early days: We operate in a fast & furious environment, this comes with the territory. Some of the managers are new: You can't run a business of this size by the seat of your pants. The rhetoric heats up and escalates. Neither side convinces the other. Disagreements become arguments become accusations. "Cowboys". "Control freaks". Impasse. Stalemate. Nothing changes.
The bickering continues, until the moment when the decision is made for everybody. Another deployment failure. This one in spectacular style: very public, and very severe, and very embarrassing, and very bad.
Time for new leadership. Call a search firm. Hire some hotshot from a firm like the one we want to be, one that grew and scaled. Give him a mandate. Have him report directly to the President.
The hired-in management hopes this new leader will bring deliverance from the chaos. The old-timers hope that "how he did it at his last firm" is "how we've always done it here".
Whatever the case, it is now entirely out of their hands.
I've seen this same pattern in a lot of growth businesses, both captive tech and tech start-ups: the market is still growing, but operations have become sclerotic and performance erratic.
By the time it gets to this point - lots of people, little getting done, low confidence, open bickering - the overriding mission has already started to change from innovating in the business through technology to The Website Must Not Go Down.
This mission change is driven from the top. Leadership feels the pain of operational failure more acutely as they come to see the business as an established competitor rather than a plucky start-up. Whether a start-up with investor money, or a captive IT project that has prominence within a large corporate, leaders are held to the expectation that operations will be predictable. This is how they are measured, so this is how they will manage.
Caution rules the day. We deploy less often and with more ceremony. We err on the side of not making a mistake. The fear of a service interruption causes organizational seizure. The price of innovation is subconsciously deemed too high.
We didn't use to be like this.
Let's dissect the situation.
On the plus side, you still have a core of people who are fluent in the code because they were among the principal authors. You've lost some key people over the years, but you still have people with in-depth knowledge of the whats and the whys of the code. And, the hero culture means they are personally invested in success: this is their code, this is personal. You also have hired in new people - a lot of new people - some of whom can become fluent in the product, the customers, and the business over time.
Most everybody will have a job on the line, and many senior people are still "close to code". There won't be much time for luxury positions, people in jobs off the line focused on things like process and group standards.
Strange as it may seem, another plus is that you operate in a fast-paced business environment. This is counter-intuitive: the environment seems to be the source of the problem. But it is your greatest asset. The critical success criteria are not costs but growth. Riding that growth will depend on innovation and time-to-market more than predictability and control.
Then there are the minuses.
You are beholden to the knowledge of the people who form your core group, and that knowledge exists exclusively in their heads. All those people you added to scale your capacity have given you a bloated team; if costs aren't a concern now they will be soon. Worse, you're not getting value from those new hires. Many are net negative contributors. With a bit of time and attention, some can become net positive, but the rest - maybe 30%, maybe 80% - are just plain bad hires. This happens when people are hired for capacity as opposed to capability.
You've added new roles, new structure, and new formality, in an attempt to gain control. That's given you a more complex business. It also creates mission confusion: as much as people are trying their level best, they're adding as much confusion and delay as clarity and certainty, because the structure is at odds with results.
Your core team of "heroes" have had a lot of freedom and independence for a long time. They will generally resist any change you try to make that they perceive will curtail that freedom.
People may be close to code, but if you've split the code into technical silos, your people will be pretty far removed from how and why your customers use the software.
Many of these minuses are by-products of the responses to growth: hire more people, add structure, divide and conquer the problem. But the fundamental hero culture that is resistant to any change is a hold-over from the early, free-wheeling days.
If the business is still growing, the heroes should have a case for remaining aggressive and getting stuff done. But priorities change when business leaders think they've got something - market share, outside investors - to lose. And the credibility of the hero culture erodes with every production snafu, every minute of unscheduled downtime, every unforced error.
* * *
The cowboy approach puts stability at risk. Control will stifle growth. So what can you do? It seems you are forced to choose between "responsibly stable" and "recklessly aggressive".
You are not. You must unwind and rebuild.
Fundamentally, things in this organization still get done in an ad-hoc way. Layers of control, scale, and structure have been grafted onto this. They are a mismatch. We know this because all those people and processes are shunted aside when stuff absolutely needs to get done, when a new release absolutely needs to get deployed.
Unwind
Here are things we can do to unwind these layers.
Furlough the net negative people. Having net negative contributors isn't good for anybody: not for the business, not for the net positive contributors, not for the people who are net negative. Frustration and disappointment are lousy characteristics of operations and careers.
Institute a policy of "do no harm". Introduce greater rigor in how you work - in how you analyze, code, build and test. Publicly expose quality statistics. Every new line of code, every new requirement, every new test, should be performed to a higher level of simplicity, clarity, ease of maintenance. Agile is pretty good for this.
Practices aren't enough. You need to instill value systems and social norms that reinforce them. Agile is pretty good for this, too. If you haven't done so in a while, re-read the Agile manifesto. It offers a core set of values. And a policy of "do no harm" is a good starting point for a value system.
These things give us a foundation of excellence. They reduce the noise and distractions, and make quality central to, not a potential byproduct of, what we do.
Rebuild
The unwinding work strips out the misfit layers and lets us improve our fundamentals. But that isn't enough. We also have some rebuilding to do.
Organize people by how your customers work with you, not how the technology works together. It doesn't make sense to organize a business orthogonally to the way it makes money, whether that is by product or by customer-activity model.
Hire in new people who are not only smart and talented but specifically have the completion and collaboration genes in their DNA. Then, pair. Then rotate pairs. Then pair across discipline - BAs with Developers, Developers with QAs, UX with QAs. Do not underestimate how difficult it will be for people to pair. People will quit as a result of this. But this is a highly effective way to unwind knowledge silos. Unit tests are helpful, Stories with acceptance criteria are helpful, but nothing reduces proprietary knowledge as effectively as people actively working together on a specific solution.
Advance the state of your practice as far down the path of Continuous Delivery as you can. Make many small deliveries (Agile Stories are fantastic for enabling this). Commit code frequently. Build continuously. Automate everything. Deploy many times a day.
Results
This will leave you more resilient to downside risks (less susceptibility to catastrophic events) and able to capitalize on upside opportunities (quickly satisfy needs). Stability and innovation are no longer trade-offs.
You can start to rebuild while you are still unwinding. Just know why you are doing what you are doing. Reorganizing without behaviour change is just going to add layers and confusion. Similarly, introducing better practices will make your organization less bad, but won't make it world class.
Do not underestimate the work and commitment this requires. This is a major overhaul. This requires changing people and changing mind-sets, a lot of communication and reassurance, a suspension of disbelief that this will work, and tenacity during the trough of despair when it looks like things are getting worse, not better.
Most of all, remember that the most critical acts of leadership are the repeated commitments not to a vision, but to the people responsible for executing it. They'll make mistakes. They'll make plenty of mistakes. Make it safe to make mistakes. Hold people not to an expectation of perfection, but to the expectation that they act, iterate, and learn.