I consult, write, and speak on running better technology businesses (tech firms and IT captives) and the things that make it possible: good governance behaviors (activist investing in IT), what matters most (results, not effort), how we organize (restructure from the technologically abstract to the business concrete), how we execute and manage (replacing industrial with professional), how we plan (debunking the myth of control), and how we pay the bills (capital-intensive financing and budgeting in an agile world). I am increasingly interested in robustness over optimization.

Friday, February 28, 2025

American manufacturers have forgotten Deming's principles. Their software teams never learned them.

American manufacturing struggled with quality problems in the 1970s and 1980s. Manufacturers got their house in order with the help of W. Edwards Deming, applying statistical quality control and total quality management throughout the manufacturing process, from raw materials to work in process to finished goods. The quality of American-made products improved significantly as a result.

Once again, American manufacturing is struggling with quality problems, from Seattle’s planes to Detroit’s cars. But the products are a little different this time. Not only are manufactured products physically more complex than they were 50 years ago; they now consist of both physical and digital components. Making matters worse, quality processes have not kept pace with the increase in complexity. Product testing practices haven’t changed much in thirty years; software testing practices haven’t changed much in twenty.

And then, of course, there are “practices” and “what is practiced.”

The benefit of having specifications, whether for the threading of fasteners or scalability of a method, is that they enable components to be individually tested to see whether they meet expectations. The benefit of automating inspection is that it gives a continuous picture of the current state of quality of the things coming in, the things in-flight, and the things going out. Automated tests provide both of these things in software.

If tests are coded before their corresponding methods, there is a good chance that the tests validate the expectations of the code, and that the code is constructed in such a way that the fulfillment of those expectations is visible and quantifiable. Provided the outcome - the desired technical and functional behavior - is achieved, the code is within the expected tolerance, and the properties the code needs to satisfy can be confirmed.
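
To make that concrete, here is a minimal sketch of a test written before the code it describes, assuming pytest and a hypothetical apply_discount function; the tests state the expectation, and the implementation that follows exists only so the sketch runs end to end.

    import pytest

    def test_orders_over_threshold_get_ten_percent_discount():
        # Expectation stated first: totals above 100.00 are discounted 10%,
        # accurate to the cent.
        assert apply_discount(150.00) == pytest.approx(135.00, abs=0.01)

    def test_orders_at_or_below_threshold_are_unchanged():
        assert apply_discount(100.00) == 100.00

    def apply_discount(total: float) -> float:
        # Written after the tests and shaped by them: the behavior the tests
        # describe is the behavior the code makes visible and quantifiable.
        return round(total * 0.90, 2) if total > 100.00 else total

Because the tests came first, they read as the specification rather than as a record of the implementation.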

All too often, tests are written after the fact, which leads to “best endeavors” testing of the software as constructed. Yes, those tests will catch some technical errors, particularly as the code changes over time, but (a) tests composed after the fact can only test for specific characteristics to the extent that the code itself is testable; and (b) they relegate testing to verifying exactness of implementation (a standard of acceptance that physical products grew out of in the 19th century).

Another way to look at it: code can satisfy tests written ex post facto, but all that indicates is that the code still works the way it was originally composed, to the extent that the code exposes the relevant properties. This is not the same as indicating how well the code does what it is expected to do. That’s a pretty big gap in test fidelity.
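
A hedged illustration of that gap, using a hypothetical summarize_orders function: the first test merely pins the implementation as constructed, while the second states the expectation and exposes the difference.

    def summarize_orders(amounts):
        # The code as constructed: silently drops negative amounts.
        return sum(amount for amount in amounts if amount > 0)

    def test_written_after_the_fact():
        # Passes because the code drops the refund, whether or not that is
        # what the business actually expects. It confirms the implementation.
        assert summarize_orders([100, -20, 50]) == 150

    def test_written_from_the_expectation():
        # States the intent: refunds reduce the summary. It fails against the
        # code as constructed, which is exactly the gap in test fidelity.
        assert summarize_orders([100, -20, 50]) == 130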

It’s also a pretty big indicator that measurement of quality is valued over executing with quality.

Quality in practice goes downhill quickly from here. Tests that do nothing except increase the count of automated tests. Tests with inherent dependencies that produce Type I and Type II errors (false alarms and false passes). Tests skipped or ignored. It’s just testing theater when it is the tests rather than the code being exercised, as in, “the tests passed (or failed)” rather than “the code passed (or failed)”.
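
These failure modes are easy to sketch, again with hypothetical names: shared state between tests, a permanently skipped test, and a test that exists only to pad the count.

    import pytest

    SHARED_CART = []  # hidden dependency shared across tests

    def test_add_item_to_cart():
        SHARED_CART.append("widget")
        assert len(SHARED_CART) == 1  # passes only if it happens to run first

    def test_cart_starts_empty():
        # Passes or fails depending on test order, not on the code under test:
        # a false failure or a false pass, i.e., a Type I or Type II error.
        assert SHARED_CART == []

    @pytest.mark.skip(reason="flaky, fix later")
    def test_checkout_total_is_correct():
        ...  # skipped indefinitely: it inflates the count without exercising anything

    def test_always_passes():
        assert True  # increases the number of automated tests and nothing else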

Automated testing is used as a proxy for the presence of quality. But a large number of automated tests is not an indicator of a culture of quality if the tests exercise what was coded and not why it was coded. When it is the former, there will always be a large, late-stage, labor-intensive QA process to see whether or not the software does what it is supposed to do, in a vain attempt to inspect in just enough quality.

Automated test assets are treated as a measurable indicator of quality when they should be treated as evidence that quality is built in. Software quality will never level up until the industry figures this out.

Friday, January 31, 2025

We knew what didn’t work in software development a generation ago. Those practices are still common today.

In the not too distant past, software development was notorious for taking too long, costing too much, and having a high rate of cancellation. Companies spent months doing up-front analysis and design before a line of code was written. The entire solution was coded before being deemed ready to test. Users only got a glimpse of the software shortly before the go-live date. No surprise that there were a lot of surprises, all of them unpleasant: a shadow “component integration testing” phase, an extraordinarily lengthy testing phase, and extensive changes to overcome user rejection. Making the activities of software development continuous rather than discrete, collaborative rather than linear, and automated rather than repeated meant more transparency, less time wasted, and less time taken to create useful solutions through code.

It was pretty obvious back then what didn’t work, but it wasn’t so obvious what would. Gradually people throughout the industry figured out what worked less bad and, eventually, what worked better. Just like it says in the first line of the Agile manifesto. But I do sometimes wonder how much of it really stuck when I see software delivery that looks a lot like it did in the bad old days. Such as when:

  • "Big up front design" is replaced with … "big up front design": pre-development phases that rival in duration and cost those of yesteryear.
  • It takes years before we get useful software. The promise during the pitch presentation was that there would be frequent releases of valuable software… but flip down a few slides and you’ll see that only happens once the foundation is built. It’s duplicitous to talk about “frequent releases” when it takes a long time and 8 figures of spend to get the foundation - sorry, the platform - built.
  • We have just as little transparency as we did in the waterfall days. Back then, we knew how much we’d spent but not what we actually had to show for it: requirements were “complete” but inadequate for coding; software was coded but it didn’t integrate; code was complete but was laden with defects; QA was complete but the software wasn’t usable. The project tasks might have been done, but the work was not. Today, when work is defined as tasks assigned to multiple developers, we have the same problem, because the tasks might very well be complete but “done” - satisfying a requirement - takes more than the sum of the effort to complete each task. Just as back then, we have work granularity mismatches, shadow phases, deferred testing, and rework cycles. Once again, we know how much we’re spending, but have no visibility into the efficacy of that spend.
  • The team is continuously learning … what they were reasonably expected to have known before they started down this path in the first place. What the team is incrementally "learning" is new only to people on the development team. Significant code refactoring that results from the team "discovering" something we already knew is a non-value-generative rework cycle.
  • We have labor-intensive late-stage testing despite there being hundreds and hundreds of automated tests, because tests are constructed poorly (e.g., functional tests being passed off as unit tests - see the sketch after this list), bloated, flaky, inert, disabled, and/or ignored. Rather than fixing the problems with the tests and the code, we push QA to the right and staff more people.
  • Deadlines are important but not taken seriously. During the waterfall days, projects slipped deadlines all the time, and management kept shoveling more money at them. Product thinking has turned software development into a perpetual investment, and development teams into a utility service. Utility service priorities are self-referential to the service they provide, not to the people who consume the service. The standard of timeliness is therefore "you’ll get it when you get it."
  • Responsibility isn’t shared but orphaned. Vague role definitions, a dearth of domain knowledge, unclear decision rights, and a preference for sourcing capacity (people hours) and specialist labor mean that process doesn’t empower people so much as it gives them plausible deniability when something bad happens.
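
A minimal sketch of that mislabeled test - a functional test passed off as a unit test - with a hypothetical orders table and figures:

    import sqlite3

    def test_order_total():  # sits in the "unit test" suite and runs on every commit
        # A functional test in disguise: it exercises a database, a schema, and
        # its own setup data far more than any single unit of code. It is slower
        # than a unit test, and when it fails it rarely points to the unit at fault.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE orders (amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?)", [(500.0,), (750.0,)])
        total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
        assert total == 1250.0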

Companies and teams will traffic in process as a way to signal something, or several somethings: because of our process we deliver value, our quality is very high, our team is cost-effective, we get product to market quickly. If in practice they do the things we learned specifically not to do a generation ago, they are none of the things they claim to be.