How to use ephemeral teams responsibly
Ephemeral teams are teams that form for a short time around a particular problem or project and disband once the problem is solved or the project is complete. While the concept has many benefits, it carries some dangers as well. Let’s look at what those benefits are, what the very real but often overlooked dangers are, and how we can take advantage of ephemeral teams without running into the dangers.
The benefits
Let me start by saying: there are many benefits to an ephemeral team.
For one, they give individuals a way to work on the projects that motivate them (shoutout to the Agile Manifesto!). This can be a huge boon to developer happiness and productivity. With a more static team structure, employees can be stuck with the work that falls into their domain whether it excites them or not. In my own experience, developers who are hired into a team that supports growth or marketing often feel like the work they’re doing isn’t that meaningful. They joined the company because they wanted to work on the core product, not because they wanted to optimize ads and landing pages. (Let me interject quickly that this is often a failure on the business side to explain the role. These business problems are absolutely fascinating and can be very fun to solve, but this isn’t immediately obvious to most engineers.) But with the ability to move from project to project, engineers can spend some time doing the “necessary” work of optimizing landing pages, and some time doing the “meaningful” work of contributing to the core product.
In addition to helping engineers, ephemeral teams can also help the business move faster. This is because they can help fix capacity mismatches and avoid bottlenecks. For instance, if one team has too many backend engineers but not enough iOS engineers for a particular project, an ephemeral team could be formed with the necessary skillsets to prevent bottlenecks. Meanwhile, the extra backend engineers could slide to another ephemeral team to lend their capacity to a different project. By matching capacity to projects the company can theoretically handle more work in parallel than it would have been able to with static teams.
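To make the capacity mismatch concrete, here is a minimal sketch of the comparison described above. The function name `skill_gaps`, the skill labels, and all of the headcounts are made up for illustration; a real staffing decision would weigh far more than raw counts.

```python
from typing import Dict, Tuple


def skill_gaps(
    needed: Dict[str, int], available: Dict[str, int]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Compare the skills a project needs against what a team has.

    Returns (shortfall, surplus): how many people are missing per skill,
    and how many are left over and could move to another project.
    """
    shortfall = {
        skill: count - available.get(skill, 0)
        for skill, count in needed.items()
        if count > available.get(skill, 0)
    }
    surplus = {
        skill: count - needed.get(skill, 0)
        for skill, count in available.items()
        if count > needed.get(skill, 0)
    }
    return shortfall, surplus


# Hypothetical numbers: a static team heavy on backend, light on iOS.
static_team = {"backend": 5, "ios": 1}
project_needs = {"backend": 2, "ios": 3}

shortfall, surplus = skill_gaps(project_needs, static_team)
print("Missing for this project:", shortfall)
print("Free to help elsewhere:", surplus)
```

In this made-up case the project is short on iOS engineers while several backend engineers are free, which is exactly the gap an ephemeral team is meant to close.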
The dangers
While ephemeral teams offer benefits for employees and for the company, they also pose serious (and seriously overlooked) dangers. By taking stock of the risks, we can figure out a way to get the benefits while avoiding the dangers.
Costs of chemistry
The first danger is that teams take a long time to build chemistry together, and until they do, performance tends to suffer. Tuckman’s “Forming-Storming-Norming-Performing” is a generally accepted model of how this works (check out the Wikipedia page). Basically, teams come together (forming), experience some tension or conflict as they learn how to work together and build trust (storming), emerge with acceptance for one another and a shared team identity (norming), and finally reach a place of autonomy where they can make decisions effectively as a team (performing). A big difference between norming and performing is that team members in the norming phase often feel unable to disagree or dissent, because they fear causing conflict so soon after exiting the storming phase. To make effective decisions, team members have to be able to disagree. But it takes a lot of time and energy for a team to reach that place, and some teams never even leave the storming phase because they can’t figure out how to get along.
As a result, ephemeral teams are actually quite costly and risky. They’re costly because teams will spend time and energy going through the forming-storming-norming stages every time a new team comes together. Further, there’s an opportunity cost since that time and energy could’ve been spent developing had the teams been static and long-lived. They’re also risky because, as mentioned, some teams may never exit the storming stage.
Could some of the cost be saved if an ephemeral team weren’t built from scratch? We might imagine that keeping the core of a group the same while rotating a few members would keep the team identity mostly intact. That could help the team at least continue “norming”, right? Unfortunately, that’s not the case. Tuckman’s research suggests that teams re-enter the storming stage when team dynamics change. Any change in leadership is especially likely to plunge a team back into the storming phase, and since ephemeral teams crop up around new projects, leadership often does change. Even if an engineering manager stays with the team, the product manager and other stakeholders likely will not. Since these folks often play a leadership role for a team, every new project will lead to a new round of storming.
Overhead overload
The second danger for ephemeral teams is that cognitive load can get out of hand. Cognitive load is rarely considered when building teams and assigning work, but it is a real constraint.
Books that look at domains and how teams relate to them, like Domain-Driven Design, Accelerate, and Team Topologies, suggest that ideally one team would correspond to one domain to keep cognitive load in check. You can think of a domain as a single conceptual space where terms take on a particular meaning. For instance, if I’m in the “referral” domain, the words “fulfill”, “redeem”, “sender”, and “recipient” take on a particular meaning. One clue that I’ve left the referral domain and entered another is if terms start to mean something else. For example, if “sender” and “recipient” start referring to people passing messages (rather than referrals), that’s a clue I’ve entered, say, a “messaging” domain.
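If it helps to see the idea in code, here is a rough sketch of how the same word can name two different things depending on the domain. Every class, field, and function name below is hypothetical and only there to illustrate the point; none of it comes from the books mentioned above.

```python
from dataclasses import dataclass


# Hypothetical "referral" domain: a sender is a customer sharing a referral code.
@dataclass(frozen=True)
class ReferralSender:
    customer_id: str
    referral_code: str


@dataclass(frozen=True)
class ReferralRecipient:
    email: str
    redeemed: bool = False


# Hypothetical "messaging" domain: a sender is a user who wrote a message.
@dataclass(frozen=True)
class MessageSender:
    user_id: str
    display_name: str


def redeem(recipient: ReferralRecipient) -> ReferralRecipient:
    """In the referral domain, "redeem" marks the recipient's referral as used."""
    return ReferralRecipient(email=recipient.email, redeemed=True)


def describe(sender: object) -> str:
    """Show that "sender" means something different in each domain."""
    if isinstance(sender, ReferralSender):
        return f"referral sender sharing code {sender.referral_code}"
    if isinstance(sender, MessageSender):
        return f"message sender {sender.display_name}"
    raise TypeError("not a sender in any known domain")


print(describe(ReferralSender(customer_id="c-1", referral_code="FRIEND10")))
print(describe(MessageSender(user_id="u-1", display_name="Ana")))
print(redeem(ReferralRecipient(email="friend@example.com")).redeemed)
```

Both classes could reasonably have been called `Sender`. Giving them separate types is one way to keep the two meanings from blurring, and an engineer who works in both domains still has to keep both definitions in their head.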
Naturally, this means every domain comes with overhead since each one has its own business context and terminology. Therefore, more domains means more overhead for a team. In fact, that overhead becomes more than the sum of its parts, because it now requires context-switching as teams do work in one domain and then the other. It also requires more precise communication within the team since the same terms can mean different things, and it requires team members to be on guard against potential miscommunications or misunderstandings. This is all relevant to ephemeral teams since team members are switching from domain to domain every time a new ephemeral team spins up. Engineers must learn the new context they’re in without confusing it with the context they previously learned. This takes time and creates miscommunication risks. The problem is exacerbated if engineers are still in on-call or maintenance rotations for services they’ve worked on previously. They now must juggle multiple domains as they switch back and forth between on-call and the project at hand, which slows them down and adds risk.
Hidden costs of handoffs
Speaking of on-call, that brings us to the last risk: with an ephemeral team, the team developing the code will not be the team operating the code. This follows from the fact that the ephemeral team is ephemeral; it is intended to dissolve after the project is complete. However, someone has to run the services once they’re built! Someone has to update packages, keep an eye on monitors, and respond to production incidents. That means there must be a handoff between the ephemeral team and whoever the “someone” is who will keep it running.
Doing a handoff instantly incurs tech debt since the team taking over the code needs to learn what it does, as well as all the business context to understand why it does what it does. This is necessary to effectively do on-call, or to do future work on the codebase. Logic bugs can’t be addressed unless the team running the service knows what the correct logic should be!
Further, a handoff takes away a chance for learning and growth. That’s because the people who write the code don’t get the opportunity to see how their code does in production. They can’t learn how users respond, which is a lost opportunity to build business knowledge. They also can’t see all the bugs and other production issues that crop up, which is a lost opportunity to refine and iterate on software designs. If teams aren’t building up a knowledge base about their users, and engineers aren’t able to improve how they build, then the company loses out on crucial signals it needs to steer the business and fails to build technical leverage.
Get the benefits, avoid the dangers
With the pros and cons of ephemeral teams laid out before us, we can try to take advantage of the benefits while avoiding the dangers.
First off, if engineer motivation and morale are an issue, the research in Accelerate says that there are many changes we can make while still using static teams. For one, the authors found that adopting continuous delivery practices can decrease engineer burnout by making deployments less painful and reducing the amount of rework or unplanned work that teams have to do. Those same practices also have an impact on company culture, often shifting it to be more generative. The book describes this as a culture that shares risks, cooperates well, doesn’t assign blame for failures, and accepts new ideas. Regardless of team, if you take away the worst parts of an engineer’s job (i.e., tedious, risky deployments and fixing production defects) and you improve the culture around them, odds are their outlook will improve significantly.
On top of that, these same delivery capabilities can solve the second issue ephemeral teams attempt to address. Since ephemeral teams are often spun up to increase delivery speed, we can save much of the cost and headache associated with them by adopting proven delivery practices instead. Accelerate advocates for test automation, continuous integration, trunk-based development, loosely coupled architecture, and autonomous, empowered teams, just to name a few. The authors’ research found that these practices not only improve delivery speed but also improve system stability, a benefit that ephemeral teams cannot provide. In fact, ephemeral teams run the risk of decreasing system stability due to the handoffs and the cognitive load associated with on-call.
In some rare cases, an ephemeral team may be necessary and actually beneficial. For instance, an ephemeral team could be the best solution if there’s a production issue affecting a system that touches multiple teams (such as a legacy monolithic system). In this case, you’d probably want to pull in developers from multiple teams to get enough context to deal with the issue. Since only one or two team members would be involved from any given team, the disruption to normal feature work and on-call would be minimal. Plus, the handoff would be small since each team was already touching the system.
The key to making the ephemeral team work is to exercise restraint; use the one ephemeral team to solve the problem at hand, but in general rely on autonomous, empowered, long-lived teams to get the job done. Support those teams in investing in continuous delivery capabilities to help them move faster and enjoy their work more. Lastly, learn to break down the work such that it can be brought to the teams, rather than needing to build teams around the work.