How to lead your people out of a storm
When the s*** hits the fan
At some point in your management career, you will get sucked into a storm of epic proportions. A storm unlike any other you have seen. These storms could take multiple shapes: a PR nightmare, a major production incident, a big customer threatening to churn, etc., but there is one commonality across the various shapes this storm could take. It will be hell on you and your team. There will be loud voices. There will be animated gesticulations of limbs. Tempers will flare. Anxiety will go through the roof. There will be threats of heads rolling. There will be mentions of needing to let the board of directors know. You get the idea. This post is about how to lead your team through the storm successfully.
The most important thing is fairly easy to achieve in theory but hard to achieve consistently in practice, and that thing is showing up. Showing up no matter how badly your heart wants to avoid it is very hard, but, in the end, you, as the leader, are responsible for shepherding your team through the storm, so you have to show up, every, single, damn, time.
It sounds simple, but I have seen countless examples of leaders shying away from jumping into the fire. I am sure if you ask them, their response will be something along the lines of, "The team seems to have a handle on this. I am not sure I will add more value". I have been guilty of doing this in the past, but over time, I have realized how much of a trust buster that move is. If your team is dealing with a category 4 storm, you HAVE to show up to the party even if you don't have the solutions to the problems the team is trying to navigate through.
In a big fire, you have to focus on four major areas that we will now unpack further.
Bystander Intervention
Picture this. The entire application is down. Your company is losing millions of dollars every hour the application is down. Customers are sending you angry messages via social media, customer support channels, and even your personal email. Your CEO is losing their mind. You yell, 'All hands on deck!' in the group Slack channel. Your engineering team has extreme ownership, so a lot of people (maybe everyone!) join the call, including you. Everybody is fired up to help out! And then... crickets. Nothing happens. Welcome to the bystander effect! Everybody is waiting for someone else to do something.
Most engineers don't like being told what to do, and naturally, it follows that they are very bad at giving marching orders to others. They prefer giving gentle nudges and suggestions, just like in a pull request. This is a situation where you have to do the opposite. You have to start giving orders and start assigning owners.
First, ask your team who is best suited to solve the issue. If no hands go up (which, in my experience, is rare; engineers love solving puzzles), pick a person who you think has a decent shot at diagnosing and debugging the issue. Then, pick at least one more person to act as a rubber duck to the primary. Then, send them out of the main room/Zoom call. Yes, send them away. You have to create an environment for them where they can debug and discuss in peace without having to deal with all the noise that invariably comes with a big issue. I typically ask the lead engineer to send me updates every thirty minutes, and if they forget to, I jump into the other Zoom room/war room to get an update.
Don't reduce the nag window any further than thirty minutes, or your team will hate you, and you will be cutting into their focus time.
Next, you have to kick everyone else out of the main Zoom call except your key stakeholders. The main benefit is to reduce the number of 'helpful' voices in the room. It sounds counterintuitive, but twenty heads are not better than two when dealing with a time-sensitive problem.
The stakeholders vary from incident to incident and company to company, but at a minimum, they should include your product counterpart and a representative from the customer support (voice of the customer) team. If you are dealing with a PR issue (data loss, bad press on social media, etc.), you will need your security leader, legal counsel, and social media manager on the call as well. Usually, I don't pull in my boss automatically unless I need their help with something, but I also don't kick them out. I leave it up to them to manage their engagement in the incident.
So now the stage is set. You have identified your fixers and stakeholders, reduced the distractions around them, and now you can start fixing the problem. Right? Wrong. Now, you have to start communicating.
Communications Officer
As a leader managing an incident, you have three main responsibilities. As I mentioned earlier, the first is just showing up. The next is ensuring clear lines of communication across all interested parties, including the ones affected by the incident. We will get to the third one in the next section. And no, fixing the issue is not your primary responsibility.
After you send off the 'fixers' into a separate Zoom (or physical) room, make a list of all the people you need to start sending updates to. Keep in mind that this list is longer than the list of key stakeholders (previous section) who are helping you with the incident. The usual suspects are:
Your boss and/or company executives - Work with your manager to figure out where to send your updates. In the hybrid/remote world, this is usually a private Slack channel. Send out updates every thirty minutes with the status of the issue and the ETA for a fix.
Internal Teams (customer support, product, sales, etc.) - I usually use a public Slack channel that's open to everyone in the company to send out updates. Use the same thirty-minute cadence as before.
External facing communications - Sometimes, you will have to communicate statuses to your customers. This varies from company to company and takes various shapes like an email broadcast, updates to a public-facing status page, updates to social media, custom emails to high-value customers, etc. I usually have someone from marketing proofread my updates to ensure they are at the right altitude for the customers. If you are dealing with a security/data loss/privacy-related issue, you also need to get your legal counsel to give feedback on your message before you send it out. Lastly, sometimes, it might put your customers at ease if the communications come from an executive, so always check with your leadership chain before hitting send. If you feel that the incident is hot enough, don't hesitate to make that suggestion to your leadership. Sometimes, depending on the severity of the issue, the message might have to come out of the CEO's inbox.
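If it helps to make the thirty-minute cadence concrete, here is a minimal sketch in Python of an update that carries the two things mentioned above: the status of the issue and the ETA for a fix. The class name, field names, and message wording are all hypothetical; adapt them to whatever your company already uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Cadence suggested in the post; everything else here is an illustrative assumption.
UPDATE_INTERVAL = timedelta(minutes=30)


@dataclass
class StatusUpdate:
    """One incident update: when it was sent, the status, and the ETA (if known)."""
    sent_at: datetime
    status: str
    eta: Optional[str] = None  # None means the team has no ETA yet

    def render(self) -> str:
        # Say so plainly when there is no ETA instead of leaving the field out.
        eta_text = self.eta if self.eta else "unknown, still investigating"
        next_update = self.sent_at + UPDATE_INTERVAL
        return (
            f"[{self.sent_at:%H:%M}] Status: {self.status}\n"
            f"ETA for fix: {eta_text}\n"
            f"Next update by {next_update:%H:%M}"
        )


update = StatusUpdate(
    sent_at=datetime(2024, 1, 1, 14, 0),
    status="Root cause identified, fix in review",
)
print(update.render())
```

Stating when the next update is due, even when there is nothing new to report, is one way to keep people from pinging the fixers directly.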
Temperature Regulator
When you are dealing with high-temperature issues, you will also have to deal with rising temperatures within individuals and sometimes entire teams. In my experience, in moments of extreme stress, the room (virtual or otherwise) gets ultra-attuned to the tone of the authority figure in the room. So, if you want the room to be calm, you must be calm. If you want the room to be riled up, you have to be riled up. I recommend calmness though. This is your third most important job while leading your team through a storm.
Being calm doesn't mean being lackadaisical. You can show urgency and stay calm at the same time.
In fact, being able to show urgency and calmness at the same time is what your team desperately needs in times of stress.
For example, it is okay to ask for regular updates, but it is not okay to publicly show frustration when the team hasn't found a solution yet. It is okay to pull in additional people into the incident to help out, but it is not okay to publicly remove (passive-aggressive shaming) the lead fixer from their room and put somebody else in their place. It is okay to ask for an ETA once the team identifies a fix, but it is not okay to get frustrated with the team if they miss their ETA.
Lastly, you might have to eject someone from the call who is increasing the temperature in the room. This is a rare occurrence, but it does happen from time to time.
Risk Underwriter
The last thing you will have to do in the event of a high-temperature issue is to underwrite the actions the team is taking or about to take. Teams often hesitate to 'push the button' (a.k.a. test out a potential fix) for two big reasons: 1) they are unsure if the fix will work, and 2) they think there is a chance they might make the situation worse.
I think there is some truth to reason number one above, but number two is almost impossible when the house is completely engulfed in flames. When the application is down, making it even more down is impossible. I do recommend discussing the size of the blast radius with your team, along with whether the decision is a two-way door decision or not. In my experience, the only decisions you need to think through carefully (before pushing the button) are one-way doors, for example:
Deleting data with no way to get back to the previous state
Moving to an untested infrastructure on the fly with no easy way to get back to the previous infrastructure. Be extra careful with deploying anything related to authentication or authorization or anything connected to security in general.
Any public (social media) or semi-public (email, internal community board) communications
Any legal notice to customer(s) or vendor(s)
If the change your team is pursuing doesn't fit any of the broad categories outlined above, there is a decent chance that the change is a two-way door and, hence, can be unwound quickly if needed. If it is a one-way door decision, get as much feedback as possible about the change, underwrite the associated risk, and get your team to push the button. Tell them you will take care of all the negative consequences. And if you do run into negative consequences…well… let's tackle that in a future post.
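For teams that like a checklist, the one-way door categories above can be sketched as a tiny Python lookup. This is only an illustration: the enum and function names are hypothetical, and treating security and authentication changes as their own one-way door category is an assumption made here on the side of caution.

```python
from enum import Enum, auto


class ChangeKind(Enum):
    """Hypothetical labels for the kinds of changes discussed in the post."""
    IRREVERSIBLE_DATA_DELETION = auto()
    UNTESTED_INFRA_MIGRATION = auto()
    SECURITY_OR_AUTH_CHANGE = auto()   # assumption: treated as one-way to be safe
    PUBLIC_COMMUNICATION = auto()
    LEGAL_NOTICE = auto()
    OTHER = auto()                     # anything that does not fit the list above


# Everything except OTHER is treated as a one-way door.
ONE_WAY_DOORS = frozenset(ChangeKind) - {ChangeKind.OTHER}


def needs_careful_review(kind: ChangeKind) -> bool:
    """Return True when the change should be thought through before pushing the button."""
    return kind in ONE_WAY_DOORS


print(needs_careful_review(ChangeKind.LEGAL_NOTICE))  # one-way door
print(needs_careful_review(ChangeKind.OTHER))         # likely a two-way door
```

A lookup like this does not replace judgment. It only prompts the team to pause and name the category before acting.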
P.S. - No post during Thanksgiving week. I will be back the week after. Happy Thanksgiving, all!

