This is a guest article by Christophe Rochefolle, Director of Operational Excellence at OUI.sncf, France. It is based on a presentation (in French) he gave during the second Meetup of the Paris Chaos Engineering Community.
You may start by telling them you will break everything in production, and that it will be fun! In French, we call this “coming with big clogs” (“gros sabots”). They will see you coming from afar, but apart from making noise, there is little chance you will hit the target.
They will probably ask for a return on investment (R.O.I.). By combining the number of incidents and outages per year, the cost of an outage, and some hypotheses about how far you can reduce them, you can easily build a proposal with a 2–3 year R.O.I.
However, if they have to choose between a project that brings new revenue and one that may only avoid losses, they will probably pick the first.
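The R.O.I. estimate described above comes down to simple arithmetic. Here is a minimal Python sketch of it, assuming a basic payback model: the function name, its parameters and every figure are hypothetical and only illustrate the calculation, so replace them with your own incident data.

```python
def payback_years(outages_per_year, avg_outage_minutes, cost_per_minute,
                  expected_reduction, upfront_cost, yearly_running_cost):
    """Years needed for avoided losses to cover the investment, or None."""
    # What outages cost today, per year.
    yearly_loss = outages_per_year * avg_outage_minutes * cost_per_minute
    # Hypothesis: the program avoids a fixed share of that loss.
    yearly_savings = yearly_loss * expected_reduction
    net_yearly_gain = yearly_savings - yearly_running_cost
    if net_yearly_gain <= 0:
        return None  # the program never pays for itself under these assumptions
    return upfront_cost / net_yearly_gain


# Hypothetical figures, for illustration only.
years = payback_years(
    outages_per_year=6,
    avg_outage_minutes=30,
    cost_per_minute=2_000,
    expected_reduction=0.25,
    upfront_cost=150_000,
    yearly_running_cost=30_000,
)
print(f"Payback period: {years:.1f} years")
```

The result depends almost entirely on expected_reduction, which is a guess. That is one reason this kind of argument is fragile in front of a board.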
The rational approach is not the best one for a subject like Chaos Engineering. You must review your influence strategies.
Biologist Henri Laborit wrote in 1976: “Faced with an unknown experience, man has only three choices: fight, do nothing or flee”. This subject is quite new and has little visibility, so it is important to let your boss discover the concept at their own pace, to avoid instinctive rejection.
You may start by sharing some interesting papers on the subject on internal or external social networks:
Depending on the tools your boss uses, you just need to find a way to send them those articles. I was lucky to be followed on Twitter by many members of our executive board, and I never hesitate to use it to make them aware of subjects I will later discuss with them, including posts from executive influencers, like the CEO of a major bank, Société Générale:
Our first goal here is only to make Chaos Engineering sound familiar before going further.
To adjust your story to the different directors of your executive board, you need to identify the kind of players they are, using sociodynamics among other tools:
From the book L’élan sociodynamique, by Jean-Christian Fauvet
By identifying the way they move and act, you will be able to adapt your strategy and spend your energy on the right priorities to make them allies.
What are their major challenges? What may they win or lose by collaborating with you? What influence do they have on other players? Take time to identify the key issues at stake for decision-makers, as well as the brakes and levers in relation to your subject. Concerns and objections may be only excuses for your opponents to test you, but they may also be real blocking points. You will win them over by studying and including their views, though not necessarily during formal meetings…
Based on this map, you will be able to adapt your strategy:
Once you have identified your players, you should adapt your story to their concerns, questions or objections. Never hesitate to play on emotions; they are a major factor in decision making.
The most obvious emotion to play on is fear: fear of a major outage, the kind that will hit your revenue. For instance, a 5-minute outage represents around one million dollars for Google/Alphabet, one hundred thousand dollars for Netflix and 3 million dollars for Apple:
Outage Outrage: the True Cost of Tech Giant Downtime by Jolt
Moreover, during this type of incident you should not hesitate to be opportunistic and advance your pawns by proposing new practices that will limit the impact of future incidents.
One of the best ways is to speak about resilience: the ability to recover quickly from difficulties.
Werner Vogels, Vice President & Chief Technology Officer at Amazon, introduces this subject by explaining that, given the size and continuous evolution of our systems, “Everything fails all the time”. The main point is no longer to avoid failures, but to limit their impact.
With your CIO/CTO, you may argue that it is better to run an experiment in production during the daytime with everyone available, thus training teams and limiting impact, than to react to a night-time outage with sleepy people and limited resources. The final goal of Chaos Engineering is to sleep like a log, without worrying about issues.
You may also point out links with current practices. For your business continuity plan, you already run experiments in production: disaster recovery tests. This type of experiment is at the heart of the Chaos Engineering approach; the main difference is that we don’t want to do it once, manually. We want to do it automatically and continuously.
Similarly, a Chaos Engineering experiment is about injecting a perturbation and analyzing its impact to detect weaknesses. This kind of analysis is very close to the root cause analysis used during the postmortem of an outage. Regular practice during planned exercises will allow your teams to improve their skills and to work more on preventive action than on correcting issues. Moreover, it will test your monitoring and alerting system, a task rarely done in real life, even though detection is a major part of an effective system to limit outage impact.
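For a technical audience, the “inject a perturbation, then analyze the impact” loop can be shown with a toy simulation. The Python sketch below is only an illustration under stated assumptions: FlakyDependency, handle_request, the retry policy and the 95% threshold are all hypothetical, and nothing here touches a real system or a real chaos tool.

```python
import random


class FlakyDependency:
    """Toy stand-in for a downstream service (hypothetical)."""

    def __init__(self, failure_rate=0.0, seed=42):
        self.failure_rate = failure_rate
        self._rng = random.Random(seed)  # seeded so runs are repeatable

    def call(self):
        # Perturbation: fail a configurable share of calls.
        if self._rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return "ok"


def handle_request(dependency, retries=2):
    """Toy service logic: retry the dependency, then degrade to a fallback."""
    for _ in range(retries + 1):
        try:
            return dependency.call()
        except ConnectionError:
            continue
    return "fallback"


def success_rate(dependency, requests=1000):
    """Steady-state metric: share of requests answered normally."""
    ok = sum(handle_request(dependency) == "ok" for _ in range(requests))
    return ok / requests


def run_experiment(failure_rate, threshold=0.95):
    """Compare the metric without and with the injected perturbation."""
    baseline = success_rate(FlakyDependency(0.0))
    perturbed = success_rate(FlakyDependency(failure_rate))
    return {
        "baseline": baseline,
        "perturbed": perturbed,
        "hypothesis_holds": perturbed >= threshold,
    }


print(run_experiment(failure_rate=0.3))
```

Raising failure_rate to 1.0 makes every call fail, so the hypothesis no longer holds. A real experiment aims to surface that kind of weakness during a planned exercise instead of during a night-time outage.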
You may also involve your Chief Security Officer, as experiments will also help reinforce the defense and security of your system: see “Security Chaos Engineering: A new paradigm for cybersecurity”.
For the most worried among them, do not hesitate to show that more and more companies run experiments of this type, as you can see in this mind map:
In summary, as in the parable of the blind men and the elephant, every member of your executive board will see the point that will make them confident or even enthusiastic:
Parable of the blind men and the elephant
This approach will allow you to build allies who will help you influence the system and launch Chaos Engineering in your company.
If you still think that R.O.I. is the only way that will work for you, and that this article is just bullshit, please have a look at Gary Yukl’s “Leadership in Organizations”:
Sources: Yukl, Lepsinger, & Lucia, 1992; Yukl, Chavez, & Seifert, 2005; Yukl, Seifert, & Chavez, 2008
Rational persuasion such as R.O.I. yields only 23% commitment, whereas inspirational tactics (90%), consultation (55%) and personal appeals (42%) are far more effective.
In any case, buying a beer (31%) is always better than pressure (3%). In French, we say that we do not suffer pressure, we drink it: “pression” (pressure) is also the word for a draft beer.
Please do share your tactics to convince your boss to do Chaos Engineering in the comments below!