Planning and Architecting for Reliability

Webinar

Planning and Architecting for Reliability - Part 1

Don’t wait for an incident to start focusing on the reliability of your systems. Join this two-part series to take a proactive approach to reliability, so you can prevent incidents from happening in the first place.

In Part 1, we map dependencies and uncover failure points to identify where to improve reliability.

On-demand

About this webinar

The reliability of your systems is crucial, but it is often put on the back burner until an incident occurs. We walk through how to take a proactive approach to reliability so you can find and fix weaknesses before they become incidents.

You’ll walk away having identified vulnerabilities and knowing how to test them for failure and how to prioritize your reliability efforts across services.

Part 1: Planning for Reliability

Lay the foundation for reliability by better understanding your complex, multi-layered architectures.
Map dependencies in a single view and identify failure points.

Part 2: Architecting for Reliability

Put reliability plans into action by testing your dependencies and vulnerabilities.
Learn how to test the technologies in your stack against common failure modes.

About the speakers

Vince Huang

Reliability Architect

Gremlin

Vincent is a Reliability Architect at Gremlin, helping teams and companies strategize, design, implement, and interpret their Chaos Engineering and resiliency efforts. Previously, he worked at LinkedIn and Twitch in Operations, Site Reliability, and Incident and Problem Management, with a focus on uptime and availability.

Jacob Plicque

Senior Solutions Architect

Gremlin

Jacob is a Senior Solutions Architect at Gremlin, where he works on Chaos Engineering. He has worked on Chaos Engineering across a variety of verticals, including finance, e-commerce, airlines, retail, and insurance. Jacob previously worked at Fanatics as a Senior Site Reliability Engineer, where he was responsible for providing a reliable e-commerce experience that processed over 1,100 orders a minute on peak days such as Cyber Monday and Black Friday.

Explore our tutorials to learn about the technologies and processes that help you manage reliability to a higher standard.

Chaos Engineering: the history, principles, and practice

How To Establish a High Severity Incident Management Program

4 Chaos Experiments to Start With

