Resilience Engineering - Culture as a System Requirement

Learn how organizations remain resilient across changing socio-technical systems. Come hear about how SREs and Ops engineers make change happen and how they respond to outages and learn from incidents.

From this track


Two Years of Incidents at 6 Different Companies: How a Culture of Resilience Can Help You Accomplish Your Goals

Thursday Jun 15 / 10:35AM EDT

Incidents and outages are expensive: they impact engineering productivity, business goals, and your company’s reputation. In this talk I will describe how we can apply resilience throughout the incident lifecycle to turn incidents into opportunities.

Vanessa Huerta Granda

Solutions Engineer

Session Resilience

Comparing Apples and Volkswagens: The Problem With Aggregate Incident Metrics

Thursday Jun 15 / 11:50AM EDT

This talk presents data from the Verica Open Incident Database (VOID) to conclusively demonstrate how aggregate incident metrics (MTTR, severity, # of incidents/time) aren't representative of your systems' resilience.

Courtney Nash

Internet Incident Librarian & Senior Research Analyst at Verica, previously @Holloway @Fastly @O’Reilly Media @Microsoft & @Amazon

Session Resilience Engineering

Resilience Hides in Plain Sight

Thursday Jun 15 / 01:40PM EDT

Think of the most out-of-nowhere and surprising incident you've experienced.

John Allspaw

Founder and Principal @Adaptive Capacity Labs

Session Resilience Engineering

Embrace Complexity; Tighten Your Feedback Loops

Thursday Jun 15 / 02:55PM EDT

When dealing with an environment that feels chaotic and unreliable, a common tendency is to look for ways to reduce variability and bring things back under control through procedures, hierarchy, metrics, and standardization.

Fred Hebert

Staff SRE @Honeycombio

Session Resilience Engineering

5 Strategies to Resiliently Handle Uncertainty, Time Pressure & Change

Thursday Jun 15 / 04:10PM EDT

As an engineer tasked with keeping large-scale software systems running under changing priorities and time pressure, you need resilience capabilities that are both technical and organizational to successfully navigate modern software engineering work.

Dr. Laura Maguire

Cognitive Systems Engineer & Researcher


Thursday Jun 15 / 10:30AM EDT


Track Host

Vanessa Huerta Granda

Solutions Engineer

Vanessa is a Solutions Engineer, helping companies make the most of their incidents. Previously, she led Resilience Engineering at Enova and has spent the last decade focusing on production incident processes, learning from incidents, and handling major incidents as Incident Commander. She has spoken and written on incident metrics and sharing learnings, and in 2021 she co-authored Jeli’s Howie: The Post-Incident Guide.

She is passionate about continuous improvement, getting teams to talk to each other, and sharing incident findings.
