When dealing with an environment that feels chaotic and unreliable, a common tendency is to look for ways to reduce variability and bring things back under control through procedures, hierarchy, metrics, and standardization. However, these attempts often fail because of the inherent complexity of these systems: they can't fit in anyone's head, and they remain unruly despite everyone's efforts.
I suggest that we relax these ideas of control and focus more on flexibility and adaptability. These and other ideas from Resilience Engineering can help us build a toolkit to embrace surprise and foster a richer view of systems, one that extends our ability to respond both to unforeseen challenges and to unexpected opportunities.
In this talk, I'll present various small approaches and patterns that slowly influence how teams deal with reliability, and highlight some of the key interactions and behaviors that I keep finding work well in the organizations I've been part of. In the end, you can't really cancel out the chaos, but you can embrace the complexity and deal with it a bit better.
What's the focus of your work these days?
I'm a staff member on Honeycomb's SRE team. A lot of my work is reactive, dealing with emergencies, but when there's more time, I focus on training people on our practices and fostering a good operational culture. Additionally, I work on bringing a systemic view of our organization to the organization itself, ensuring we invest in the right things and maintain the right behaviors. Of course, there's also daily operational support.
What's the motivation for your talk at QCon New York 2023?
First, I was invited to a track filled with interesting people, so I wanted to be part of it. Second, the track focuses on resilience engineering, a topic I've been interested in for quite a few years. In my talk, I aim to provide practical insights and a perspective that goes beyond learning from incidents. While learning from incidents is crucial, I believe there are many small but influential things we can do in decision-making, in addressing challenges, and in managing goal conflicts. I want to share the practical experience I've gained in these areas over the years.
How would you describe your main persona and target audience for this session?
While the talk can be relevant to a general audience, it would particularly benefit senior-level individuals who are involved in influence work. This includes interacting with other teams and departments and driving organizational change. If people find it challenging to navigate such situations or have struggled with it in the past, the content I present can be helpful and provide a useful perspective on making these efforts practical.
Is there anything specific that you'd like people to walk away with after watching your session?
I hope to convey the understanding that structuring an organization and implementing procedures doesn't guarantee adherence. In my experience, focusing on the organization that actually emerges and on the work people are really doing, even when it isn't openly reported or doesn't conform to the intended structure, yields better results. Rather than imposing strict order, it's about small daily acts and adjustments that make people's lives easier. This approach tends to be more effective.
Staff SRE @Honeycombio
Fred Hebert is a staff SRE at Honeycomb.io, caring for SLOs and error budgets, on-call health, alert hygiene, incident response, and operational readiness. He previously worked as a software developer of all ranks for over a decade and ended up with a healthy dislike of computers and clumsy automation. He’s a published technical author who loves distributed systems and systems engineering, and has a strong interest in resilience engineering and human factors.