You are viewing content from a past/completed conference.
How to Build a Reliable Kafka Data Processing Pipeline, Focusing on Contention, Uptime and Latency
Shifting workloads from synchronous to asynchronous can reduce the operational cost of high-throughput HTTP services. But understanding how performance metrics evolve in complex, high-concurrency, asynchronous distributed systems can be quite challenging.
In this talk, I'll tell you how OneSignal improved the performance and maintainability of its highest-throughput HTTP endpoints (backed by a Kafka consumer in Rust) by making them asynchronous. We will cover:
- How metrics changed when the system went from sync to async
- Unique sharding strategies to maximize concurrency and performance, while maintaining consistency for Kafka consumers
- System-level constraints from Postgres infrastructure that determine the Kafka scaling strategy
Speaker
Lily Mara
Engineering Manager @OneSignal
Lily Mara is an Engineering Manager at OneSignal in San Mateo, CA. She manages the Infrastructure Services team, which is responsible for in-house services used by other OneSignal engineering teams. Previously she was a software engineer at OneSignal, where she led efforts to create OneSignal's integration with Mixpanel, develop the outcomes system, and improve performance and code simplicity through refactoring. Lily also worked as a software developer at Kroger, on Kroger’s online grocery ordering system as well as on internal development tools that help other teams with deployments, monitoring, and local development environments.
Lily is the author of Refactoring to Rust, an early-access book by Manning Publications about improving the performance of existing software systems through the gradual addition of Rust code.
From the same track
Session
Architecture
Reliable Architectures Through Observability
Wednesday Jun 14 / 02:55PM EDT
We want our systems to be reliable, but testing alone isn't enough. In a complex, multi-service system, it's impossible to test your way to correctness. That's why we need observability. Observability is the ability to see what our code is doing, in production and in development.
Kent Quirk
Staff Engineer @Honeycomb.io
Session
Architecture
Building an Architecture to Predict Customer Behavior in a Revenue-Critical System
Wednesday Jun 14 / 01:40PM EDT
At Neon digital bank in Brazil, we strive to make revenue-impacting predictions based on customer behavior. Building a low latency and high availability distributed system that meets this requirement becomes especially challenging.
Yves Junqueira
Distinguished Software Engineer @Neon
Session
Developer Environment
Architecting a Production Development Environment for Reliability
Wednesday Jun 14 / 04:10PM EDT
At Meta, developers use a combination of development servers, including virtual machines and physical hosts, as well as on-demand containers to perform their daily software engineering work.
Henrique Andrade
Production Engineer @Meta
Session
Cloud Architecture
Survival Strategies for the Noisy Neighbor Apocalypse
Wednesday Jun 14 / 05:25PM EDT
Noisy neighbor issues are a common challenge for multi-tenant platforms, leading to resource contention, performance degradation, and costly downtime for other tenants sharing the same resources.
Meenakshi Jindal
Staff Software Engineer @Netflix
Session
Unconference: Designing Modern Reliable Architectures
Wednesday Jun 14 / 11:50AM EDT
What is an unconference?
An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.