The Art and Science of Site Reliability Engineering with Liz Fong-Jones
Oct 9 2024
Length: 33 mins
Podcast

Summary
In this exciting episode of Cloud Dialogues, we are joined by Liz Fong-Jones, Field CTO at Honeycomb and former Google SRE, to explore the fascinating world of Site Reliability Engineering (SRE)—a game-changer for scaling and automating large systems.

What We Covered:

1. Meet Liz Fong-Jones: Liz brings over a decade of SRE experience from her time at Google and Honeycomb, helping companies revolutionize how they manage reliability and automation.

2. The Origin Story: SRE actually predates the cloud! Born at Google in the early 2000s, SRE started as a way to automate manual system administration tasks and has since evolved into its own discipline, running parallel to DevOps.

3. SRE at Its Core:
   - Minimize repetitive work (aka "toil") by automating everything you can.
   - Use Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and maintain reliability.

4. Different SRE Models: There are different ways to implement SRE:
   - Tools-based within platform teams
   - Consultative SREs parachuting in to help teams
   - Embedded SREs integrated within every team

5. The SRE Mindset: Curiosity and empathy are essential for SREs. Teams need a culture of psychological safety where concerns can be raised without fear.

6. The Magic of SLOs and SLIs: SLOs set reliability targets (like aiming for 99.5% uptime), while SLIs are the measurements (such as the fraction of successful requests) that show whether those targets are being met. Together, they tell you whether your systems are running as reliably as intended.

7. FinOps Meets SRE: Liz explains how SREs can help balance reliability, performance, and costs using SLOs to allocate resources more efficiently.

8. Disaster Testing: Want proof SREs are ready for anything? Honeycomb regularly tests its disaster recovery by taking down an entire availability zone—on purpose!

9. Pro Tips for Executives: Thinking about implementing SRE at your company? Liz suggests starting with your biggest challenges, offering executive support, and setting clear, achievable SLOs.

10. Why Observability Matters: Observability is the backbone of SRE. Having real-time, actionable data is key for setting and managing effective SLOs.
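
To make the SLO and SLI relationship from point 6 concrete, here is a minimal Python sketch. It is not taken from the episode: the function names, the 30-day window, and the request counts are illustrative assumptions. It shows how a 99.5% target translates into an allowance of downtime (often called an error budget) and how a measured SLI compares against it.

```python
# Illustrative sketch only: names, window length, and sample counts are
# assumptions for this example, not details from the episode.


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO permits over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def availability_sli(good_events: int, total_events: int) -> float:
    """SLI expressed as the fraction of events that met the success criterion."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    return good_events / total_events


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is missed."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    sli = availability_sli(good_events, total_events)
    allowed_failure_rate = 1.0 - slo_target
    actual_failure_rate = 1.0 - sli
    return 1.0 - actual_failure_rate / allowed_failure_rate


if __name__ == "__main__":
    target = 0.995  # the 99.5% example target from point 6
    print(f"Allowed downtime in 30 days: {error_budget_minutes(target):.0f} minutes")
    print(f"Measured SLI: {availability_sli(9_970, 10_000):.4f}")
    print(f"Error budget remaining: {budget_remaining(target, 9_970, 10_000):.0%}")
```

With a 99.5% target, 0.5% of a 30-day window (43,200 minutes) works out to roughly 216 minutes of permitted downtime. In the hypothetical sample, 30 failed requests out of 10,000 use up about 60% of the allowance of 50 failures.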

Plus, Liz covers her favorite ARM processors (for cost and environmental savings) and shares insights from her book Observability Engineering.

This episode is a deep dive into SRE, filled with actionable insights and strategies for leaders looking to supercharge their reliability game. You won’t want to miss it!
