Mailing List Cleanup - SRE Weekly (Part 5)
(This is part of an open-ended series of posts where I write down random things I feel are shareable from the years of mailing lists I’ve not caught up on…)
This is part 5 which covers posts from the SRE Weekly folder for calendar year 2024. We’re almost there!
Past parts: Part 1, Part 2, Part 3 and Part 4.
- SLO Compliance Period (AKA SLO Window) – I really, really dislike the term ‘error budget’. I understand the why of it, but that doesn’t change my dislike of it. I like the walking through of all the compliance math though. (As someone who has been responsible for it before.)
- How the data center site selection process works at Dropbox – That they factor in ‘is it in a flight path’ was kinda shocking. Clearly there is a reason for this somewhere, and they apparently think the risk of it happening, even multiple times, is big enough to plan around.
- Pinterest’s Transition to HTTP/3: A Boost in Performance and Reliability – Huh. Guess I should figure out how to get my stuff working with the latest toys. Where ‘latest’ is 2022. It also answers why a default security group I saw yesterday had UDP open (HTTP/3 runs over QUIC, which uses UDP).
- Simple Precision Time Protocol at Meta – More time nerdiness
- The case for Fault Injection testing in Production – Spoiler: only production is production.
- Harnessing chaos in Cloudflare offices – No longer ‘just’ lava lamps
- How Figma’s databases team lived to tell the scale – an experience report
- The Promise and Peril of JSON logging – JSON logging should be the default in 2026. Can it fill disk? Yes. Is it worth it? Also yes.
- Presenting to Engineering Leadership – This is good. He says as someone who detours into stories by default.
- The Rule of 5 Errors – Tuning metrics is hard.
- Delivering Millions of Notifications within Seconds During the Super Bowl – Includes some interesting ideas on how to test this sort of thing including ‘silent notifications’ … which, of course, you need to have already thought through and have deployed at scale.
- Why I don’t like discussing action items during incident reviews – I think I like this. And like it more as a company scales.
- Thermal design supporting Gen 12 hardware: cool, efficient and reliable – Look at the size of that heatsink!
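On the SLO compliance period item above: here is a minimal sketch of the window math, assuming a simple time-based availability SLO. The function names and the 99.9% / 30-day numbers are mine for illustration and are not taken from the linked post.

```python
def allowed_bad_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of non-compliance the SLO tolerates over the window.

    This is the quantity usually called the 'error budget'.
    slo_target is a fraction, e.g. 0.999 for 99.9%.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def fraction_remaining(slo_target: float, window_days: int, bad_minutes: float) -> float:
    """Share of the allowance still unspent (negative means the SLO is blown)."""
    allowance = allowed_bad_minutes(slo_target, window_days)
    return (allowance - bad_minutes) / allowance


if __name__ == "__main__":
    # Hypothetical example: 99.9% target over a 30-day window.
    print(round(allowed_bad_minutes(0.999, 30), 1))       # about 43.2 minutes
    print(round(fraction_remaining(0.999, 30, 10.8), 2))  # about 0.75
```

The window length matters as much as the target: the same 99.9% over 7 days leaves roughly 10 minutes, so a single bad deploy can eat the whole thing.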