Mailing List Cleanup - SRE Weekly (Part 6)
(This is part of an open-ended series of posts where I write down random things I feel are shareable from the years of mailing lists I’ve not caught up on…)
Waiting for my Claude tokens to reset, so this is part 6, which covers posts from the SRE Weekly folder from the start of 2025 to… now. Woah. Only 2 more mailing list boxes have things. (Kinda 3, but it will be a different beast entirely and so doesn’t get counted.)
Past parts: Part 1, Part 2, Part 3, Part 4 and Part 5
- What’s the most bizarre root cause you’ve ever seen? – And then we all cross our fingers none of these happen to us.
- So You Want to Build Your Own Data Center – I don’t, but am glad people who do end up documenting it
- The Real Failure Rate of EBS – When you ‘create and destroy tens of thousands’ of a thing a day, you get interesting insights (and product ideas/marketing content)
- Hot Take: I Want Execs Closer to Incidents, Not Farther – Absolutely.
- Seventh-generation server hardware at Dropbox: our most efficient and capable architecture yet – I feel like I’ve linked to this already, but… anyways, the ‘What we learned along the way’ is great. Thermal efficiency is the bottleneck.
- Inside Husky’s query engine: Real-time access to 100 trillion events – It checks out that Datadog has its own custom event store. Here’s how the query optimizer works.
- Advancing Our Chef Infrastructure: Safety Without Disruption – I wonder if this would make sense to do at a smaller scale than Slack.
- Datadog, Thank You for Blocking Us – Note that they didn’t build anything bespoke, but used AI to rewire to something else. This is where I see AI better deployed than ‘build me a better Datadog’.
- The dangers of SSL certificates – Another article I didn’t link to talked about employee attrition as a root cause of incidents. This is one such case. Or was at least a factor in the resolution time.
- The Database Is About to Lose Its Last Line of Defense – Welcome to the new governance world.