Mailing List Cleanup - SRE Weekly (Part 3)
(This is part of an open-ended series of posts where I write down random things I feel are sharable from the years of mailing lists I’ve not caught up on…)
This is part 3, which covers posts from the SRE Weekly folder for calendar year 2022.
- Day 22 - So, You’re Incident Commander, Now What? – I like the communications section. Communications means humans, and humans are messy. Anything to help deal with that is always welcome.
- Why might you run your own DNS server? – DNS nerdiness. She has all sorts of DNS things you should read on her site.
- Power Loss Siren: Making Meta resilient to power loss events – Power management at Data Centres is fascinating. In a re:Invent talk a number of years ago, AWS discussed how they build their own transformer gear. Just nuts.
- Calculating composite SLA – Useful for those of us who have had to answer vendor questionnaires about this.
- Will circuit breakers solve my problems? – Food for thought. The ‘Bottom Line’ paragraph is the one that you need to read.
- Operation Jumbo Drop: How sending large packets broke our AWS network – ... or just love a good rabbit hole
- That time we unplugged a data center to test our disaster readiness – This is Dropbox. I remember reading about when Facebook did this. So much fun.
- What is a Runbook? Improve Efficiency and Incident Response – Runbooks are another maturity measure for startups. And no, it is not too early to look at them.
- What is Backoff For? – From the circuit breaker guy
- Reduce software outage risk with passive guardrails – I think I like this approach
- What makes a good alert? – With a fun 2x2 table to illustrate his point
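
For the composite SLA item above, the arithmetic that usually comes up in vendor questionnaires can be sketched roughly as below. This is my own minimal sketch, not taken from the linked post: the function names and the availability figures are made up for illustration, and it assumes component failures are independent, which real systems often violate.

```python
from math import prod


def serial_availability(availabilities):
    """Every component must be up, so the availabilities multiply."""
    return prod(availabilities)


def redundant_availability(availabilities):
    """Any one component is enough: 1 minus the chance all are down at once."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical vendor SLAs, for illustration only.
web_tier = 0.9995
database = 0.9999

# A request needs both the web tier and the database.
composite = serial_availability([web_tier, database])
print(f"serial composite: {composite:.4%}")

# Two redundant web tiers in front of the same database.
redundant_web = redundant_availability([web_tier, web_tier])
print(f"with redundant web tier: {serial_availability([redundant_web, database]):.4%}")
```

The point to take away is that a chain of dependencies is always less available than its weakest link, so the number you put in the questionnaire should be lower than any single vendor's SLA unless there is real redundancy.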