Another ‘free consulting is content’ post. The context here is a 10-year-old company where a friend of mine is the VP of Engineering. Their delivery pipeline worked … but there were some horrible manual steps (as opposed to manually-push-a-button steps, which are perfectly acceptable, if not desirable) and things were too custom and black box-y. Oh, and the deploy from CircleCI was just flat out broken at the time. The gist of the conversation was ‘if you helped us out, what would it look like?’

What’s interesting is that this, and other conversations like it that I have had in the last month, have really distilled my thoughts around pipelines, which leads to a playbook of sorts. That’s beyond the scope of this post, aside from this looking a lot like what the playbook looks like.

Anyhow, here is the ‘only slightly edited’ bit of free consulting I gave.

  1. Check that things that should already be done are done

Root account has a hardware MFA token that is somewhere secure, CloudTrail is enabled and has the fun Lambda script to auto re-enable it if it gets disabled, deletion protection is turned on, etc.
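For the curious, here is a minimal sketch of what that ‘fun Lambda script’ could look like. It assumes an EventBridge rule forwards the CloudTrail `StopLogging` API event to the function; the `handler` name, the injectable `client` parameter and the return values are my own illustration, not a canonical implementation. The only AWS call it makes is the CloudTrail `start_logging` operation via boto3.

```python
def handler(event, context=None, client=None):
    """Turn CloudTrail logging back on whenever someone stops it.

    Assumes the event is a CloudTrail API event delivered by EventBridge,
    with the trail identifier in detail.requestParameters.name.
    """
    detail = event.get("detail") or {}
    if detail.get("eventName") != "StopLogging":
        return {"action": "ignored"}

    trail = (detail.get("requestParameters") or {}).get("name")
    if not trail:
        return {"action": "ignored"}

    if client is None:
        # Imported lazily so the function can be exercised locally
        # with a stand-in client and no AWS credentials.
        import boto3
        client = boto3.client("cloudtrail")

    client.start_logging(Name=trail)
    return {"action": "restarted", "trail": trail}
```

You would still want an alert on top of this, since someone stopping CloudTrail is worth a conversation even when it gets switched back on automatically.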

  1. CodeDeploy

Since deploying from CircleCI is busted anyways, get it producing CodeDeploy packages and manually install the agent on all the boxes.

  1. Packerize all images

Standardize on a Linux distro (anything other than Amazon Linux 2 is silly). Create base AMIs with CodeBuild triggered off of GitHub webhooks to the $company-Packer repo. Again, it doesn’t matter which configuration management tool Packer uses, as long as they can justify the choice. And as I mentioned, AWS has given a credible reason to use Ansible with the integration of running playbooks from Systems Manager.

  1. Replace CircleCI with CodePipeline (orchestration) and CodeBuild (build, test and package) — since deploy is already done via CodeDeploy
  2. Feature Flags

Managed via an admin screen into the database (not file-based) to dark launch features to cohorts and/or percentages before full availability.
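A rough sketch of the evaluation side of that, assuming each flag is a row with an on/off switch, a set of cohorts and a rollout percentage. The `Flag` fields, the `bucket` and `is_enabled` names and the hashing scheme are all hypothetical; in real life the flag would be loaded from the database row that the admin screen edits.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Flag:
    """Stand-in for one row of a (hypothetical) feature_flags table."""
    name: str
    enabled: bool = False                      # master kill switch
    cohorts: set = field(default_factory=set)  # e.g. {"staff", "beta"}
    percentage: int = 0                        # 0-100 rollout


def bucket(flag_name, user_id):
    """Map a user to a stable bucket in 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id, user_cohorts=()):
    """Decide whether this user sees the dark-launched feature."""
    if not flag.enabled:
        return False
    if flag.cohorts & set(user_cohorts):
        return True
    return bucket(flag.name, user_id) < flag.percentage


flag = Flag("new-checkout", enabled=True, cohorts={"staff"}, percentage=10)
staff_sees_it = is_enabled(flag, "user-42", ["staff"])
```

Hashing the flag name together with the user id means a given user gets the same answer on every request, and a different slice of users lands in the first 10% for each flag.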

  1. Airplane Development

‘Can you do development on an airplane without internet access?’ So no shared databases, no needing to reach out to the internet for JS or fonts or icons, etc. Look at developer onboarding at this point too. Vagrant is great. Docker is the hotness. But Vagrant means you can literally have the same configuration locally as you do in production. Docker can too, of course, if you are going Fargate/ECS.

  1. Health

Monitoring (all the layers, reactive and proactive; all but one of my major outages could have been predicted if I had been watching the right things), Logging (centralized, retention), Paging (when and why, and fix the broken windows), Testing-in-Production (it’s the only environment that counts), Health Radiator (there should be a big screen with health indicators, both system and business, in your area), etc.

  1. Autoscaling

Up and Down, at all the layers. Driven by monitoring and logging.
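To make ‘driven by monitoring’ a bit more concrete, here is a small sketch of a proportional scaling rule: scale the instance count by the ratio of the observed metric to its target, then clamp it. This is similar in spirit to target-tracking policies but it is not AWS’s actual algorithm, and the `desired_capacity` name, the parameters and the default limits are mine.

```python
import math


def desired_capacity(current, metric, target, minimum=1, maximum=20):
    """Proportional rule: new size = ceil(current * metric / target), clamped.

    current: instances running now
    metric:  observed average (e.g. CPU %) across those instances
    target:  the value you want that average to sit at
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    proposed = math.ceil(current * metric / target)
    return max(minimum, min(maximum, proposed))


scale_up = desired_capacity(current=4, metric=90.0, target=60.0)    # 6
scale_down = desired_capacity(current=4, metric=30.0, target=60.0)  # 2
```

The same function handles both directions, which is the point: if you only ever scale up, you are just paying for your worst day forever.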

  1. Bring everything under Terraform control

Yes, only at this point. It ‘works’ now, just not the way you want it to. Everything above doesn’t ‘work’. Again, I’d use Terraform over CloudFormation, but for ‘all in on AWS’ CloudFormation is certainly an option. Now if only CloudFormation was considered a first class citizen inside AWS and supported new features before competitors like Terraform do. CloudFormation still didn’t have Route 53 Delegation Sets the last time I checked.

  1. Disaster Recovery

‘Can you take the last backup and your Terraform scripts and light up $company in a net new AWS account, and be down for a maximum of how long it takes to copy RDS snapshots, or lose only the data from the last in-flight backup?’

  1. Move to Aurora

Just because I like the idea of having the ability to have the database trigger Lambda functions.

  1. Observability

Slightly different than Health — basically I would use Honeycomb because Charity, etc. are far too smart.

  1. Chaos Engineering

Self healing, multi-region, etc. If Facebook can cut the power to their London datacenter and no one notices, $company can do something less dramatic with equal effect.

And then it’s ‘just’ keeping the ship sailing the way you want, making slight corrections in the course along the way.