One of the easiest ways of building resilience into a system running in AWS is to use an autoscaling group. Generally speaking, I use one for any service which is required to self-heal - even when aiming to maintain a steady number of instances, as is desirable when running servers for Consul and Nomad, as well as a whole host of other clustered systems. Unhealthy instances can simply be replaced, usually without operator intervention, and launch configurations can be used to simplify upgrading clustered software one instance at a time.
[Read More]