Disaster Recovery for AWS CloudHSM

“Everything fails, all the time”

— AWS CTO, Werner Vogels

No matter how highly available your infrastructure is, having a disaster recovery (DR) plan for each critical infrastructure service is equally important and always rewarding. A well-tested DR plan helps an organisation recover from an event that has negatively affected business operations.

Disaster recovery planning normally revolves around RTO (Recovery Time Objective) and RPO (Recovery Point Objective). RTO is the length of time an organisation can afford to stay offline without adversely affecting the business. RPO is the maximum amount of data, measured in time, that an organisation can afford to lose. For example, if backups run every 4 hours, the organisation can lose at most 4 hours of data in a disaster, so a 4-hour backup interval supports an RPO of 4 hours.
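The 4-hour example can be written as a small check. This is only an illustrative sketch; the function names are made up for this article and are not part of any AWS tooling.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """With periodic backups, everything written since the last backup can be lost."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True when the worst-case data loss stays within the agreed RPO."""
    return worst_case_data_loss(backup_interval) <= rpo
```

With backups every 4 hours, `meets_rpo(timedelta(hours=4), timedelta(hours=4))` holds, while an RPO of 1 hour would require more frequent backups.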

A recovery strategy is chosen based on the RTO and RPO. AWS describes four recovery strategies to choose from: backup and restore, pilot light, warm standby, and multi-site active/active.

DR Strategies (Source: AWS)


CloudHSM

AWS CloudHSM is a single-tenant, dedicated hardware security module (HSM) that sits within your VPC and can store both symmetric and asymmetric encryption keys. As of this writing, AWS does not provide native DR support for CloudHSM, so we need to perform a few manual steps to add DR for our CloudHSM cluster.

Note: You do not need a CloudHSM cluster in your account to follow along, but it helps to have one. If you don't have one and want to create one, you can refer to one of our articles that walks through creating, initialising and activating a CloudHSM cluster. Before creating the cluster, check the pricing page, as the service does not offer any free tier usage.

Let’s visit the CloudHSM dashboard to add DR for our cluster.

Fig 1. CloudHSM Dashboard

Once you are on the dashboard, click the Backups link in the left panel to see the automated backups of your CloudHSM cluster.

Fig 2. CloudHSM Backups
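The same list can be fetched programmatically. Below is a minimal sketch that assumes a boto3 `cloudhsmv2` client is passed in; the helper name `latest_ready_backup` is hypothetical and the cluster ID is whatever your own cluster uses.

```python
def latest_ready_backup(client, cluster_id):
    """Return the newest READY backup of the cluster, or None if there is none.

    `client` is expected to behave like boto3.client("cloudhsmv2").
    """
    backups = []
    kwargs = {"Filters": {"clusterIds": [cluster_id], "states": ["READY"]}}
    while True:
        page = client.describe_backups(**kwargs)
        backups.extend(page.get("Backups", []))
        token = page.get("NextToken")
        if not token:
            break
        # Ask for the next page of results.
        kwargs["NextToken"] = token
    # Pick the backup with the most recent creation time.
    return max(backups, key=lambda b: b["CreateTimestamp"], default=None)
```

With boto3 installed and credentials configured, this would be called with something like `boto3.client("cloudhsmv2", region_name="<source-region>")` and your cluster ID.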

Then select the latest backup, click Actions, and click Copy backup to another region. Select the destination region and click the Copy backup button to add DR for your CloudHSM cluster.

Fig 3. CloudHSM Backup Copy
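The console action corresponds to the CopyBackupToRegion API. A minimal sketch, assuming a boto3 `cloudhsmv2` client created in the source region; the helper name `copy_backup` is hypothetical.

```python
def copy_backup(source_client, backup_id, destination_region):
    """Start copying one backup to another region and return the API's summary.

    `source_client` is expected to behave like boto3.client("cloudhsmv2")
    created in the region where the backup currently lives.
    """
    response = source_client.copy_backup_to_region(
        DestinationRegion=destination_region,
        BackupId=backup_id,
    )
    # The summary identifies the source backup, source region and source cluster.
    return response["DestinationBackup"]
```

The call only starts the copy; the backup may take some time to become available in the destination region.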

Switch to the destination region and verify that the backup was copied over.

Fig 4. CloudHSM Backup Destination
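The verification can also be scripted. This sketch assumes a boto3 `cloudhsmv2` client created in the destination region and uses the `sourceBackupIds` filter of DescribeBackups; the helper name `find_copied_backup` is hypothetical.

```python
def find_copied_backup(destination_client, source_backup_id):
    """Return the READY copy of `source_backup_id` in the destination region, or None.

    `destination_client` is expected to behave like boto3.client("cloudhsmv2")
    created in the destination region.
    """
    page = destination_client.describe_backups(
        Filters={"sourceBackupIds": [source_backup_id]}
    )
    for backup in page.get("Backups", []):
        # A copy that is still in progress is not usable for a restore yet.
        if backup.get("BackupState") == "READY":
            return backup
    return None
```

A `None` result means the copy either has not started or has not finished yet.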

Once the backup is available in the destination region, you can restore a CloudHSM cluster from it whenever your primary region goes down.
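A restore in the destination region could look roughly like the sketch below. It assumes a boto3 `cloudhsmv2` client in the destination region; the helper name `restore_cluster` is hypothetical, and the HSM type and subnet IDs are values you must supply for your own environment.

```python
def restore_cluster(destination_client, backup_id, subnet_ids, hsm_type):
    """Create a new cluster from a backup and return the new cluster's ID.

    `destination_client` is expected to behave like boto3.client("cloudhsmv2")
    created in the destination region. `backup_id` is the ID of the copied
    backup in that region.
    """
    response = destination_client.create_cluster(
        HsmType=hsm_type,
        SourceBackupId=backup_id,
        SubnetIds=list(subnet_ids),
    )
    return response["Cluster"]["ClusterId"]
```

The new cluster starts without any HSMs, so at least one HSM still has to be added to it (for example with the CreateHsm API) before it can serve requests.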

Note: Copying a single backup does not fully prepare you for a disaster. You need to periodically copy the latest backup from the source region to the destination region.

Caveat

The process above is manual: someone has to copy the latest backups to the destination region periodically. This is not an efficient way to add DR support to a CloudHSM cluster, because it carries the risk of forgetting to copy backups at regular intervals. To avoid this, it is highly recommended to automate the process.

To automate the process, you can either use the open-source module we created or build your own automation by creating a CloudWatch event that triggers a Lambda function at regular intervals to copy the CloudHSM backups from the source region to the destination region.
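If you build your own automation, the Lambda function could be shaped roughly like this. It is a sketch under assumptions, not the open-source module mentioned above: the environment variable names (`CLUSTER_ID`, `SOURCE_REGION`, `DESTINATION_REGION`) and the helper `sync_latest_backup` are hypothetical, and only the first page of DescribeBackups results is examined.

```python
import os


def sync_latest_backup(source, destination, cluster_id, destination_region):
    """Copy the newest READY backup to the destination region if it is not there yet.

    `source` and `destination` are expected to behave like boto3 "cloudhsmv2"
    clients for the two regions. Returns the copied backup ID, or None when
    there was nothing to copy.
    """
    page = source.describe_backups(
        Filters={"clusterIds": [cluster_id], "states": ["READY"]}
    )
    backups = page.get("Backups", [])  # simplification: first page only
    if not backups:
        return None
    latest = max(backups, key=lambda b: b["CreateTimestamp"])

    # Skip the copy when the destination already holds a copy of this backup.
    existing = destination.describe_backups(
        Filters={"sourceBackupIds": [latest["BackupId"]]}
    )
    if existing.get("Backups"):
        return None

    source.copy_backup_to_region(
        DestinationRegion=destination_region,
        BackupId=latest["BackupId"],
    )
    return latest["BackupId"]


def lambda_handler(event, context):
    """Entry point for the scheduled invocation."""
    import boto3  # imported here so the helper above can be tested without boto3

    destination_region = os.environ["DESTINATION_REGION"]  # hypothetical variable
    source = boto3.client("cloudhsmv2", region_name=os.environ["SOURCE_REGION"])
    destination = boto3.client("cloudhsmv2", region_name=destination_region)
    copied = sync_latest_backup(
        source, destination, os.environ["CLUSTER_ID"], destination_region
    )
    return {"copied_backup_id": copied}
```

Checking the destination first keeps the function safe to run on any schedule, since repeated runs do not copy the same backup twice. The function's IAM role would need permission to describe and copy CloudHSM backups.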


Covering the basics

  • CloudHSM is a single-tenant hardware security module that is deployed in your own VPC. It can be used to store a root CA, encryption keys (both symmetric keys and asymmetric key pairs), SSL certificates, etc.

  • KMS is a multi-tenant hardware security module that is fully managed by AWS, whereas CloudHSM is a single-tenant hardware security module owned and managed by the customer. KMS has been validated under FIPS 140-2 Level 2, whereas CloudHSM has been validated under FIPS 140-2 Level 3.

  • AWS CloudHSM can be deployed via the console, CLI or API. You start by creating a cluster and initialising it, after which you attach HSM nodes depending on your requirements. It is recommended to have at least 2 HSM nodes per cluster for high availability. Once a node is deployed, you need to activate the cluster by activating the admin user with the CloudHSM CLI tool.
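The recommendation of at least 2 HSM nodes can be checked through the API. A minimal sketch, assuming a boto3 `cloudhsmv2` client is passed in; the helper names are hypothetical.

```python
def active_hsm_count(client, cluster_id):
    """Count the HSMs of a cluster that are in the ACTIVE state.

    `client` is expected to behave like boto3.client("cloudhsmv2").
    """
    response = client.describe_clusters(Filters={"clusterIds": [cluster_id]})
    clusters = response.get("Clusters", [])
    if not clusters:
        raise ValueError(f"cluster {cluster_id} not found")
    hsms = clusters[0].get("Hsms", [])
    return sum(1 for hsm in hsms if hsm.get("State") == "ACTIVE")


def is_highly_available(client, cluster_id, minimum=2):
    """True when the cluster has at least `minimum` ACTIVE HSMs."""
    return active_hsm_count(client, cluster_id) >= minimum
```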

Vimal Paliwal

Vim is a DevSecOps practitioner with over seven years of professional experience. Over the years, he has architected and implemented full-fledged solutions for clients using AWS, K8s, Terraform, Python, Shell, Prometheus, etc., always keeping security a top priority. Alongside this, as an AWS Authorised Instructor for over 2 years, he has trained thousands of professionals at organisations ranging from startups to Fortune companies.
