Back up an entire Kubernetes cluster to AWS S3 using Velero
Failures are unpredictable, so having a backup is essential. How often you back up differs from case to case; there is no single schedule that fits every backup strategy. In this article, we will take an in-depth look at backing up a Kubernetes cluster to an AWS S3 bucket using Velero.
With Velero plugins, you are not limited to backing up your Kubernetes cluster to S3; you can also use other providers such as GCP, Azure, Alibaba, DigitalOcean and many more.
I’ll be using an EKS cluster that I have already created with a managed node group, but you can follow along with an unmanaged K8s cluster or a managed cluster from another cloud provider. So, let’s get going.
S3
We will start by creating an S3 bucket to store the cluster backup.
aws s3 mb s3://skildops-velero-backup-demo
Let’s follow some security best practices and secure our bucket. These steps are optional, so you can skip to the IAM Role section if you prefer.
Note: While executing the commands below, make sure to replace skildops-velero-backup-demo with your own bucket name.
Enable public access block:
aws s3api put-public-access-block --bucket skildops-velero-backup-demo --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
Enable default encryption:
aws s3api put-bucket-encryption --bucket skildops-velero-backup-demo --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'
Add a bucket policy to allow connections only over HTTPS:
aws s3api put-bucket-policy --bucket skildops-velero-backup-demo --policy file://policy.json
policy.json:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSSLRequestsOnly",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::skildops-velero-backup-demo",
        "arn:aws:s3:::skildops-velero-backup-demo/*"
      ],
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }
  ]
}
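To avoid hand-editing the bucket name in several places, the policy file can be generated from a shell variable. This is a minimal sketch, assuming a POSIX shell; the BUCKET variable is introduced here for illustration and is not part of the commands above.

```shell
# BUCKET is an illustrative variable; set it to your own bucket name.
BUCKET="skildops-velero-backup-demo"

# Write policy.json with the bucket name filled in for both resource ARNs.
cat > policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSSLRequestsOnly",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::${BUCKET}",
        "arn:aws:s3:::${BUCKET}/*"
      ],
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }
  ]
}
EOF
```

The generated file can then be passed to the put-bucket-policy command shown earlier, with the same bucket name.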
IAM Role
Note: If you plan to use an IAM user for the Velero pod to interact with S3, you can skip creating the IAM role and attach the policy provided below to your IAM user instead.
We need an IAM role so that the Velero pods can back up to and restore from the S3 bucket. This role can also be attached at the node level, but we will follow best practice and attach it to the pod’s service account.
Create Role:
aws iam create-role --role-name skildops-velero-demo --assume-role-policy-document file://trust-relationship-policy.json
Trust Relationship Policy (If attaching role to service account):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/EKS_OIDC_PROVIDER_URL"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "EKS_OIDC_PROVIDER_URL:sub": "system:serviceaccount:EKS_NAMESPACE:SERVICE_ACCOUNT_NAME"
        }
      }
    }
  ]
}
Note: Replace ACCOUNT_ID with your numeric AWS account ID, EKS_OIDC_PROVIDER_URL with the OIDC provider URL of your EKS cluster (without the https:// prefix), EKS_NAMESPACE with the namespace in which you will install Velero, and SERVICE_ACCOUNT_NAME with the name of the service account that will be attached to the Velero pod.
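The substitutions in the service account trust policy can be scripted as well. The sketch below strips the https:// prefix from an issuer URL and writes trust-relationship-policy.json. The account ID, namespace, service account name and issuer URL are all made-up placeholders, chosen so the snippet runs without AWS access.

```shell
# All values below are placeholders for illustration; substitute your own.
ACCOUNT_ID="111122223333"
EKS_NAMESPACE="velero"
SERVICE_ACCOUNT_NAME="velero"

# On a real cluster the issuer can be read with:
#   aws eks describe-cluster --name CLUSTER_NAME \
#     --query "cluster.identity.oidc.issuer" --output text
# A made-up issuer is used here instead.
ISSUER="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"

# The trust policy expects the provider URL without the https:// scheme.
EKS_OIDC_PROVIDER_URL="${ISSUER#https://}"

cat > trust-relationship-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${EKS_OIDC_PROVIDER_URL}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${EKS_OIDC_PROVIDER_URL}:sub": "system:serviceaccount:${EKS_NAMESPACE}:${SERVICE_ACCOUNT_NAME}"
        }
      }
    }
  ]
}
EOF
```

The resulting file is the one referenced by the create-role command above.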
Trust Relationship (If attaching role to node):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::skildops-velero-backup-demo",
        "arn:aws:s3:::skildops-velero-backup-demo/*"
      ],
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject",
        "s3:ListBucket"
      ]
    },
    {
      "Effect": "Allow",
      "Resource": "*",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot"
      ]
    }
  ]
}
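The policy still has to be attached to the role. One way to do that is as an inline policy using aws iam put-role-policy, sketched below as a small helper. The policy name and the file name passed to the helper are illustrative choices, not names used elsewhere in this article.

```shell
# ROLE_NAME matches the role created earlier; POLICY_NAME is an illustrative name.
ROLE_NAME="skildops-velero-demo"
POLICY_NAME="velero-backup-access"

# Attach a policy document (path given as the first argument) to the role
# as an inline policy.
attach_velero_policy() {
  aws iam put-role-policy \
    --role-name "$ROLE_NAME" \
    --policy-name "$POLICY_NAME" \
    --policy-document "file://$1"
}
```

Assuming the policy above was saved as velero-policy.json, it would be attached with: attach_velero_policy velero-policy.json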
Velero
We will use the Helm chart to install Velero.
You can use the values.yaml file below, or create your own by referring to the default values.yaml file provided with the Velero chart.
values.yaml:
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.4.1
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
configuration:
  provider: aws
  backupStorageLocation:
    name: "aws"
    provider: "velero.io/aws"
    bucket: BUCKET_NAME # Replace with bucket name you created above
    default: true
    config:
      region: AWS_REGION # Region where your bucket is located
  volumeSnapshotLocation:
    name: aws
    provider: velero.io/aws
    config:
      region: AWS_REGION # Region where your volume(s) are located
serviceAccount:
  server:
    create: true
    name: velero
    annotations:
      eks.amazonaws.com/role-arn: IAM_ROLE_ARN # ARN of IAM role created above
schedules:
  eks-cluster:
    disabled: false
    schedule: "0 0 * * *" # CRON expression to periodically take backups
    template:
      ttl: "240h" # This setting will delete backups automatically after 10 days
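If you would rather not edit the placeholders by hand, they can be filled in with sed. The helper below is a sketch that assumes the placeholder names BUCKET_NAME, AWS_REGION and IAM_ROLE_ARN appear exactly as in the file above; the function name and the output file name in the usage note are made up for illustration.

```shell
# Replace the three placeholders in a values file and print the result.
# Arguments: input file, bucket name, AWS region, IAM role ARN.
# "|" is used as the sed delimiter because ARNs contain "/" and ":".
render_values() {
  sed -e "s|BUCKET_NAME|$2|g" \
      -e "s|AWS_REGION|$3|g" \
      -e "s|IAM_ROLE_ARN|$4|g" "$1"
}
```

For example: render_values values.yaml skildops-velero-backup-demo us-east-1 arn:aws:iam::111122223333:role/skildops-velero-demo > values.rendered.yaml (the account ID and region here are placeholders). The rendered file would then be the one passed to helm with -f.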
Add the Helm repository:
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
Install Velero using the Helm chart, replacing EKS_NAMESPACE with the same namespace you used in the trust relationship policy:
helm install velero vmware-tanzu/velero --namespace EKS_NAMESPACE -f values.yaml --create-namespace
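Before moving on, it can be worth checking that the release came up as expected. The helper below is a sketch that lists the Velero pods and the backup storage location resource created from values.yaml. VELERO_NAMESPACE is an illustrative variable and should match the namespace given to helm install.

```shell
# VELERO_NAMESPACE is an illustrative variable; use the namespace passed to helm.
VELERO_NAMESPACE="velero"

check_velero_install() {
  # The Velero pod should be listed as Running.
  kubectl get pods --namespace "$VELERO_NAMESPACE"
  # The backup storage location is a custom resource installed by the chart.
  kubectl get backupstoragelocations.velero.io --namespace "$VELERO_NAMESPACE"
}
```

Calling check_velero_install prints both listings. If the storage location is missing or the pod is not running, the pod logs are usually the first place to look.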
Perfect. We now have all the components in place. Let’s check whether Velero can take a backup and restore from it. To test this, we will perform a manual backup and restore, and for that we need to install the Velero CLI.
Mac:
brew install velero
Linux:
wget https://github.com/vmware-tanzu/velero/releases/download/v1.8.1/velero-v1.8.1-linux-amd64.tar.gz
tar -xvf velero-v1.8.1-linux-amd64.tar.gz
sudo mv velero-v1.8.1-linux-amd64/velero /usr/local/bin/velero
Windows:
choco install velero
To install for other platforms please refer to the official documentation.
Backup & Restore
Before performing a manual backup and restore, let’s create a deployment on the cluster:
kubectl create deployment nginx --image=nginx
Now let’s perform a manual backup:
velero backup create demo
Let’s confirm that the backup is complete:
velero backup describe demo
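A backup can take a while, so in scripts it helps to wait for it rather than check once. The CLI offers a --wait flag on velero backup create for this. As an alternative, the sketch below polls the phase of the Backup custom resource through kubectl. The function names and VELERO_NAMESPACE are made up for illustration, and only the failure phases named in the code are treated as terminal.

```shell
# VELERO_NAMESPACE is an illustrative variable; use the namespace Velero runs in.
VELERO_NAMESPACE="velero"

# Print the phase of a Backup resource, e.g. InProgress or Completed.
backup_phase() {
  kubectl get backups.velero.io "$1" --namespace "$VELERO_NAMESPACE" \
    -o jsonpath='{.status.phase}'
}

# Poll until the backup completes, fails, or the attempts run out.
# Arguments: backup name, max attempts (default 30), seconds between polls (default 10).
wait_for_backup() {
  name="$1"; attempts="${2:-30}"; delay="${3:-10}"
  while [ "$attempts" -gt 0 ]; do
    phase=$(backup_phase "$name")
    case "$phase" in
      Completed)
        echo "backup $name completed"; return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup $name ended in phase $phase" >&2; return 1 ;;
      *)
        # Any other phase (or no phase yet) means the backup is still running.
        sleep "$delay" ;;
    esac
    attempts=$((attempts - 1))
  done
  echo "timed out waiting for backup $name" >&2
  return 1
}
```

For the backup created above, this would be called as: wait_for_backup demo. Note that the velero CLI itself looks in the velero namespace by default, so if you installed Velero elsewhere, pass --namespace to the velero commands as well.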
Let’s also confirm if the backup is stored in S3.
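One way to check this from the command line is to list the objects under the backup’s key prefix. Velero writes each backup under backups/ followed by the backup name; the sketch below assumes no extra prefix was configured for the storage location, as in the values.yaml above. The helper name and BUCKET variable are illustrative.

```shell
# BUCKET is an illustrative variable; set it to your own bucket name.
BUCKET="skildops-velero-backup-demo"

# List the objects Velero stored for one backup (name given as first argument).
list_backup_objects() {
  aws s3 ls "s3://$BUCKET/backups/$1/"
}
```

For the backup created above: list_backup_objects demo. The listing should include the backup archive and its metadata files.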
Now let’s delete the nginx deployment we created earlier and then perform a restore to confirm if we will be safe during uncertain times.
Delete nginx deployment:
kubectl delete deploy nginx
Restoring the backup:
velero restore create --from-backup demo
Alright. Let’s see if the nginx deployment was restored:
kubectl get deploy
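Listing the deployment shows that the object is back, but not whether its pods are ready. A small sketch using kubectl rollout status can cover that; the helper name and its defaults are made up for illustration. If the restore has not recreated the deployment yet, the command fails with a not-found error and can simply be retried.

```shell
# Block until a deployment has finished rolling out.
# Arguments: deployment name, namespace (default: default), timeout (default: 120s).
wait_for_deployment() {
  kubectl rollout status "deployment/$1" \
    --namespace "${2:-default}" \
    --timeout="${3:-120s}"
}
```

For the restored deployment: wait_for_deployment nginx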
Awesome 🎉. Our Kubernetes cluster is now safe. If the cluster becomes unresponsive or some services are accidentally deleted, you can recover it with just a single command.
Note: When restoring to a different cluster, make sure its backup location points to the correct bucket and region.