Back up an entire Kubernetes cluster to AWS S3 using Velero

Photo by Taylor Vick on Unsplash

The future is uncertain, so having a backup is essential. How often you should back up varies from case to case; there is no single schedule that fits every backup strategy. In this article, we will take an in-depth look at backing up a Kubernetes cluster to an AWS S3 bucket using Velero.

With Velero plugins, you are not limited to backing up your Kubernetes cluster to S3; you can also use other cloud providers such as GCP, Azure, Alibaba Cloud, DigitalOcean and many more.

I’ll be using an EKS cluster with a managed node group that I have already created, but you can follow along with a self-managed Kubernetes cluster or a managed cluster from another cloud provider. So, let’s get going.

EKS Cluster


S3

We will start by creating an S3 bucket to store the cluster backup.

aws s3 mb s3://skildops-velero-backup-demo

Let’s follow some security best practices and secure the bucket. These steps are optional; if you don’t need them, skip to the IAM Role section.

Note: When running the commands below, make sure to replace skildops-velero-backup-demo with your own bucket name.

Block all public access:

aws s3api put-public-access-block --bucket skildops-velero-backup-demo --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

Enable default encryption:

aws s3api put-bucket-encryption --bucket skildops-velero-backup-demo --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'

Add a bucket policy that denies any request not made over HTTPS (the contents of policy.json are shown below):

aws s3api put-bucket-policy --bucket skildops-velero-backup-demo --policy file://policy.json

policy.json:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowSSLRequestsOnly",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::skildops-velero-backup-demo",
                "arn:aws:s3:::skildops-velero-backup-demo/*"
            ],
            "Condition": {
                "Bool": {
                    "aws:SecureTransport": "false"
                }
            }
        }
    ]
}
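To avoid editing the ARNs by hand, the policy file can also be generated from a shell variable. The sketch below assumes a POSIX-compatible shell; write_https_only_policy is a hypothetical helper name, not part of any AWS tooling.

```shell
# Hypothetical helper: writes the HTTPS-only bucket policy for any bucket name,
# so the two bucket ARNs do not have to be edited by hand.
write_https_only_policy() {
  local bucket="$1"   # e.g. skildops-velero-backup-demo
  local out="$2"      # e.g. policy.json
  cat > "$out" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSSLRequestsOnly",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::${bucket}",
        "arn:aws:s3:::${bucket}/*"
      ],
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }
  ]
}
EOF
}

# Usage (then apply it with the put-bucket-policy command shown above):
# write_https_only_policy my-velero-bucket policy.json
```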

S3 Bucket


IAM Role

Note: If you plan to give the Velero pod access to S3 through an IAM user’s credentials instead, you can skip creating the IAM role, but you should still attach the policy below to that IAM user.

We need an IAM role so that the Velero pods can back up to and restore from the S3 bucket. This role can also be attached at the node level, but we will follow best practice and attach it to the pod’s service account (IAM Roles for Service Accounts, IRSA).

Create Role:

aws iam create-role --role-name skildops-velero-demo --assume-role-policy-document file://trust-relationship-policy.json

Trust relationship policy, saved as trust-relationship-policy.json (if attaching the role to a service account):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/EKS_OIDC_PROVIDER_URL"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "EKS_OIDC_PROVIDER_URL:sub": "system:serviceaccount:EKS_NAMESPACE:SERVICE_ACCOUNT_NAME"
        }
      }
    }
  ]
}
Note: Replace ACCOUNT_ID with your numeric AWS account ID, EKS_OIDC_PROVIDER_URL with your EKS cluster’s OIDC issuer URL without the https:// prefix, EKS_NAMESPACE with the namespace you will install Velero into, and SERVICE_ACCOUNT_NAME with the name of the service account attached to the Velero pod.
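Filling in these placeholders by hand is error-prone, so here is a minimal sketch of a hypothetical render_trust_policy shell function that prints the policy with the values substituted and the https:// prefix stripped. It assumes the cluster’s OIDC issuer is already registered as an IAM identity provider (for example with eksctl utils associate-iam-oidc-provider --cluster <CLUSTER_NAME> --approve). The commented usage lines look up the account ID and issuer URL with the AWS CLI; <CLUSTER_NAME> is a placeholder.

```shell
# Hypothetical helper: prints the IRSA trust relationship policy with the
# placeholders filled in. The "https://" prefix is stripped from the issuer URL.
render_trust_policy() {
  local account_id="$1" oidc_url="$2" namespace="$3" service_account="$4"
  local oidc="${oidc_url#https://}"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${oidc}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${oidc}:sub": "system:serviceaccount:${namespace}:${service_account}"
        }
      }
    }
  ]
}
EOF
}

# Usage (namespace and service account both "velero" here):
# render_trust_policy \
#   "$(aws sts get-caller-identity --query Account --output text)" \
#   "$(aws eks describe-cluster --name <CLUSTER_NAME> --query 'cluster.identity.oidc.issuer' --output text)" \
#   velero velero > trust-relationship-policy.json
```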

Trust relationship policy (if attaching the role to the nodes instead):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

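Permissions policy (attach this to the role; it grants Velero access to the bucket and the EC2 permissions it needs to snapshot EBS volumes):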

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::skildops-velero-backup-demo",
        "arn:aws:s3:::skildops-velero-backup-demo/*"
      ],
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucketMultipartUploads",
        "s3:PutObject",
        "s3:ListBucket"
      ]
    },
    {
      "Effect": "Allow",
      "Resource": "*",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:DescribeSnapshots",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateSnapshot",
        "ec2:DeleteSnapshot"
      ]
    }
  ]
}
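The steps above create the role but do not show the permissions policy being attached. One way to do it, sketched below, is to save the policy above as velero-policy.json and add it as an inline policy with aws iam put-role-policy. The file name, the helper function attach_velero_policy and the policy name velero-s3-ebs are assumptions for illustration. The function also prints the role ARN, which is the IAM_ROLE_ARN value needed in values.yaml later.

```shell
# Hypothetical helper: attaches the permissions policy (saved locally as a JSON
# file) to the role as an inline policy, then prints the role ARN.
attach_velero_policy() {
  local role_name="$1" policy_file="$2"
  # "velero-s3-ebs" is an arbitrary, hypothetical inline policy name
  aws iam put-role-policy \
    --role-name "$role_name" \
    --policy-name velero-s3-ebs \
    --policy-document "file://${policy_file}" || return 1
  # The printed ARN is what goes into values.yaml as IAM_ROLE_ARN
  aws iam get-role --role-name "$role_name" --query 'Role.Arn' --output text
}

# Usage:
# attach_velero_policy skildops-velero-demo velero-policy.json
```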

IAM Role with Trust Relationship

IAM Role with Policy


Velero

We will be using the Helm chart to install Velero.

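You can use the values.yaml file below or create your own by referring to the default values.yaml provided with the Velero chart. The keys below follow the chart’s format at the time of writing; newer chart releases may structure backupStorageLocation and volumeSnapshotLocation differently, so compare them against the default values.yaml of the chart version you install.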

values.yaml:

initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.4.1
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
configuration:
  provider: aws
  backupStorageLocation:
    name: "aws"
    provider: "velero.io/aws"
    bucket: BUCKET_NAME    # Replace with the bucket name you created above
    default: true
    config:
      region: AWS_REGION    # Region where your bucket is located
  volumeSnapshotLocation:
    name: aws
    provider: velero.io/aws
    config:
      region: AWS_REGION    # Region where your volume(s) are located
serviceAccount: 
  server:
    create: true
    name: velero    # Must match SERVICE_ACCOUNT_NAME in the trust relationship policy
    annotations: 
      eks.amazonaws.com/role-arn: IAM_ROLE_ARN    # ARN of IAM role created above
schedules: 
  eks-cluster:
    disabled: false
    schedule: "0 0 * * *"  # Cron expression: take a backup every day at midnight
    template:
      ttl: "240h"  # This setting will delete backups automatically after 10 days

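The three placeholders in values.yaml can be substituted with sed instead of being edited by hand. This is a sketch using a hypothetical fill_velero_values function; | is used as the sed delimiter because role ARNs contain /, and it assumes the replacement values contain no | or & characters.

```shell
# Hypothetical helper: prints values.yaml with the BUCKET_NAME, AWS_REGION and
# IAM_ROLE_ARN placeholders replaced. "|" is the sed delimiter because ARNs
# contain "/".
fill_velero_values() {
  local bucket="$1" region="$2" role_arn="$3" file="$4"
  sed -e "s|BUCKET_NAME|${bucket}|g" \
      -e "s|AWS_REGION|${region}|g" \
      -e "s|IAM_ROLE_ARN|${role_arn}|g" \
      "$file"
}

# Usage:
# fill_velero_values skildops-velero-backup-demo us-east-1 \
#   arn:aws:iam::123456789012:role/skildops-velero-demo values.yaml > values.filled.yaml
```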
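Add the Velero Helm repository:

helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts

Install Velero with the Helm chart. Use the same namespace that you substituted for EKS_NAMESPACE in the trust relationship policy: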

helm install velero vmware-tanzu/velero --namespace <YOUR NAMESPACE> -f values.yaml --create-namespace

Velero Helm Chart

Velero Pod
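Before moving on, you can check that the Velero pod is running and that the backup storage location is reachable. The sketch assumes the Helm release is named velero (as in the install command above), so the chart’s deployment is also named velero; check_velero_install is a hypothetical helper. The PHASE column of the backup storage location should eventually show Available.

```shell
# Hypothetical helper: waits for the Velero deployment to become ready, then
# lists the backup storage locations in the given namespace.
# Assumes the Helm release (and therefore the deployment) is named "velero".
check_velero_install() {
  local ns="$1"
  kubectl rollout status deployment/velero -n "$ns" --timeout=120s || return 1
  kubectl get backupstoragelocations -n "$ns"
}

# Usage:
# check_velero_install <YOUR NAMESPACE>
```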

Perfect. We now have all the components in place. Let’s check whether Velero can take a backup and restore from it. To test this, we will perform a manual backup and restore, which requires the Velero CLI.

Mac:

brew install velero

Linux:

wget https://github.com/vmware-tanzu/velero/releases/download/v1.8.1/velero-v1.8.1-linux-amd64.tar.gz
tar -xvf velero-v1.8.1-linux-amd64.tar.gz
sudo mv velero-v1.8.1-linux-amd64/velero /usr/local/bin/velero

Windows:

choco install velero

For other platforms, refer to the official Velero documentation.


Backup & Restore

Before performing the manual backup and restore, let’s create a test deployment in the cluster:

kubectl create deployment nginx --image=nginx

Nginx Deployment

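The velero CLI looks for Velero in the velero namespace by default. If you installed it in a different namespace, add --namespace <YOUR NAMESPACE> to the velero commands below, or set it once with velero client config set namespace=<YOUR NAMESPACE>.

Now let’s perform a manual backup: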

velero backup create demo

Velero Backup

Let’s confirm that the backup is complete (its Phase should be Completed):

velero backup describe demo

Velero Backup Status
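For scripting, instead of reading the describe output you can poll the Backup resource’s status.phase with kubectl (the CLI can also block on its own with velero backup create demo --wait). The hypothetical wait_for_backup function below is a sketch that assumes Velero runs in the namespace you pass in; it succeeds only when the phase reaches Completed and fails on Failed, PartiallyFailed or FailedValidation.

```shell
# Hypothetical helper: polls the Backup resource every 5 seconds until Velero
# reports a final phase. Returns 0 only when the phase is "Completed".
wait_for_backup() {
  local name="$1" ns="$2" tries="${3:-60}" phase=""
  while [ "$tries" -gt 0 ]; do
    phase=$(kubectl get backup "$name" -n "$ns" -o jsonpath='{.status.phase}')
    case "$phase" in
      Completed)
        echo "backup $name: $phase"; return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup $name: $phase" >&2; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep 5
  done
  echo "backup $name: timed out (last phase: ${phase:-unknown})" >&2
  return 1
}

# Usage:
# wait_for_backup demo <YOUR NAMESPACE>
```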

Let’s also confirm that the backup is stored in S3.

Velero Backup in S3
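The same check can be done from the terminal. Assuming no prefix is configured for the backup storage location, Velero writes each backup under backups/<backup-name>/ in the bucket; list_backup_objects is a hypothetical helper.

```shell
# Hypothetical helper: lists the objects Velero stored for one backup.
# Assumes no "prefix" is set in the backup storage location config.
list_backup_objects() {
  local bucket="$1" backup="$2"
  aws s3 ls "s3://${bucket}/backups/${backup}/"
}

# Usage:
# list_backup_objects skildops-velero-backup-demo demo
```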

Now let’s delete the nginx deployment we created earlier and then restore it from the backup to confirm that recovery actually works.

Delete nginx deployment:

kubectl delete deploy nginx

Restoring the backup:

velero restore create --from-backup demo

Velero Restore
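By default, velero restore create generates a restore name from the backup name and a timestamp. If you prefer a predictable name and want the command to block until the restore finishes, a sketch like the one below works; restore_from_backup and the restore name demo-restore are hypothetical.

```shell
# Hypothetical helper: creates a named restore from a backup, waits for it to
# finish, then prints its details (including any warnings or errors).
restore_from_backup() {
  local backup="$1" restore="$2"
  velero restore create "$restore" --from-backup "$backup" --wait || return 1
  velero restore describe "$restore"
}

# Usage:
# restore_from_backup demo demo-restore
```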

Alright. Let’s see if the nginx deployment was restored:

kubectl get deploy

Nginx Restored

Awesome 🎉. Our Kubernetes cluster is now backed up. If the cluster becomes unresponsive or some resources are accidentally deleted, you can recover them with a single command.

Note: When restoring to a different cluster, make sure Velero on that cluster points its backup storage location at the same bucket and region that hold the backup.

Vimal Paliwal

Vim is a DevSecOps practitioner with over seven years of professional experience. Over the years, he has architected and implemented full-fledged solutions for clients using AWS, K8s, Terraform, Python, Shell, Prometheus, etc., keeping security as the utmost priority. Along with this, as an AWS Authorised Instructor for over two years, he has trained thousands of professionals from organisations ranging from startups to Fortune companies.
