Cassandra dockers with CRON backup to S3 and restoration.

siganberg/CassandraBackup

This is a work in progress.

Cassandra Kubernetes Backup and Restoration Steps

This guide uses cassandra-aws-backup.sh, which is already included in the customized Cassandra Docker image. The script is based on the Google Cloud Storage disaster recovery script for Cassandra and has been modified to work with AWS.

You can also build the custom Cassandra Docker image yourself by running the following command in the directory that contains the Dockerfile:

docker build . --tag {tagname}

Note: the base image's ENTRYPOINT (docker-entrypoint.sh) is not used. To run both cron and the Cassandra process, the image startup is customized: start_with_cron.sh is used instead; it starts cron and then runs Cassandra in the foreground.
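A start_with_cron.sh-style launcher could look roughly like the sketch below. This is an illustrative guess, not the script shipped in this repository; it assumes Debian's cron daemon and the cassandra launcher are on PATH (as in the official Cassandra image) and that the container runs as root so cron can start.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a start_with_cron.sh-style launcher; the script in
# this repository may differ. Assumes Debian's cron daemon and the cassandra
# launcher are on PATH and that the container runs as root.
set -euo pipefail

start_with_cron() {
  # Debian's cron daemonizes itself, so this returns once cron is running.
  cron
  # exec replaces this shell with Cassandra running in the foreground (-f),
  # so Cassandra takes over this script's PID (PID 1 when the script is the
  # container's command) and receives the container's stop signals directly.
  exec cassandra -f
}

# A real launcher script would finish by calling: start_with_cron
```

One caveat with this pattern: cron typically runs jobs with a minimal environment, so variables set on the container (such as AWS_ACCESS_KEY_ID) are not automatically visible to cron jobs.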

The custom Cassandra Docker image adds the following to support backup and restore:

  • AWS CLI (apt-get install awscli)
  • rsync (apt-get install rsync)
  • cron (apt-get install cron)
  • incremental_backups set to true in cassandra.yaml
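The incremental backup setting is a single key in cassandra.yaml, so one way to flip it at image build time is a sed edit (in a Dockerfile this would sit in a RUN step). The function below is a hypothetical sketch, not taken from this repository; its default path, /etc/cassandra/cassandra.yaml, matches the official Cassandra image but may differ in yours.

```bash
# Hypothetical build step: set incremental_backups to true in cassandra.yaml.
# The default path assumes the official image layout (/etc/cassandra).
enable_incremental_backups() {
  local yaml="${1:-/etc/cassandra/cassandra.yaml}"
  # Rewrite through a temp file instead of sed -i so GNU and BSD sed both work.
  sed 's/^incremental_backups:[[:space:]]*false/incremental_backups: true/' \
    "$yaml" > "$yaml.tmp" && mv "$yaml.tmp" "$yaml"
  # Return non-zero if the setting is still not enabled (e.g. key missing).
  grep -q '^incremental_backups: true' "$yaml"
}
```

Usage: enable_incremental_backups (or pass a different cassandra.yaml path as the first argument).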

Prerequisites

Environment variables needed for AWS and Cassandra access:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • CASSANDRA_USER
  • CASSANDRA_PASS
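A quick way to catch a missing variable before running the backup script is a check like the one below. It is a hypothetical helper, not part of the image, and relies on bash's indirect expansion (${!name}).

```bash
# Hypothetical helper: fails (status 1) if any named variable is unset or
# empty, printing each missing name to stderr.
require_env() {
  local missing=0 var
  for var in "$@"; do
    # ${!var} expands the variable whose name is stored in $var.
    if [ -z "${!var:-}" ]; then
      echo "missing required environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY CASSANDRA_USER CASSANDRA_PASS || exit 1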

Performing a manual backup

Open a shell in one of the Cassandra pods. For example, with a StatefulSet whose first pod is named cassandra-0:

kubectl exec -it cassandra-0 -- bash

Run the following command to start the backup. It creates a snapshot and uploads it to the S3 bucket:

./cassandra-aws-backup.sh -b s3://{s3_bucket} -vcC -u <cassandra_username> -p <cassandra_password>
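The same backup can be started without an interactive shell, for example from a workstation or a CI job. This hypothetical wrapper assumes the pod has bash, that cassandra-aws-backup.sh sits in the container's working directory (as the ./ in the command above suggests), and that CASSANDRA_USER and CASSANDRA_PASS are set inside the pod; it expands them in the pod, so the password does not appear on your local command line.

```bash
# Hypothetical wrapper: runs the backup in POD via kubectl exec.
# The single-quoted script is expanded inside the pod, where $1 is the
# bucket passed after the "_" placeholder for $0.
run_backup() {
  local pod="$1" bucket="$2"
  kubectl exec "$pod" -- bash -c \
    './cassandra-aws-backup.sh -b "s3://$1" -vcC -u "$CASSANDRA_USER" -p "$CASSANDRA_PASS"' \
    _ "$bucket"
}
```

Usage: run_backup cassandra-0 my-bucket (the bucket name here is a placeholder).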

Performing a restore

Restoring a containerized Cassandra is a little tricky. We need a Cassandra pod on Kubernetes that does not start Cassandra automatically, so that the database can be restored. A container keeps running only while its main process (PID 1) runs in the foreground, so we replace Cassandra as PID 1 with a process that never completes.

One way is to modify the StatefulSet YAML and add the following entries to the Cassandra container:

    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]

Example YAML.

  - name: cassandra
    image: siganberg/cassandra_kube
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 7000
      name: intra-node 
    - containerPort: 7001
      name: tls-intra-node
    - containerPort: 7199
      name: jmx
    - containerPort: 9042
      name: cql        
    # Just spin & wait forever
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
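Instead of editing the YAML by hand, the same change could be applied to an existing StatefulSet with kubectl patch --type=json. The helper below is a hypothetical sketch that only prints an RFC 6902 JSON patch; the container index (0, the first container) and the StatefulSet name cassandra in the usage line are assumptions about your manifest.

```bash
# Hypothetical helper: prints a JSON patch that adds the placeholder
# command/args to container INDEX of the pod template (default 0).
# A JSON patch "add" on an existing field replaces its value.
add_sleep_patch() {
  local i="${1:-0}"
  printf '[{"op":"add","path":"/spec/template/spec/containers/%s/command","value":["/bin/bash","-c","--"]},' "$i"
  printf '{"op":"add","path":"/spec/template/spec/containers/%s/args","value":["while true; do sleep 30; done;"]}]\n' "$i"
}
```

Usage: kubectl patch statefulset cassandra --type=json -p "$(add_sleep_patch 0)". With the default RollingUpdate strategy the StatefulSet controller then recreates the pods with the new spec; with OnDelete you have to delete the pods yourself.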

Kubernetes creates the Cassandra pod, but Cassandra itself does not start, which is exactly what we want for the restoration process.
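To confirm the pod is in this state before restoring, you can check what PID 1 is running. The sketch below is a hypothetical helper, not part of this repository; it reads the Linux file /proc/1/cmdline, whose arguments are NUL-separated, and looks for the sleep loop from the YAML above.

```bash
# Hypothetical helpers. /proc/<pid>/cmdline separates arguments with NUL
# bytes; convert them to spaces for display and matching.
pid1_cmdline() {
  tr '\0' ' ' < "${1:-/proc/1/cmdline}"
}

# Succeeds if PID 1 is the placeholder sleep loop rather than Cassandra.
in_restore_mode() {
  pid1_cmdline "${1:-/proc/1/cmdline}" | grep -qF 'while true; do sleep 30; done;'
}
```

From outside the pod, the equivalent one-off check is: kubectl exec cassandra-0 -- cat /proc/1/cmdline | tr '\0' ' '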

From here, open a shell in the Cassandra pod again:

kubectl exec -it cassandra-0 -- bash

Run the following commands to create the Cassandra data directories (-p makes the commands safe to re-run if a directory already exists):

mkdir -p /var/lib/cassandra/commitlog
mkdir -p /var/lib/cassandra/data
mkdir -p /var/lib/cassandra/saved_caches

Next, find the backup path of the snapshot you want to restore. The command below lists the backups in the S3 bucket; alternatively, browse the bucket in the AWS console.

./cassandra-aws-backup.sh -b s3://{s3_bucket} inventory

It lists all available snapshots, organized by pod hostname.
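As another alternative to the console, the AWS CLI can list the archives directly. The helper below is a hypothetical sketch: it assumes archive keys contain the pod hostname and end in .tar (as in the restore example), and that keys contain no spaces, since it takes the key from the fourth column of aws s3 ls --recursive output (date, time, size, key).

```bash
# Hypothetical helper: reads `aws s3 ls s3://BUCKET --recursive` output on
# stdin and prints full s3:// URLs of .tar objects whose key contains HOST.
tar_urls_for_host() {
  local bucket="$1" host="$2"
  awk -v b="$bucket" -v h="$host" \
    '$4 ~ /\.tar$/ && index($4, h) > 0 { print "s3://" b "/" $4 }'
}
```

Usage: aws s3 ls s3://my-bucket --recursive | tar_urls_for_host my-bucket "$(hostname)" (my-bucket is a placeholder). Because the match is a plain substring, cassandra-1 would also match keys for cassandra-10.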

For example, to start the restore, run:

./cassandra-aws-backup.sh -v -u <cassandra_username> -p <cassandra_password> -b s3://{fullpath_of_compressed_tar}.tar restore
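A small wrapper can validate the archive URL before calling the script. This is a hypothetical sketch: it assumes archives end in .tar as in the example above, reads credentials from the CASSANDRA_USER and CASSANDRA_PASS environment variables listed in the prerequisites, and uses an illustrative BACKUP_SCRIPT override so the script path can be changed (for testing, for example).

```bash
# Hypothetical wrapper around the restore command. BACKUP_SCRIPT is an
# illustrative override; by default it calls the script in the image.
restore_snapshot() {
  local url="$1"
  # Only accept a full s3://... URL that ends in .tar.
  case "$url" in
    s3://?*.tar) ;;
    *) echo "expected an s3://...tar URL, got: $url" >&2; return 2 ;;
  esac
  # Credentials are taken from the environment instead of being typed in.
  if [ -z "${CASSANDRA_USER:-}" ] || [ -z "${CASSANDRA_PASS:-}" ]; then
    echo "CASSANDRA_USER and CASSANDRA_PASS must be set" >&2
    return 2
  fi
  "${BACKUP_SCRIPT:-./cassandra-aws-backup.sh}" -v \
    -u "$CASSANDRA_USER" -p "$CASSANDRA_PASS" -b "$url" restore
}
```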

After the restore succeeds, modify the StatefulSet YAML again and remove the following entries. Cassandra should then start normally, using the restored data.

    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
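The reverse change can also be made with kubectl patch --type=json. This hypothetical helper only prints the JSON patch, for container index 0 by default (an assumption about your manifest). A JSON patch "remove" fails if the field is absent, so apply it only while the placeholder is still in place; once command and args are gone, Kubernetes falls back to the image's own ENTRYPOINT and CMD.

```bash
# Hypothetical helper: prints a JSON patch that removes the placeholder
# command/args from container INDEX of the pod template (default 0).
remove_sleep_patch() {
  local i="${1:-0}"
  printf '[{"op":"remove","path":"/spec/template/spec/containers/%s/command"},' "$i"
  printf '{"op":"remove","path":"/spec/template/spec/containers/%s/args"}]\n' "$i"
}
```

Usage: kubectl patch statefulset cassandra --type=json -p "$(remove_sleep_patch 0)" (the StatefulSet name cassandra is an assumption).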

Note: your Cassandra StatefulSet needs a persistent volume (PV). The persistent volume is usually bound through a persistent volume claim (PVC) with a fixed name. This keeps Cassandra's storage intact even when the pod is killed and recreated, as happens in these steps when the infinite sleep is added and removed.
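StatefulSet volume claims follow a fixed naming pattern, <claim template name>-<StatefulSet name>-<ordinal>, which is why a recreated cassandra-0 reattaches to the same volume. The helpers below are a hypothetical sketch for checking this before you start; the claim template name cassandra-data in the usage line is an assumption, so use the name from your own volumeClaimTemplates.

```bash
# Hypothetical helpers. StatefulSet PVCs are named
# <volumeClaimTemplate name>-<StatefulSet name>-<ordinal>.
sts_pvc_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Succeeds only if the claim exists and its phase is Bound.
pvc_is_bound() {
  local phase
  phase="$(kubectl get pvc "$1" -o jsonpath='{.status.phase}')" || return 1
  [ "$phase" = "Bound" ]
}
```

Usage: pvc_is_bound "$(sts_pvc_name cassandra-data cassandra 0)" && echo "PVC is bound"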

TODO

  • Add instructions for setting up CRON backup.

Author

Francis Marasigan
