Schedule Google Datastore Backup
It is important to back up all data resources. Although Datastore is a managed service, we still need to take backups on a periodic basis. This is even more important for a company that has already experienced an attack resulting in data loss.
Architecture
Cloud Scheduler publishes a message to a Pub/Sub topic on a daily schedule; the topic triggers a Cloud Function, which starts a managed Datastore export into a Cloud Storage bucket.
Setup Commands
Create Storage Bucket
Create a storage bucket in the same project where the Datastore resides, preferably in the same geographic location to avoid cross-location network charges.
gsutil mb -p <gcp-project> -l US-CENTRAL1 -c NEARLINE gs://datastore-backup-<gcp-project>
Create Pub/Sub Topic
Create a Pub/Sub topic that will receive messages from a Cloud Scheduler job. This makes it easy to create multiple execution jobs later.
gcloud pubsub topics create datastorebackup --project=<gcp-project>
Get Datastore Information
Datastore was originally designed around App Engine, so the Datastore location is defined by the project's (App Engine's) default location. The command below prints the Datastore location.
To get the list of Datastore namespaces, go to the project's Datastore page and run the GQL query below (a programmatic alternative is sketched after the query).
# Datastore location
gcloud app describe --project=<gcp-project> | grep location
# Datastore namespaces GQL (key id 1 is the default namespace)
SELECT __key__ FROM __namespace__
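If you prefer scripting over the console, the same namespace list can be retrieved with a Datastore metadata query. Below is a minimal sketch, assuming the google-cloud-datastore Python client library and a placeholder project id; the default namespace comes back as key id 1.
# pip install google-cloud-datastore
from google.cloud import datastore

# <gcp-project> is a placeholder; use your own project id.
client = datastore.Client(project="<gcp-project>")

# __namespace__ is a built-in metadata kind; a keys-only query lists namespaces.
query = client.query(kind="__namespace__")
query.keys_only()

# The default namespace is returned as key id 1 (no name).
namespaces = [entity.key.id_or_name for entity in query.fetch()]
print(namespaces)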
Create Scheduler Job
This is the starting point of the execution flow. The job below is scheduled to run daily at midnight (unix cron). The command also specifies the storage bucket and the Datastore namespaces to back up; an empty string "" means the default namespaceId.
gcloud scheduler jobs create pubsub scheduledDatastoreExport \
  --schedule="0 0 * * *" \
  --topic=projects/<gcp-project>/topics/datastorebackup \
  --message-body='{ "bucket": "gs://datastore-backup-<gcp-project>", "namespaceIds": ["", "namespace1"]}' \
  --project=<gcp-project>
Create & Deploy Cloud Function
This uses Google's sample Cloud Function for taking Datastore backups, which saves us the effort of writing our own. Ensure the Cloud Function is deployed in the same region as the Datastore. A simplified sketch of the function's logic follows the deploy commands below.
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
gcloud functions deploy datastore_export \
  --source python-docs-samples/datastore/schedule-export/ \
  --runtime python37 \
  --entry-point datastore_export \
  --trigger-topic datastorebackup \
  --project <gcp-project> \
  --region us-central1
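For reference, here is a rough sketch of what such a function does: it decodes the Pub/Sub message and starts a managed export through the Datastore Admin API. This is a simplified illustration assuming a recent google-cloud-datastore client library; the authoritative implementation is the one in the cloned repository.
# Simplified sketch of the export function (see the sample repo for the real code).
import base64
import json
import os

from google.cloud import datastore_admin_v1

client = datastore_admin_v1.DatastoreAdminClient()
project_id = os.environ.get("GCP_PROJECT")  # available on the python37 runtime

def datastore_export(event, context):
    # The Pub/Sub payload carries the JSON we set in --message-body.
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    entity_filter = datastore_admin_v1.EntityFilter()
    if "kinds" in payload:
        entity_filter.kinds = payload["kinds"]
    if "namespaceIds" in payload:
        entity_filter.namespace_ids = payload["namespaceIds"]

    # Kick off a managed export into the bucket named in the message body.
    client.export_entities(
        request=datastore_admin_v1.ExportEntitiesRequest(
            project_id=project_id,
            output_url_prefix=payload["bucket"],
            entity_filter=entity_filter,
        )
    )
    print("Export started for bucket {}".format(payload["bucket"]))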
Test That the Function Works
Since we use Pub/Sub and Cloud Scheduler to invoke the function, we can trigger it from the command line through either one.
gcloud scheduler jobs run scheduledDatastoreExport --project <gcp-project>
Or
gcloud pubsub topics publish datastorebackup --project <gcp-project> \
  --message='{ "bucket": "gs://datastore-backup-<gcp-project>", "namespaceIds": ["", "namespace1"]}'
Troubleshoot access issues
Check logs
Go to logs path: https://console.cloud.google.com/logs/viewer?project=<gcp-project>
Then add the advanced filter below to check how the Cloud Function is performing (a scripted alternative is sketched after the filter).
resource.type = “cloud_function”
resource.labels.function_name = “datastore_export”
resource.labels.region = “us-central1”
severity>=DEFAULT
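The same check can be scripted. Below is a minimal sketch, assuming the google-cloud-logging Python client library; it applies the same filter and prints the most recent entries.
# pip install google-cloud-logging
from google.cloud import logging as gcp_logging

# <gcp-project> is a placeholder; use your own project id.
client = gcp_logging.Client(project="<gcp-project>")
log_filter = (
    'resource.type="cloud_function" '
    'AND resource.labels.function_name="datastore_export" '
    'AND resource.labels.region="us-central1"'
)

# Print the 20 newest log entries matching the filter.
for i, entry in enumerate(client.list_entries(filter_=log_filter, order_by=gcp_logging.DESCENDING)):
    if i >= 20:
        break
    print(entry.timestamp, entry.severity, entry.payload)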
The Cloud Function uses the project's App Engine (appspot) service account to perform operations. In most cases it already has the required access; if the logs show permission errors, grant the permissions below.
gcloud projects add-iam-policy-binding <gcp-project> \
  --member serviceAccount:<gcp-project>@appspot.gserviceaccount.com \
  --role roles/datastore.importExportAdmin
gsutil iam ch serviceAccount:<gcp-project>@appspot.gserviceaccount.com:admin \
gs://datastore-backup-<gcp-project>
Get job progress
Job progress can be checked from the Datastore operations list, and the exported data can be verified in the storage bucket.
# Check status of backup
gcloud datastore operations list --project <gcp-project>
# Check final data at storage
gsutil ls gs://datastore-backup-<gcp-project>
Note: Replace the project name and the backup storage path with your own values.