Playing with volumes for StatefulSets
StatefulSets are a specific Kubernetes resource that supports running databases and other stateful workloads. Unlike Deployments (stateless), they need special treatment. If you want more disk space on a Deployment, you simply edit the value and a new pod is created with that configuration. StatefulSets cannot work that way. Let's walk through the steps.
Case 1: Increase disk space
In the real world, we may assign a small value to our databases at the start and later want to increase it. Let's take a scenario where you have installed Cortex in your cluster and now want to increase its disk for stability.
Step 1: Change the configuration YAML
We deploy Cortex through a Terraform + Helm pipeline. The change below sets the desired size in the values file, so the pipeline no longer shows a plan diff once we grow the disks with the subsequent commands.
# cat cortex-values.yml
store_gateway:
  serviceMonitor:
    enabled: false
    additionalLabels:
      release: prom
  resources:
    limits:
      cpu: 500m
      memory: 1G
    requests:
      cpu: 500m
      memory: 1G
  persistentVolume:
    size: 50Gi
  nodeSelector:
    service: ${nodeselector}
  extraArgs:
    log.level: error
Step 2: Manually increase PVC
Because the StatefulSet is already deployed, applying the change above does not touch the existing disks. If we want to retain the existing data, we must grow the disks manually: edit every existing PVC with the new disk size and wait until the larger capacity is reported.
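Before editing, it is worth checking that the storage class used by these PVCs allows online expansion. A minimal check, assuming the gp2 class shown in the PVC below:
# resizing only works when the StorageClass has allowVolumeExpansion enabled
kubectl get storageclass gp2 -o jsonpath='{.allowVolumeExpansion}'
# expected output: true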
# kubectl edit pvc storage-cortex-store-gateway-0 -n cortex
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
    volume.kubernetes.io/selected-node: ip-10-30-xx-xx.eu-west-1.compute.internal
    volume.kubernetes.io/storage-resizer: kubernetes.io/aws-ebs
  creationTimestamp: "2022-02-14T06:49:58Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/component: store-gateway
    app.kubernetes.io/instance: cortex
    app.kubernetes.io/name: cortex
  name: storage-cortex-store-gateway-0
  namespace: cortex
  resourceVersion: "155240334"
  uid: 70522d12-aa01-496f-8xx2-9c5e0e6a3a3f
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: gp2
  volumeMode: Filesystem
  volumeName: pvc-70522d12-aa01-496f-8xx2-9c5e0e6a3a3f
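If you prefer not to open an editor for every replica, the same change can be made non-interactively. A sketch, assuming three store-gateway replicas (adjust the loop to your replica count):
# patch each PVC to the new size requested in step 1
for i in 0 1 2; do
  kubectl patch pvc storage-cortex-store-gateway-$i -n cortex \
    --type merge -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'
done
# wait until the larger capacity is reported
kubectl get pvc -n cortex | grep store-gateway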
Step 3: Remove the existing sts, leaving the PVC
Now the disk space has increased, but the existing StatefulSet still records the old size. To avoid that conflict, we delete only the StatefulSet object, leaving its pods and PVCs in place, with the simple command below:
kubectl delete sts --cascade=orphan cortex-store-gateway -n cortex
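A quick sanity check (sketch): the StatefulSet object should be gone while its pods and PVCs stay behind.
kubectl get sts -n cortex | grep store-gateway   # no longer listed
kubectl get po -n cortex | grep store-gateway    # pods keep running (orphaned)
kubectl get pvc -n cortex | grep store-gateway   # claims still Bound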
Step 4: Redeploy the StatefulSet
Since we removed the StatefulSet in the last command, terraform plan now shows it being created again. The recreated StatefulSet finds the existing disks, so it uses both the resized volumes and the new configuration from step 1.
terraform apply
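To confirm the recreated StatefulSet adopted the resized volumes, something like the sketch below works:
kubectl rollout status sts/cortex-store-gateway -n cortex
kubectl get pvc -n cortex | grep store-gateway   # CAPACITY should now show 50Gi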
Case 2: Move PVC to another zone
We had an interesting case where we had set up Cortex across multiple zones. Later, due to high cross-zone network usage, we decided to keep it in a single zone. Moving to a single zone worked well; however, the volumes were left behind in the other zone. So we used the steps below to migrate the PVCs to the new zone within the same region.
Problem statement
As mentioned, after changing the zone in the Cortex configuration, the pods were unable to schedule.
kubectl get po -n cortex -o wide | grep store
cortex-store-gateway-0 0/1 Pending 0 19h <none> <none> <none> <none>
The events show that the volume cannot be attached because it is bound to a different zone.
kubectl get events -n cortex
LAST SEEN   TYPE      REASON             OBJECT                  MESSAGE
2m52s Warning FailedScheduling pod/cortex-compactor-1 0/17 nodes are available: 10 node(s) didn't match Pod's node affinity, 3 node(s) had volume node affinity conflict, 4 node(s) had taint {dedicated: platform}, that the pod didn't tolerate.
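To confirm the conflict, you can look at where the PV is pinned; a sketch, with <old-pv-name> standing in for your actual PV name (the exact output depends on how the PV was provisioned):
kubectl get pv <old-pv-name> --show-labels
kubectl describe pv <old-pv-name> | grep -A 5 'Node Affinity'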
Step 1: Get Volume Details
First, gather the existing PV and PVC details from the StatefulSet. The in-tree aws-ebs provisioner stores the volume ID as aws://<zone>/<volume-id>, which is why the zone and the EBS volume ID can be cut out of that path.
export PVC="storage-cortex-store-gateway-0"
export OLD_PV=`kubectl get pvc -n cortex $PVC -o=jsonpath='{.spec.volumeName}'`
export OLD_VOL=`kubectl get pv $OLD_PV -o=jsonpath='{.spec.awsElasticBlockStore.volumeID}' | cut -d'/' -f 4`
export OLD_ZONE=`kubectl get pv $OLD_PV -o=jsonpath='{.spec.awsElasticBlockStore.volumeID}' | cut -d'/' -f 3`
export NEW_ZONE="eu-west-1a"
echo $OLD_PV
echo $OLD_VOL
echo $OLD_ZONE
Step 2: Create snapshot
We have to migrate the volume to the new zone manually, so let's create a snapshot first.
export VOL_SNAPSHOT=`aws ec2 create-snapshot --description "Migrating Kubernetes PV" --volume-id $OLD_VOL | jq -r '.SnapshotId'`
We can wait for some time, or check the snapshot status until it shows completed:
sleep 100
or
aws ec2 describe-snapshots --snapshot-ids $VOL_SNAPSHOT
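The AWS CLI also ships a waiter that blocks until the snapshot is completed, which avoids guessing a sleep value:
aws ec2 wait snapshot-completed --snapshot-ids $VOL_SNAPSHOT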
Step 3: Create volume from the snapshot
Since everything here is done by hand, we also create the volume manually from the snapshot, in the target zone.
export NEW_VOL=`aws ec2 create-volume --snapshot-id $VOL_SNAPSHOT --availability-zone $NEW_ZONE | jq -r '.VolumeId'`
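Optionally, wait until the new volume reaches the available state before moving on (a small sketch):
aws ec2 wait volume-available --volume-ids $NEW_VOL
echo $NEW_VOL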
Step 4: Create new configuration files
We need to modify a few fields (the volume ID and the zone) and create new Kubernetes YAML files. The commands look lengthy, but they only strip cluster-generated metadata that is not needed for the next steps.
kubectl get pv $OLD_PV -n cortex -o=json | jq 'del(.metadata.resourceVersion,.metadata.uid,.metadata.selfLink,.metadata.creationTimestamp,.metadata.annotations,.metadata.generation,.metadata.ownerReferences)' | yq eval - -P > original-pv.yaml
kubectl get pvc $PVC -n cortex -o=json | jq 'del(.metadata.resourceVersion,.metadata.uid,.metadata.selfLink,.metadata.creationTimestamp,.metadata.annotations,.metadata.generation,.metadata.ownerReferences)' | yq eval - -P > original-pvc.yaml
cat original-pv.yaml | sed -e "s/$OLD_VOL/$NEW_VOL/g" | sed -e "s/$OLD_ZONE/$NEW_ZONE/g" > pv.yaml
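Before touching the cluster, it is worth reviewing the generated file; only the EBS volume ID and the zone references should differ:
diff original-pv.yaml pv.yaml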
Step 5: Remove old PVC & apply configuration
All the manual preparation is done. Now we actually remove the old PVC and recreate the PV and PVC by hand.
kubectl delete pvc -n cortex $PVC
kubectl apply -f pv.yaml
kubectl apply -f original-pvc.yaml
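After applying, the claim should bind to the new volume and the pending pod should finally schedule in the new zone (a quick check, sketch):
kubectl get pvc -n cortex $PVC
kubectl get po -n cortex -o wide | grep store-gateway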
Fun facts
The Kubernetes world gives the impression that everything can be handled with a simple YAML change. Examples like these show that manual intervention is still needed sometimes :) I am sure future versions will cover such cases too.