This article assumes you have already deployed CockroachDB on a single Kubernetes cluster.
This page explains how to add and remove CockroachDB nodes on Kubernetes.
All kubectl steps should be performed in the namespace where you installed the Operator. By default, this is cockroach-operator-system.
If you deployed CockroachDB on Red Hat OpenShift, substitute kubectl with oc in the following commands.
Add nodes
Before scaling up CockroachDB, note the following topology recommendations:
- Each CockroachDB node (running in its own pod) should run on a separate Kubernetes worker node.
- Each availability zone should have the same number of CockroachDB nodes.
If your cluster has 3 CockroachDB nodes distributed across 3 availability zones (as in our deployment example), we recommend scaling up by a multiple of 3 to retain an even distribution of nodes. You should therefore scale up to a minimum of 6 CockroachDB nodes, with 2 nodes in each zone.
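The sizing rule above amounts to rounding the desired node count up to the next multiple of the zone count. A minimal shell sketch of that arithmetic follows; `even_target` is a hypothetical helper name used only for illustration and is not part of kubectl or CockroachDB.

```shell
# even_target DESIRED ZONES
# Hypothetical helper (an assumption for illustration): prints the smallest
# node count that is at least DESIRED and is a multiple of ZONES, so that
# every availability zone ends up with the same number of nodes.
even_target() {
  echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

even_target 4 3   # prints 6: two nodes in each of 3 zones
```

The result is the value you would use as the target cluster size in the steps that follow.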
Run kubectl get nodes to list the worker nodes in your Kubernetes cluster. There should be at least as many worker nodes as CockroachDB pods you will have after scaling up. This ensures that no more than one pod will be placed on each worker node.
If you need to add worker nodes, resize your GKE cluster by specifying the desired number of worker nodes in each zone:
    gcloud container clusters resize {cluster-name} --region {region-name} --num-nodes 2
This example distributes 2 worker nodes across the default 3 zones, raising the total to 6 worker nodes.
If you are adding nodes after previously scaling down, and have not enabled automatic PVC pruning, you must first manually delete any persistent volumes that were orphaned by node removal.
Note: Due to a known issue, automatic pruning of PVCs is currently disabled by default. This means that after decommissioning and removing a node, the Operator will not remove the persistent volume that was mounted to its pod.
View the PVCs on the cluster:
    kubectl get pvc

    NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    datadir-cockroachdb-0   Bound    pvc-f1ce6ed2-ceda-40d2-8149-9e5b59faa9df   60Gi       RWO            standard       24m
    datadir-cockroachdb-1   Bound    pvc-308da33c-ec77-46c7-bcdf-c6e610ad4fea   60Gi       RWO            standard       24m
    datadir-cockroachdb-2   Bound    pvc-6816123f-29a9-4b86-a4e2-b67f7bb1a52c   60Gi       RWO            standard       24m
    datadir-cockroachdb-3   Bound    pvc-63ce836a-1258-4c58-8b37-d966ed12d50a   60Gi       RWO            standard       24m
    datadir-cockroachdb-4   Bound    pvc-1ylabv86-6512-6n12-bw3g-i0dh2zxvfhd0   60Gi       RWO            standard       24m
    datadir-cockroachdb-5   Bound    pvc-2vka2c9x-7824-41m5-jk45-mt7dzq90q97x   60Gi       RWO            standard       24m
The PVC names correspond to the pods they are bound to. For example, if the pods cockroachdb-3, cockroachdb-4, and cockroachdb-5 had been removed by scaling the cluster down from 6 to 3 nodes, datadir-cockroachdb-3, datadir-cockroachdb-4, and datadir-cockroachdb-5 would be the PVCs for the orphaned persistent volumes.
To verify that a PVC is not currently bound to a pod:
    kubectl describe pvc datadir-cockroachdb-5
The output will include the following line:
    Mounted By: <none>
If the PVC is bound to a pod, it will specify the pod name.
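Because each PVC name ends in its pod's ordinal, the candidates for orphaned PVCs can be picked out by comparing that ordinal with the current cluster size. The following is a minimal sketch under that naming assumption; `orphaned_pvcs` is a hypothetical helper, and its output is only a first pass that should still be confirmed with kubectl describe pvc before anything is deleted.

```shell
# orphaned_pvcs NODES
# Hypothetical helper (an assumption for illustration): reads PVC names on
# stdin, one per line, and prints those whose trailing ordinal is greater
# than or equal to NODES, the current number of CockroachDB nodes.
orphaned_pvcs() {
  while IFS= read -r name; do
    ordinal=${name##*-}              # text after the last hyphen
    case $ordinal in
      ''|*[!0-9]*) continue ;;       # skip names without a numeric suffix
    esac
    if [ "$ordinal" -ge "$1" ]; then
      echo "$name"
    fi
  done
  return 0
}

# With 3 nodes running, ordinals 3 and above are candidates.
printf '%s\n' datadir-cockroachdb-2 datadir-cockroachdb-3 datadir-cockroachdb-5 | orphaned_pvcs 3
```

The names could be supplied by something like kubectl get pvc --no-headers -o custom-columns=NAME:.metadata.name, piped into the function.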
Remove the orphaned persistent volumes by deleting their PVCs:
Warning: Before deleting any persistent volumes, be sure you have a backup copy of your data. Data cannot be recovered once the persistent volumes are deleted. For more information, see the Kubernetes documentation.
    kubectl delete pvc datadir-cockroachdb-3 datadir-cockroachdb-4 datadir-cockroachdb-5

    persistentvolumeclaim "datadir-cockroachdb-3" deleted
    persistentvolumeclaim "datadir-cockroachdb-4" deleted
    persistentvolumeclaim "datadir-cockroachdb-5" deleted
Update nodes in the Operator's custom resource, which you downloaded when deploying the cluster, with the target size of the CockroachDB cluster. This value refers to the number of CockroachDB nodes, each running in one pod:
    nodes: 6
Note: You must scale by updating the nodes value in the custom resource. Using kubectl scale statefulset <cluster-name> --replicas=4 will result in new pods immediately being terminated.
Apply the new settings to the cluster:
    $ kubectl apply -f example.yaml
Verify that the new pods were successfully started:
    kubectl get pods

    NAME                                  READY   STATUS    RESTARTS   AGE
    cockroach-operator-655fbf7847-zn9v8   1/1     Running   0          30m
    cockroachdb-0                         1/1     Running   0          24m
    cockroachdb-1                         1/1     Running   0          24m
    cockroachdb-2                         1/1     Running   0          24m
    cockroachdb-3                         1/1     Running   0          30s
    cockroachdb-4                         1/1     Running   0          30s
    cockroachdb-5                         1/1     Running   0          30s
Each pod should be running on one of the 6 worker nodes.
Before scaling up CockroachDB, note the following topology recommendations:
- Each CockroachDB node (running in its own pod) should run on a separate Kubernetes worker node.
- Each availability zone should have the same number of CockroachDB nodes.
If your cluster has 3 CockroachDB nodes distributed across 3 availability zones (as in our deployment example), we recommend scaling up by a multiple of 3 to retain an even distribution of nodes. You should therefore scale up to a minimum of 6 CockroachDB nodes, with 2 nodes in each zone.
Run kubectl get nodes to list the worker nodes in your Kubernetes cluster. There should be at least as many worker nodes as CockroachDB pods you will have after scaling up. This ensures that no more than one pod will be placed on each worker node.
Add worker nodes if necessary:
- On GKE, resize your cluster. If you deployed a regional cluster as we recommended, you will use --num-nodes to specify the desired number of worker nodes in each zone. For example:
    gcloud container clusters resize {cluster-name} --region {region-name} --num-nodes 2
- On EKS, resize your Worker Node Group.
- On GCE, resize your Managed Instance Group.
- On AWS, resize your Auto Scaling Group.
Edit your StatefulSet configuration to add pods for each new CockroachDB node:
    $ kubectl scale statefulset cockroachdb --replicas=6

    statefulset.apps/cockroachdb scaled
Verify that the new pods started successfully:
    $ kubectl get pods

    NAME                        READY   STATUS    RESTARTS   AGE
    cockroachdb-0               1/1     Running   0          51m
    cockroachdb-1               1/1     Running   0          47m
    cockroachdb-2               1/1     Running   0          3m
    cockroachdb-3               1/1     Running   0          1m
    cockroachdb-4               1/1     Running   0          1m
    cockroachdb-5               1/1     Running   0          1m
    cockroachdb-client-secure   1/1     Running   0          15m
    ...
You can also open the Node List in the DB Console to ensure that the new nodes successfully joined the cluster.
Before scaling CockroachDB, ensure that your Kubernetes cluster has enough worker nodes to host the number of pods you want to add. This is to ensure that two pods are not placed on the same worker node, as recommended in our production guidance.
For example, if you want to scale from 3 CockroachDB nodes to 4, your Kubernetes cluster should have at least 4 worker nodes. You can verify the size of your Kubernetes cluster by running kubectl get nodes.
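The capacity check described above is a simple comparison between the number of worker nodes and the number of CockroachDB pods you intend to run. A minimal sketch, assuming a hypothetical helper named `has_enough_workers` and counts that you supply yourself (for example, by counting the rows that kubectl get nodes prints):

```shell
# has_enough_workers WORKERS PODS
# Hypothetical helper (an assumption for illustration): succeeds when there
# is at least one worker node for every CockroachDB pod.
has_enough_workers() {
  [ "$1" -ge "$2" ]
}

# Scaling from 3 to 4 CockroachDB nodes with only 3 worker nodes available.
if has_enough_workers 3 4; then
  echo "ok to scale"
else
  echo "add worker nodes first"
fi
```

This only compares counts. It does not confirm that the scheduler will actually spread pods one per worker node, which depends on how your deployment is configured.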
Edit your StatefulSet configuration to add another pod for the new CockroachDB node:
    $ helm upgrade \
    my-release \
    cockroachdb/cockroachdb \
    --set statefulset.replicas=4 \
    --reuse-values

    Release "my-release" has been upgraded. Happy Helming!
    LAST DEPLOYED: Tue May 14 14:06:43 2019
    NAMESPACE: default
    STATUS: DEPLOYED
    RESOURCES:
    ==> v1beta1/PodDisruptionBudget
    NAME                            AGE
    my-release-cockroachdb-budget   51m
    ==> v1/Pod(related)
    NAME                                READY   STATUS      RESTARTS   AGE
    my-release-cockroachdb-0            1/1     Running     0          38m
    my-release-cockroachdb-1            1/1     Running     0          39m
    my-release-cockroachdb-2            1/1     Running     0          39m
    my-release-cockroachdb-3            0/1     Pending     0          0s
    my-release-cockroachdb-init-nwjkh   0/1     Completed   0          39m
    ...
Get the name of the Pending CSR for the new pod:
    $ kubectl get csr

    NAME                                                   AGE   REQUESTOR                               CONDITION
    default.client.root                                    1h    system:serviceaccount:default:default   Approved,Issued
    default.node.my-release-cockroachdb-0                  1h    system:serviceaccount:default:default   Approved,Issued
    default.node.my-release-cockroachdb-1                  1h    system:serviceaccount:default:default   Approved,Issued
    default.node.my-release-cockroachdb-2                  1h    system:serviceaccount:default:default   Approved,Issued
    default.node.my-release-cockroachdb-3                  2m    system:serviceaccount:default:default   Pending
    node-csr-0Xmb4UTVAWMEnUeGbW4KX1oL4XV_LADpkwjrPtQjlZ4   1h    kubelet                                 Approved,Issued
    node-csr-NiN8oDsLhxn0uwLTWa0RWpMUgJYnwcFxB984mwjjYsY   1h    kubelet                                 Approved,Issued
    node-csr-aU78SxyU69pDK57aj6txnevr7X-8M3XgX9mTK0Hso6o   1h    kubelet                                 Approved,Issued
    ...
If you do not see a Pending CSR, wait a minute and try again.
Examine the CSR for the new pod:
    $ kubectl describe csr default.node.my-release-cockroachdb-3

    Name:               default.node.my-release-cockroachdb-3
    Labels:             <none>
    Annotations:        <none>
    CreationTimestamp:  Thu, 09 Nov 2017 13:39:37 -0500
    Requesting User:    system:serviceaccount:default:default
    Status:             Pending
    Subject:
      Common Name:    node
      Serial Number:
      Organization:   Cockroach
    Subject Alternative Names:
      DNS Names:     localhost
                     my-release-cockroachdb-1.my-release-cockroachdb.default.svc.cluster.local
                     my-release-cockroachdb-1.my-release-cockroachdb
                     my-release-cockroachdb-public
                     my-release-cockroachdb-public.default.svc.cluster.local
      IP Addresses:  127.0.0.1
                     10.48.1.6
    Events:  <none>
If everything looks correct, approve the CSR for the new pod:
    $ kubectl certificate approve default.node.my-release-cockroachdb-3

    certificatesigningrequest.certificates.k8s.io/default.node.my-release-cockroachdb-3 approved
Verify that the new pod started successfully:
    $ kubectl get pods

    NAME                        READY   STATUS    RESTARTS   AGE
    my-release-cockroachdb-0    1/1     Running   0          51m
    my-release-cockroachdb-1    1/1     Running   0          47m
    my-release-cockroachdb-2    1/1     Running   0          3m
    my-release-cockroachdb-3    1/1     Running   0          1m
    cockroachdb-client-secure   1/1     Running   0          15m
    ...
You can also open the Node List in the DB Console to ensure that the fourth node successfully joined the cluster.
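In the CSR listing shown in the steps above, the CSR awaiting approval is the row whose CONDITION column reads Pending. A minimal sketch for picking such rows out of the listing follows; `pending_csrs` is a hypothetical helper, and it assumes CONDITION is the last column, as it is in the sample output.

```shell
# pending_csrs
# Hypothetical helper (an assumption for illustration): reads the output of
# `kubectl get csr` on stdin and prints the name of every row whose last
# column (CONDITION) is exactly "Pending".
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

pending_csrs <<'EOF'
NAME                                    AGE   REQUESTOR                               CONDITION
default.node.my-release-cockroachdb-2   1h    system:serviceaccount:default:default   Approved,Issued
default.node.my-release-cockroachdb-3   2m    system:serviceaccount:default:default   Pending
EOF
```

The sketch only finds the name. Each CSR should still be examined with kubectl describe csr before it is approved.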
Remove nodes
Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on CockroachDB and will cause errors.
Due to a known issue, automatic pruning of PVCs is currently disabled by default. This means that after decommissioning and removing a node, the Operator will not remove the persistent volume that was mounted to its pod.
If you plan to eventually scale up the cluster after scaling down, you will need to manually delete any PVCs that were orphaned by node removal before scaling up. For more information, see Add nodes.
If you want to enable the Operator to automatically prune PVCs when scaling down, see Automatic PVC pruning. However, note that this workflow is currently unsupported.
Before scaling down CockroachDB, note the following topology recommendation:
- Each availability zone should have the same number of CockroachDB nodes.
If your nodes are distributed across 3 availability zones (as in our deployment example), we recommend scaling down by a multiple of 3 to retain an even distribution. If your cluster has 6 CockroachDB nodes, you should therefore scale down to 3, with 1 node in each zone.
Update nodes in the custom resource, which you downloaded when deploying the cluster, with the target size of the CockroachDB cluster. For instance, to scale down to 3 nodes:
    nodes: 3
Note: Before removing a node, the Operator first decommissions the node. This lets a node finish in-flight requests, rejects any new requests, and transfers all range replicas and range leases off the node.
Apply the new settings to the cluster:
    $ kubectl apply -f example.yaml
The Operator will remove nodes from the cluster one at a time, starting from the pod with the highest number in its address.
Verify that the pods were successfully removed:
    kubectl get pods

    NAME                                  READY   STATUS    RESTARTS   AGE
    cockroach-operator-655fbf7847-zn9v8   1/1     Running   0          32m
    cockroachdb-0                         1/1     Running   0          26m
    cockroachdb-1                         1/1     Running   0          26m
    cockroachdb-2                         1/1     Running   0          26m
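The removal order described above (highest pod number first, one at a time) can be listed ahead of time, which also tells you which datadir PVCs will be left behind. A minimal sketch, assuming a hypothetical helper named `removal_order` and the cockroachdb pod name prefix used in the sample output:

```shell
# removal_order CURRENT TARGET PREFIX
# Hypothetical helper (an assumption for illustration): prints the pods that
# would be removed when scaling from CURRENT to TARGET nodes, starting with
# the highest ordinal.
removal_order() {
  i=$(( $1 - 1 ))
  while [ "$i" -ge "$2" ]; do
    echo "$3-$i"
    i=$(( i - 1 ))
  done
}

# Scaling from 6 nodes to 3 lists cockroachdb-5, then cockroachdb-4, then cockroachdb-3.
removal_order 6 3 cockroachdb
```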
Automatic PVC pruning
To enable the Operator to automatically remove persistent volumes when scaling down a cluster, turn on automatic PVC pruning through a feature gate.
This workflow is unsupported and should be enabled at your own risk.
Download the Operator manifest:
    $ curl -O https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.18.2/install/operator.yaml
Uncomment the following lines in the Operator manifest:
    - feature-gates
    - AutoPrunePVC=true
Reapply the Operator manifest:
    $ kubectl apply -f operator.yaml
Validate that the Operator is running:
    $ kubectl get pods

    NAME                                  READY   STATUS    RESTARTS   AGE
    cockroach-operator-6f7b86ffc4-9ppkv   1/1     Running   0          22s
    ...
Before removing a node from your cluster, you must first decommission the node. This lets a node finish in-flight requests, rejects any new requests, and transfers all range replicas and range leases off the node.
If you remove nodes without first telling CockroachDB to decommission them, you may cause data or even cluster unavailability. For more details about how this works and what to consider before removing nodes, see Prepare for graceful shutdown.
Use the cockroach node status command to get the internal IDs of nodes. For example, if you followed the steps in Deploy CockroachDB with Kubernetes to launch a secure client pod, get a shell into the cockroachdb-client-secure pod:
    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach node status \
    --certs-dir=/cockroach-certs \
    --host=cockroachdb-public

    id | address | build | started_at | updated_at | is_available | is_live
    +----+---------------------------------------------------------------------------------+--------+----------------------------------+----------------------------------+--------------+---------+
    1 | cockroachdb-0.cockroachdb.default.svc.cluster.local:26257 | v23.2.28 | 2018-11-29 16:04:36.486082+00:00 | 2018-11-29 18:24:24.587454+00:00 | true | true
    2 | cockroachdb-2.cockroachdb.default.svc.cluster.local:26257 | v23.2.28 | 2018-11-29 16:55:03.880406+00:00 | 2018-11-29 18:24:23.469302+00:00 | true | true
    3 | cockroachdb-1.cockroachdb.default.svc.cluster.local:26257 | v23.2.28 | 2018-11-29 16:04:41.383588+00:00 | 2018-11-29 18:24:25.030175+00:00 | true | true
    4 | cockroachdb-3.cockroachdb.default.svc.cluster.local:26257 | v23.2.28 | 2018-11-29 17:31:19.990784+00:00 | 2018-11-29 18:24:26.041686+00:00 | true | true
    (4 rows)
The pod uses the root client certificate created earlier to initialize the cluster, so there's no CSR approval required.
Use the cockroach node decommission command to decommission the node with the highest number in its address, specifying its ID (in this example, node ID 4 because its address is cockroachdb-3):
Note: You must decommission the node with the highest number in its address. Kubernetes will remove the pod for the node with the highest number in its address when you reduce the replica count.
    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach node decommission 4 \
    --certs-dir=/cockroach-certs \
    --host=cockroachdb-public
You'll then see the decommissioning status print to stderr as it changes:
      id | is_live | replicas | is_decommissioning |   membership    | is_draining
    -----+---------+----------+--------------------+-----------------+--------------
       4 |  true   |       73 |        true        | decommissioning |    false
Once the node has been fully decommissioned, you'll see a confirmation:
      id | is_live | replicas | is_decommissioning |   membership    | is_draining
    -----+---------+----------+--------------------+-----------------+--------------
       4 |  true   |        0 |        true        | decommissioning |    false
    (1 row)

    No more data reported on target nodes. Please verify cluster health before removing the nodes.
Once the node has been decommissioned, scale down your StatefulSet:
    $ kubectl scale statefulset cockroachdb --replicas=3

    statefulset.apps/cockroachdb scaled
Verify that the pod was successfully removed:
    $ kubectl get pods

    NAME                        READY   STATUS    RESTARTS   AGE
    cockroachdb-0               1/1     Running   0          51m
    cockroachdb-1               1/1     Running   0          47m
    cockroachdb-2               1/1     Running   0          3m
    cockroachdb-client-secure   1/1     Running   0          15m
    ...
You should also remove the persistent volume that was mounted to the pod. Get the persistent volume claims for the volumes:
    $ kubectl get pvc

    NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    datadir-cockroachdb-0   Bound    pvc-75dadd4c-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-cockroachdb-1   Bound    pvc-75e143ca-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-cockroachdb-2   Bound    pvc-75ef409a-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-cockroachdb-3   Bound    pvc-75e561ba-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
Verify that the PVC with the highest number in its name is no longer mounted to a pod:
    $ kubectl describe pvc datadir-cockroachdb-3

    Name:          datadir-cockroachdb-3
    ...
    Mounted By:    <none>
Remove the persistent volume by deleting the PVC:
    $ kubectl delete pvc datadir-cockroachdb-3

    persistentvolumeclaim "datadir-cockroachdb-3" deleted
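In the decommission step above, the node ID to pass to cockroach node decommission is not the pod ordinal: node IDs reflect join order, so the pod with the highest number has to be looked up in the cockroach node status output. A minimal sketch of that lookup follows; `highest_ordinal_node_id` is a hypothetical helper, and it assumes the table layout shown in the sample output (ID in the first column, address in the second).

```shell
# highest_ordinal_node_id
# Hypothetical helper (an assumption for illustration): reads the table
# printed by `cockroach node status` on stdin and prints the ID of the node
# whose pod name carries the highest ordinal.
highest_ordinal_node_id() {
  awk -F'|' '
    {
      id = $1; addr = $2
      gsub(/[ \t]/, "", id); gsub(/[ \t]/, "", addr)
      if (id !~ /^[0-9]+$/) next            # skip header, separator, footer
      host = addr; sub(/\..*/, "", host)    # keep the pod name before the first dot
      ord = host; sub(/.*-/, "", ord)       # keep the ordinal after the last hyphen
      if (ord !~ /^[0-9]+$/) next
      if (best == "" || ord + 0 > max + 0) { max = ord; best = id }
    }
    END { if (best != "") print best }
  '
}

highest_ordinal_node_id <<'EOF'
  id | address | build
   3 | cockroachdb-1.cockroachdb.default.svc.cluster.local:26257 | v23.2.28
   4 | cockroachdb-3.cockroachdb.default.svc.cluster.local:26257 | v23.2.28
EOF
```

For the sample rows this prints 4, matching the node ID used in the example. Treat the result as a convenience check and confirm it against the full status output before decommissioning.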
Before removing a node from your cluster, you must first decommission the node. This lets a node finish in-flight requests, rejects any new requests, and transfers all range replicas and range leases off the node.
If you remove nodes without first telling CockroachDB to decommission them, you may cause data or even cluster unavailability. For more details about how this works and what to consider before removing nodes, see Prepare for graceful shutdown.
Use the cockroach node status command to get the internal IDs of nodes. For example, if you followed the steps in Deploy CockroachDB with Kubernetes to launch a secure client pod, get a shell into the cockroachdb-client-secure pod:
    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach node status \
    --certs-dir=/cockroach-certs \
    --host=my-release-cockroachdb-public

    id | address | build | started_at | updated_at | is_available | is_live
    +----+---------------------------------------------------------------------------------+--------+----------------------------------+----------------------------------+--------------+---------+
    1 | my-release-cockroachdb-0.my-release-cockroachdb.default.svc.cluster.local:26257 | v23.2.28 | 2018-11-29 16:04:36.486082+00:00 | 2018-11-29 18:24:24.587454+00:00 | true | true
    2 | my-release-cockroachdb-2.my-release-cockroachdb.default.svc.cluster.local:26257 | v23.2.28 | 2018-11-29 16:55:03.880406+00:00 | 2018-11-29 18:24:23.469302+00:00 | true | true
    3 | my-release-cockroachdb-1.my-release-cockroachdb.default.svc.cluster.local:26257 | v23.2.28 | 2018-11-29 16:04:41.383588+00:00 | 2018-11-29 18:24:25.030175+00:00 | true | true
    4 | my-release-cockroachdb-3.my-release-cockroachdb.default.svc.cluster.local:26257 | v23.2.28 | 2018-11-29 17:31:19.990784+00:00 | 2018-11-29 18:24:26.041686+00:00 | true | true
    (4 rows)
The pod uses the root client certificate created earlier to initialize the cluster, so there's no CSR approval required.
Use the cockroach node decommission command to decommission the node with the highest number in its address, specifying its ID (in this example, node ID 4 because its address is my-release-cockroachdb-3):
Note: You must decommission the node with the highest number in its address. Kubernetes will remove the pod for the node with the highest number in its address when you reduce the replica count.
    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach node decommission 4 \
    --certs-dir=/cockroach-certs \
    --host=my-release-cockroachdb-public
You'll then see the decommissioning status print to stderr as it changes:
      id | is_live | replicas | is_decommissioning |   membership    | is_draining
    -----+---------+----------+--------------------+-----------------+--------------
       4 |  true   |       73 |        true        | decommissioning |    false
Once the node has been fully decommissioned, you'll see a confirmation:
      id | is_live | replicas | is_decommissioning |   membership    | is_draining
    -----+---------+----------+--------------------+-----------------+--------------
       4 |  true   |        0 |        true        | decommissioning |    false
    (1 row)

    No more data reported on target nodes. Please verify cluster health before removing the nodes.
Once the node has been decommissioned, scale down your StatefulSet:
    $ helm upgrade \
    my-release \
    cockroachdb/cockroachdb \
    --set statefulset.replicas=3 \
    --reuse-values
Verify that the pod was successfully removed:
    $ kubectl get pods

    NAME                        READY   STATUS    RESTARTS   AGE
    my-release-cockroachdb-0    1/1     Running   0          51m
    my-release-cockroachdb-1    1/1     Running   0          47m
    my-release-cockroachdb-2    1/1     Running   0          3m
    cockroachdb-client-secure   1/1     Running   0          15m
    ...
You should also remove the persistent volume that was mounted to the pod. Get the persistent volume claims for the volumes:
    $ kubectl get pvc

    NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    datadir-my-release-cockroachdb-0   Bound    pvc-75dadd4c-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-my-release-cockroachdb-1   Bound    pvc-75e143ca-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-my-release-cockroachdb-2   Bound    pvc-75ef409a-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-my-release-cockroachdb-3   Bound    pvc-75e561ba-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
Verify that the PVC with the highest number in its name is no longer mounted to a pod:
    $ kubectl describe pvc datadir-my-release-cockroachdb-3

    Name:          datadir-my-release-cockroachdb-3
    ...
    Mounted By:    <none>
Remove the persistent volume by deleting the PVC:
    $ kubectl delete pvc datadir-my-release-cockroachdb-3

    persistentvolumeclaim "datadir-my-release-cockroachdb-3" deleted