Databases on Kubernetes (Part 1)

In part 1 we introduce the concept of running databases on Kubernetes and the challenges that come with it.

10th April 2024

Cloud Databases Kubernetes

David Hazra

David Hazra is a professional software developer based in London

Running databases on Kubernetes hasn't always been the go-to solution. Stateful applications in general have traditionally been difficult to run smoothly in environments like Kubernetes, where you should expect a node (or a few nodes) to become unavailable without warning.

However, things have changed. Traditional databases, which were initially designed for a single-node setup where horizontal scaling was not the default, have evolved to perform well in cloud environments. We also have cloud-native databases, which have been purpose-built with scalability in mind.

One of the great things about running databases in Kubernetes is that you can pick and choose your database provider with ease. There is no rule that a cluster must only have a single database operator. In fact, it is common practice for applications with different operational needs to deploy databases specific to their workload, all in the same cluster. The downside is that each additional database provider requires domain-specific knowledge to maintain.

Databases on Kubernetes are usually deployed through operators and custom resources. You deploy an operator in the cluster, which registers CustomResourceDefinitions (CRDs). You then create database custom resources, which the operator reconciles into a functioning system. It provisions the PVCs, the StatefulSets, and everything else you need.
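
To see this relationship on a live cluster, a minimal sketch along these lines can help. The function names and the namespace in the usage comments are illustrative assumptions; only the kubectl subcommands themselves are standard.

```shell
# List the CRDs registered in the cluster, optionally filtered by API group
# (for example "mysql.oracle.com" or "acid.zalan.do").
list_database_crds() {
  group="${1:-}"
  kubectl get crds -o name | grep -- "$group"
}

# Show the objects an operator typically creates for a database.
show_database_objects() {
  namespace="$1"
  kubectl get statefulsets,persistentvolumeclaims,services,pods -n "$namespace"
}

# Usage (assumes kubectl points at a cluster with an operator installed):
#   list_database_crds mysql.oracle.com
#   show_database_objects mysql-database
```

Comparing the output before and after creating a database resource shows which objects the operator reconciled on your behalf.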

Another important note is that many databases share terminology with Kubernetes. For example, a database "node" is a single instance of the database, whereas a Kubernetes node is a machine in the cluster. Both also have a "cluster", which represents a collection of nodes, so it is worth being explicit about which one you mean.

Traditional Databases

MySQL

✔️ GitHub | ✔️ Docs | ✔️ Official Operator | ✔️ Helm Chart

Operator Installation

The MySQL operator is an official operator made by Oracle, but it is fully open source. It is most conveniently installed via Helm:

helm repo add mysql-operator https://mysql.github.io/mysql-operator
helm repo update

helm install mysql-operator mysql-operator/mysql-operator --namespace mysql-operator --create-namespace

Database Creation

The default storage engine MySQL uses is InnoDB, and the operator deploys MySQL as an InnoDB Cluster, MySQL's high-availability setup. Below we show a basic example of deploying such a cluster to a Kubernetes environment.

To create an InnoDB cluster, you first need to create a Secret in the cluster that holds the credentials of the root user. Here rootHost is the host (or host pattern) that the created user can connect from, and % is a wildcard that matches any host.

apiVersion: v1
kind: Secret
metadata:
  name: user-innodb-creds
  namespace: mysql-database
stringData:
  rootUser: "username"
  rootHost: "%"
  rootPassword: "password"
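
The manifest above uses a placeholder password and assumes the mysql-database namespace already exists. As a hedged alternative, the sketch below generates a random password and creates the same secret imperatively. The function names are hypothetical; the secret name, keys and namespace are taken from the manifest.

```shell
# Generate a random hex password from N random bytes (default 16 -> 32 chars).
generate_password() {
  bytes="${1:-16}"
  od -An -N"$bytes" -tx1 /dev/urandom | tr -d ' \n'
}

# Create the namespace and the credentials secret referenced by the cluster.
# Note: "kubectl create namespace" reports an error if the namespace exists.
create_innodb_secret() {
  namespace="$1"
  user="$2"
  password="$3"
  kubectl create namespace "$namespace"
  kubectl create secret generic user-innodb-creds \
    --namespace "$namespace" \
    --from-literal=rootUser="$user" \
    --from-literal=rootHost="%" \
    --from-literal=rootPassword="$password"
}

# Usage:
#   create_innodb_secret mysql-database username "$(generate_password)"
```

Creating the secret this way keeps the password out of any manifest you might commit to version control.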

After the operator pod has come up successfully, we can create the database cluster itself by applying the following resource in the same namespace as the secret. Here tlsUseSelfSigned indicates that the cluster should use self-signed TLS certificates for traffic between the database nodes in the cluster. instances is the number of database replicas to create, and router.instances sets the number of MySQL Router replicas, which are responsible for routing traffic to the correct database instance within the cluster.

apiVersion: mysql.oracle.com/v2
kind: InnoDBCluster
metadata:
  name: mycluster
  # Must match the namespace of the user-innodb-creds secret
  namespace: mysql-database
spec:
  secretName: user-innodb-creds
  tlsUseSelfSigned: true
  instances: 3
  router:
    instances: 1
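
Once the resource is applied, progress can be followed from the command line. The sketch below assumes the cluster is named mycluster and lives in the mysql-database namespace alongside the secret. The function names are hypothetical, and the "mysql" port name follows the operator's documentation, so treat it as an assumption if your operator version differs.

```shell
# Watch the InnoDBCluster resource while the operator brings it online.
watch_innodb_cluster() {
  name="$1"
  namespace="$2"
  kubectl get innodbcluster "$name" --namespace "$namespace" --watch
}

# Forward the cluster Service's "mysql" port to localhost for a quick test.
forward_innodb_cluster() {
  name="$1"
  namespace="$2"
  kubectl port-forward "service/$name" mysql --namespace "$namespace"
}

# Usage:
#   watch_innodb_cluster mycluster mysql-database
#   forward_innodb_cluster mycluster mysql-database
```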

PostgreSQL

There are quite a few PostgreSQL operators available for Kubernetes:

Zalando
CrunchyData
KubeDB (which offers several automated database-on-kubernetes solutions)
the list goes on...

We will be focusing on the Zalando operator, as it's open source and has the most GitHub stars by a small margin (I know this isn't the best metric!).

NOTE: We are biased here, as we use this operator the most in our applications, and have been happy with its performance.

Zalando PostgreSQL Operator

✔️ GitHub | ✔️ Docs | ✔️ Helm Chart

Operator Installation

To install the operator itself we can use Helm.

helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator
helm repo update

helm install postgres-operator postgres-operator-charts/postgres-operator -n postgres-operator --create-namespace

This operator comes with an optional nifty interface which can help create postgres cluster resources. It doesn't include all of the CRD parameters, but it can be useful:

# Install the optional operator UI
helm repo add postgres-operator-ui-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator-ui
helm repo update

helm install postgres-operator-ui postgres-operator-ui-charts/postgres-operator-ui -n postgres-operator-ui --create-namespace

# To access the UI locally (http://localhost:8081)
kubectl port-forward svc/postgres-operator-ui -n postgres-operator-ui 8081:80

Database Creation

This PostgreSQL cluster was created with the aid of the UI. It defines a PostgreSQL 15 cluster with a single instance (numberOfInstances: 1) and provisions a persistent volume with 25Gi of storage, using the specified storage class. A user named admin-user is created and made the owner of the database db. The operator will create a secret which contains the login information for this user.

kind: postgresql
apiVersion: acid.zalan.do/v1
metadata:
  name: postgresql-database
  namespace: database
  labels:
    team: acid
spec:
  teamId: acid
  postgresql:
    version: "15"
  numberOfInstances: 1
  volume:
    size: "25Gi"
    storageClass: "storage-class-name"
  users:
    admin-user: []
  databases:
    db: admin-user
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 500m
      memory: 500Mi
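
To read the generated login details back out, something like the following sketch can be used. It assumes the operator's default secret naming template (user, cluster name, then a fixed credentials suffix) has not been changed in the operator configuration; the function names are hypothetical.

```shell
# Build the name of the credentials secret for a user of a cluster,
# following the operator's default secret name template.
pg_secret_name() {
  user="$1"
  cluster="$2"
  echo "${user}.${cluster}.credentials.postgresql.acid.zalan.do"
}

# Print the decoded password stored in that secret.
pg_password() {
  user="$1"
  cluster="$2"
  namespace="$3"
  kubectl get secret "$(pg_secret_name "$user" "$cluster")" \
    --namespace "$namespace" \
    -o jsonpath='{.data.password}' | base64 -d
}

# Usage, with the names from the manifest above:
#   pg_password admin-user postgresql-database database
```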

Cloud Native Databases

CockroachDB

✔️ GitHub | ✔️ Docs | ✔️ Official Operator

CockroachDB can be installed in two ways: through a Kubernetes operator, as we've seen with the other databases (this is the way recommended in their docs), or as a self-contained Helm chart. We will be covering the former.

Be aware that, according to their documentation, the operator has only been tested with Google Kubernetes Engine (GKE). However, they also provide instructions for Amazon's Elastic Kubernetes Service (EKS), so I'd assume that it works there too.

Operator Installation

Unlike the other databases, the base operator is not installed via a Helm chart but applied directly using kubectl:

# Install the CRDs
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.12.0/install/crds.yaml

# Install the operator itself
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.12.0/install/operator.yaml
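
Because the version appears in both URLs, it can be convenient to pin it in one place. The sketch below does that. The variable and function names are hypothetical, and v2.12.0 is simply the version used in this post.

```shell
# Operator release to install; override by exporting CRDB_OPERATOR_VERSION.
CRDB_OPERATOR_VERSION="${CRDB_OPERATOR_VERSION:-v2.12.0}"

# Build the raw GitHub URL for one of the operator's install manifests.
crdb_manifest_url() {
  echo "https://raw.githubusercontent.com/cockroachdb/cockroach-operator/${CRDB_OPERATOR_VERSION}/install/$1"
}

# Apply the CRDs first, then the operator itself.
install_crdb_operator() {
  kubectl apply -f "$(crdb_manifest_url crds.yaml)"
  kubectl apply -f "$(crdb_manifest_url operator.yaml)"
}

# Usage:
#   install_crdb_operator
```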

Database Creation

Once the operator pod has come up and is working, we can create an example cluster by applying the following resource. In this case nodes refers to the number of database replicas to be created.

apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: database
spec:
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "25Gi"
        volumeMode: Filesystem
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 2
      memory: 8Gi
  tlsEnabled: true
  image:
    name: cockroachdb/cockroach:v23.1.11
  nodes: 3
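
Since nodes is just a field on the resource, changing the cluster size later comes down to updating that field and letting the operator reconcile. A minimal sketch, assuming the resource is named database as above and using a hypothetical helper name:

```shell
# Patch spec.nodes on a CrdbCluster resource; the operator then reconciles
# the cluster to the requested number of database nodes.
scale_crdb_cluster() {
  name="$1"
  nodes="$2"
  kubectl patch crdbcluster "$name" --type merge \
    --patch "{\"spec\":{\"nodes\":${nodes}}}"
}

# Usage:
#   scale_crdb_cluster database 5
```

Editing nodes in the manifest and re-applying it achieves the same result, and keeps the manifest as the source of truth.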