PGO: The Postgres Operator from Crunchy Data

PostgreSQL operators make it easy to set up all the building blocks needed to run a redundant high availability setup with the added benefit of higher read request capacity.

When setting up services that handle redundancy and high availability themselves the operator pattern emerged as most useful. In short an operator is a piece of software which sole purpose is to setup another piece of software but in a way consistent with other Kubernetes resources and most probably using the redundancy and high availability e. g. when it comes to storage built into the software instead of relying on Kubernetes' services for that.

For PostgreSQL at least two operators exist and are maintained. One is the postgres-operator by zalando and the other is PGO: The Postgres Operator by crunchydata

As the whole businsess modell of crunchydata is built around PostgreSQL services and it seems a bit more mature I set up this operator.

One pecularity of PGO is that it comes with its own API endpoint and its own cli client pgo. It also provides a CRD based setup nowadays but the client allows for common operations to be written in the cli instead of writing a more verbose YAML.

Unfortunately for shared DB usage the setup has quite some strange settings which point to the original intention of just having all DB users run pgo to create a lot of independent DBs per actual client.

For example pgo has no means of creating a super user/admin user. This has to be done on the command line of one of the pods it creates.

Following the quickstart instructions sets up the operator in a meaningful way. Nevertheless a few changes to the default configuration make it fit better on our cluster.

For the full YAML that sets up the operator see below.

    pgo_apiserver_url: "https://pgo.cluster.siam.homeunix.net"
[...]
    storage5_name: "localfast"
    storage5_access_mode: "ReadWriteOnce"
    storage5_size: "1Gi"
    storage5_type: "dynamic"
    storage5_class: "local-ssd"
    storage6_name: "local"
    storage6_access_mode: "ReadWriteOnce"
    storage6_size: "4G"
    storage6_type: "dynamic"
    storage6_class: "local-hdd"

Unfortunately the API endpoint uses a non standard certificate to encrypt the traffic. The usual ingress is of no use here as it can not forward the traffic correctly. All browsers will complain about the certificate, the pgo client wont.

There is however the possibility to instruct the default nginx ingress to just pass on any traffic without looking at it. There seem to be some (small) performance implications so this feature is not enabled by default.

We need to change the YAML defining out cluster and add

rancher_kubernetes_engine_config:
  ingress:
    extra_args:
      enable-ssl-passthrough: ""

Now we can add the following annotation

nginx.ingress.kubernetes.io/ssl-passthrough=true

to the pgo API ingress so it will just pass on all traffic and the API endpoint can present it's certificate.

The following environment variables should be set to use pgo:

export PGO_NAMESPACE=default-postgres
export PGOUSER=/c/Users/simar/.pgo/pgo/pgouser
export PGO_CA_CERT=/c/Users/simar/.pgo/pgo/client.crt
export PGO_CLIENT_CERT=/c/Users/simar/.pgo/pgo/client.crt
export PGO_CLIENT_KEY=/c/Users/simar/.pgo/pgo/client.key
export PGO_APISERVER_URL=https://pgo.cluster.siam.homeunix.net

See below for a script that fetches the files needed.

PGO can use a number of different kinds of storage for different purposes:

There are a few differences in using NFS data in comparison to fully managed storage:

If something goes wrong during setup the quickest way to bring the NFS storage into ready state again coming from released is to just remove the claimRef that points to the now deleted pods

kubectl patch pv nfs-backup2 -p '{"spec":{"claimRef": null}}'

or recreate it.

Also pgbackrest, the component using the backup persistent volumes on nfs, expects group 65534 (nogroup) to have write access on share. We have to set this up manually:

Namespaces that should contain PGO managed PostgreSQL databases need to be registerd using pgo update namespace or created using pgo create namespace.

To create a new PostgreSQL server in cluster mode use a command like this:

pgo update namespace shared-dbs
pgo create cluster\
  --namespace shared-dbs\
  --storage-config local --replica-storage-config local\
  --wal-storage-config localfast\
  --pgbackrest-storage-config nfsstorage\
  --replica-count 2\
  --pgbouncer --pgbadger\
  --label id=9999\
  postgres

This creates a cluster with 2 replicas and 1 master so all three nodes should be used. Replicas can be scaled up if need be to accelerate read operations. They can also be scaled down.

pgbouncer distributes requests beween master or replica and makes this setup transparent to any client.

We can expose pgbouncer on nodeports using the postgres default port 5432. Edit the respective service, the setting is in “Advanced Options”.

pgbadger collects metrics.

Sometimes the operator does start replicas on the same node as the master. In this case cordon the two nodes containing active postgres instances and scale the cluster again with e. g.

pgo scale -n shared-dbs --replica-count 1 postgres

Replica count means the additional replica instances to be created. Uncordon the two nodes cordoned above and remove a replica:

pgo scaledown -n shared-dbs --query postgres
pgo scaledown -n shared-dbs --target postgres-xxxx postgres

This queries the names and nodes of replicas and then we remove one of those that are on a node with some other postgres instance.

We also create a pgadmin web interface using

pgo create -n shared-dbs pgadmin postgres

pgadmin setup does not include an ingress, we can add this using the rancher UI.

We need to invoke certmanager ourselves.

apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: pgadmin-siam-homeunix-net
  namespace: default-postgres
spec:
  secretName: pgadmin-homeunix-tls
  dnsNames:
  - pgadmin.cluster.siam.homeunix.net
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - pgadmin.cluster.siam.homeunix.net
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

By default PGO only creates a testuser with restricted access rights to one table. For a shared instance an admin or superuser is needed. To create this superuser we need to run pgo. Like this the user is synced between postgres and pgAdmin.

pgo -n shared-dbs create user --username admin --managed postgres

After that open a shell for the current master db (usually “postgres”). This already is a shell for the usual “postgres” user. We can launch psql and execute

ALTER ROLE admin SUPERUSER;

This “admin” user can create users and databases for client applications. It cannot create other pgAdmin users, they wont be able to log in.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: pgo-deployer-sa
  namespace: pgo
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pgo-deployer-cr
rules:
  - apiGroups:
      - ''
    resources:
      - namespaces
    verbs:
      - get
      - list
      - create
      - patch
      - delete
  - apiGroups:
      - ''
    resources:
      - pods
    verbs:
      - list
  - apiGroups:
      - ''
    resources:
      - secrets
    verbs:
      - list
      - get
      - create
      - delete
  - apiGroups:
      - ''
    resources:
      - configmaps
      - services
      - persistentvolumeclaims
    verbs:
      - get
      - create
      - delete
      - list
  - apiGroups:
      - ''
    resources:
      - serviceaccounts
    verbs:
      - get
      - create
      - delete
      - patch
      - list
  - apiGroups:
      - apps
      - extensions
    resources:
      - deployments
      - replicasets
    verbs:
      - get
      - list
      - watch
      - create
      - delete
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - get
      - create
      - delete
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterroles
      - clusterrolebindings
      - roles
      - rolebindings
    verbs:
      - get
      - create
      - delete
      - bind
      - escalate
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - roles
    verbs:
      - create
      - delete
  - apiGroups:
      - batch
    resources:
      - jobs
    verbs:
      - delete
      - list
  - apiGroups:
      - crunchydata.com
    resources:
      - pgclusters
      - pgreplicas
      - pgpolicies
      - pgtasks
    verbs:
      - delete
      - list
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: pgo-deployer-cm
  namespace: pgo
data:
  values.yaml: |-
    # =====================
    # Configuration Options
    # More info for these options can be found in the docs
    # https://access.crunchydata.com/documentation/postgres-operator/latest/installation/configuration/
    # =====================
    archive_mode: "true"
    archive_timeout: "60"
    backrest_aws_s3_bucket: ""
    backrest_aws_s3_endpoint: ""
    backrest_aws_s3_key: ""
    backrest_aws_s3_region: ""
    backrest_aws_s3_secret: ""
    backrest_aws_s3_uri_style: ""
    backrest_aws_s3_verify_tls: "true"
    backrest_gcs_bucket: ""
    backrest_gcs_endpoint: ""
    backrest_gcs_key_type: ""
    backrest_port: "2022"
    badger: "false"
    ccp_image_prefix: "registry.developers.crunchydata.com/crunchydata"
    ccp_image_pull_secret: ""
    ccp_image_pull_secret_manifest: ""
    ccp_image_tag: "centos8-13.3-4.7.0"
    create_rbac: "true"
    crunchy_debug: "false"
    db_name: ""
    db_password_age_days: "0"
    db_password_length: "24"
    db_port: "5432"
    db_replicas: "0"
    db_user: "testuser"
    default_instance_memory: "128Mi"
    default_pgbackrest_memory: "48Mi"
    default_pgbouncer_memory: "24Mi"
    default_exporter_memory: "24Mi"
    delete_operator_namespace: "false"
    delete_watched_namespaces: "false"
    disable_auto_failover: "false"
    disable_fsgroup: "false"
    reconcile_rbac: "true"
    exporterport: "9187"
    metrics: "false"
    namespace: "pgo"
    namespace_mode: "dynamic"
    pgbadgerport: "10000"
    pgo_add_os_ca_store: "false"
    pgo_admin_password: "examplepassword"
    pgo_admin_perms: "*"
    pgo_admin_role_name: "pgoadmin"
    pgo_admin_username: "admin"
    pgo_apiserver_port: "8443"
    pgo_apiserver_url: "https://pgo.cluster.siam.homeunix.net"
    pgo_client_cert_secret: "pgo.tls"
    pgo_client_container_install: "false"
    pgo_client_install: "true"
    pgo_client_version: "4.7.0"
    pgo_cluster_admin: "false"
    pgo_disable_eventing: "false"
    pgo_disable_tls: "false"
    pgo_image_prefix: "registry.developers.crunchydata.com/crunchydata"
    pgo_image_pull_secret: ""
    pgo_image_pull_secret_manifest: ""
    pgo_image_tag: "centos8-4.7.0"
    pgo_installation_name: "devtest"
    pgo_noauth_routes: ""
    pgo_operator_namespace: "pgo"
    pgo_tls_ca_store: ""
    pgo_tls_no_verify: "false"
    pod_anti_affinity: "preferred"
    pod_anti_affinity_pgbackrest: ""
    pod_anti_affinity_pgbouncer: ""
    scheduler_timeout: "3600"
    service_type: "ClusterIP"
    sync_replication: "false"
    backrest_storage: "default"
    backup_storage: "default"
    primary_storage: "default"
    replica_storage: "default"
    pgadmin_storage: "default"                          
    wal_storage: ""
    storage1_name: "default"
    storage1_access_mode: "ReadWriteOnce"
    storage1_size: "1G"
    storage1_type: "dynamic"
    storage2_name: "hostpathstorage"
    storage2_access_mode: "ReadWriteMany"
    storage2_size: "1G"
    storage2_type: "create"
    storage3_name: "nfsstorage"
    storage3_access_mode: "ReadWriteMany"
    storage3_size: "1G"
    storage3_type: "create"
    storage3_supplemental_groups: "65534"
    storage4_name: "nfsstoragered"
    storage4_access_mode: "ReadWriteMany"
    storage4_size: "1G"
    storage4_match_labels: "crunchyzone=red"
    storage4_type: "create"
    storage4_supplemental_groups: "65534"
    storage5_name: "storageos"
    storage5_access_mode: "ReadWriteOnce"
    storage5_size: "5Gi"
    storage5_type: "dynamic"
    storage5_class: "fast"
    storage6_name: "local"
    storage6_access_mode: "ReadWriteOnce"
    storage6_size: "4G"
    storage6_type: "dynamic"
    storage6_class: "local-hdd"
    storage7_name: "alternatesite"
    storage7_access_mode: "ReadWriteOnce"
    storage7_size: "4G"
    storage7_type: "dynamic"
    storage7_class: "alternatesite"
    storage8_name: "gce"
    storage8_access_mode: "ReadWriteOnce"
    storage8_size: "300M"
    storage8_type: "dynamic"
    storage8_class: "standard"
    storage9_name: "rook"
    storage9_access_mode: "ReadWriteOnce"
    storage9_size: "1Gi"
    storage9_type: "dynamic"
    storage9_class: "rook-ceph-block"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pgo-deployer-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pgo-deployer-cr
subjects:
  - kind: ServiceAccount
    name: pgo-deployer-sa
    namespace: pgo
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pgo-deploy
  namespace: pgo
spec:
  backoffLimit: 0
  template:
    metadata:
      name: pgo-deploy
    spec:
      serviceAccountName: pgo-deployer-sa
      restartPolicy: Never
      containers:
        - name: pgo-deploy
          image: registry.developers.crunchydata.com/crunchydata/pgo-deployer:centos8-4.7.0
          imagePullPolicy: IfNotPresent
          env:
            - name: DEPLOY_ACTION
              value: install
          volumeMounts:
            - name: deployer-conf
              mountPath: "/conf"
      volumes:
        - name: deployer-conf
          configMap:
            name: pgo-deployer-cm

Script to automatically get all files needed for pgo to work from the cluster after setup. Patched to work in Git bash too.

#!/bin/bash
 
#  Copyright 2020 - 2021 Crunchy Data Solutions, Inc.
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#       http://www.apache.org/licenses/LICENSE-2.0
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
 
# This script should be run after the operator has been deployed
PGO_OPERATOR_NAMESPACE="${PGO_OPERATOR_NAMESPACE:-pgo}"
PGO_USER_ADMIN="${PGO_USER_ADMIN:-pgouser-admin}"
PGO_CLIENT_VERSION="${PGO_CLIENT_VERSION:-v4.6.2}"
PGO_CLIENT_URL="https://github.com/CrunchyData/postgres-operator/releases/download/${PGO_CLIENT_VERSION}"
 
PGO_CMD="${PGO_CMD-kubectl}"
 
# Checks operating system and determines which binary to download
UNAME_RESULT=$(uname)
if [[ "${UNAME_RESULT}" == "Linux" ]]
then
    BIN_NAME="pgo"
elif [[ "${UNAME_RESULT}" == "Darwin" ]]
then
    BIN_NAME="pgo-mac"
elif [[ "${UNAME_RESULT}" =~ "MINGW64_NT-10" ]]
then
    BIN_NAME="pgo.exe"
else
    echo "${UNAME_RESULT} is not supported, valid operating systems are: Linux, Darwin"
    echo "Exiting..."
    exit 1
fi
 
# Creates the output directory for files
OUTPUT_DIR="${HOME}/.pgo/${PGO_OPERATOR_NAMESPACE}"
install -d -m a-rwx,u+rwx "${OUTPUT_DIR}"
 
if [ -f "${OUTPUT_DIR}/pgo" ]
then
	echo "pgo Client Binary detected at: ${OUTPUT_DIR}"
	echo "Updating Binary..."
fi
 
echo "Operating System found is ${UNAME_RESULT}..."
echo "Downloading ${BIN_NAME} version: ${PGO_CLIENT_VERSION}..."
curl -Lo "${OUTPUT_DIR}/pgo" "${PGO_CLIENT_URL}/${BIN_NAME}"
chmod +x "${OUTPUT_DIR}/pgo"
 
 
# Check that the pgouser-admin secret exists
if [ -z "$($PGO_CMD get secret -n ${PGO_OPERATOR_NAMESPACE} ${PGO_USER_ADMIN})" ]
then
    echo "${PGO_USER_ADMIN} Secret not found in namespace: ${PGO_OPERATOR_NAMESPACE}"
    echo "Please ensure that the PostgreSQL Operator has been installed."
    echo "Exiting..."
    exit 1
fi
 
# Check that the pgo.tls secret exists
if [ -z "$($PGO_CMD get secret -n ${PGO_OPERATOR_NAMESPACE} pgo.tls)" ]
then
    echo "pgo.tls Secret not found in namespace: ${PGO_OPERATOR_NAMESPACE}"
    echo "Please ensure that the PostgreSQL Operator has been installed."
    echo "Exiting..."
    exit 1
fi
 
# Restrict access to the target file before writing
kubectl_get_private() { touch "$1" && chmod a-rwx,u+rw "$1" && $PGO_CMD get > "$1" "${@:2}"; }
 
# Use the pgouser-admin secret to generate pgouser file
kubectl_get_private "${OUTPUT_DIR}/pgouser" secret -n "${PGO_OPERATOR_NAMESPACE}" "${PGO_USER_ADMIN}" \
	-o 'go-template={{ .data.username | base64decode }}:{{ .data.password | base64decode }}'
 
# Use the pgo.tls secret to generate the client cert files
kubectl_get_private "${OUTPUT_DIR}/client.crt" secret -n "${PGO_OPERATOR_NAMESPACE}" pgo.tls -o 'go-template={{ index .data "tls.crt" | base64decode }}'
kubectl_get_private "${OUTPUT_DIR}/client.key" secret -n "${PGO_OPERATOR_NAMESPACE}" pgo.tls -o 'go-template={{ index .data "tls.key" | base64decode }}'
 
echo "pgo client files have been generated, please add the following to your bashrc"
echo "export PATH=${OUTPUT_DIR}:\$PATH"
echo "export PGOUSER=${OUTPUT_DIR}/pgouser"
echo "export PGO_CA_CERT=${OUTPUT_DIR}/client.crt"
echo "export PGO_CLIENT_CERT=${OUTPUT_DIR}/client.crt"
echo "export PGO_CLIENT_KEY=${OUTPUT_DIR}/client.key"