RKE2
Docs about RKE2 can be found at https://docs.rke2.io.
Install RKE2
There are two install types: agent for pure worker nodes and server for management nodes. There need to be at least three server nodes; there can be any number of agent nodes.
It is probably best to install the same version of Kubernetes as the cluster the node will be attached to. If needed, the whole cluster can be upgraded after adding the nodes.
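To find out which version the existing cluster runs (and therefore which INSTALL_RKE2_VERSION to pick), the version reported by the current nodes can be checked first; a quick check from any machine with kubectl access to the target cluster:

# the VERSION column shows the v1.xx.yy+rke2zz build running on each node
kubectl get nodes -o wide
# API server and client versions
kubectl version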
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" INSTALL_RKE2_VERSION=v1.xx.yy+rke2zz sudo -E sh -
sudo mkdir -p /etc/rancher/rke2
sudo vi /etc/rancher/rke2/config.yaml
Create a /etc/rancher/rke2/config.yaml which will look something like:
node-name: leap-micro6
node-external-ip:
  - 10.3.6.55
node-ip:
  - 10.6.16.55
advertise-address: 10.6.16.55
# the following token can be found on 10.6.16.61 using
# sudo cat /var/lib/rancher/rke2/server/node-token
# comment this setting for the very first master node
token: K10<...>::server:<...>
# on all other servers after the initial setup on 10.6.16.61 is completed
server: https://10.6.16.61:9345
cluster-cidr: 10.42.0.0/16
service-cidr: 10.43.0.0/16
cni: calico
disable-kube-proxy: false
etcd-expose-metrics: false
etcd-snapshot-retention: 5
etcd-snapshot-schedule-cron: 0 */5 * * *
kube-controller-manager-arg:
  - cert-dir=/var/lib/rancher/rke2/server/tls/kube-controller-manager
  - secure-port=10257
kube-controller-manager-extra-mount:
  - >-
    /var/lib/rancher/rke2/server/tls/kube-controller-manager:/var/lib/rancher/rke2/server/tls/kube-controller-manager
kube-scheduler-arg:
  - cert-dir=/var/lib/rancher/rke2/server/tls/kube-scheduler
  - secure-port=10259
kube-scheduler-extra-mount:
  - >-
    /var/lib/rancher/rke2/server/tls/kube-scheduler:/var/lib/rancher/rke2/server/tls/kube-scheduler
kubelet-arg:
  - max-pods=250
kubelet-extra-mount:
  - >-
    /lib/modules:/lib/modules
node-label:
  - cattle.io/os=linux
protect-kernel-defaults: false
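Before starting RKE2 on a node that joins an existing setup, it can help to check that the first server (10.6.16.61 in the example above) is reachable on the ports referenced in the config; a quick sketch, assuming nc (netcat) is available on the node:

# supervisor/registration port used in the server: URL above
nc -zv 10.6.16.61 9345
# Kubernetes API port
nc -zv 10.6.16.61 6443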
Create a /etc/rancher/rke2/registries.yaml for using a local docker.io mirror which will look something like:
mirrors:
  docker.io:
    endpoint:
      - "http://10.6.16.58:5000"
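Whether the mirror actually answers can be checked from the node with the standard registry v2 API; a quick check, assuming 10.6.16.58:5000 is a plain HTTP Docker registry as configured above:

# should return a JSON list of the repositories hosted on the mirror
curl http://10.6.16.58:5000/v2/_catalog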
Server
export PATH=$PATH:/opt/rke2/bin
sudo systemctl enable rke2-server.service
sudo systemctl start rke2-server.service
sudo journalctl -u rke2-server -f
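Once rke2-server is active, the node can be checked directly with the kubeconfig and kubectl that RKE2 writes to its default locations (the kubeconfig is only readable by root):

sudo -s
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export PATH=$PATH:/var/lib/rancher/rke2/bin
# the new server should appear and become Ready
kubectl get nodes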
Agent
export PATH=$PATH:/opt/rke2/bin
sudo systemctl enable rke2-agent.service
sudo systemctl start rke2-agent.service
sudo journalctl -u rke2-agent -f
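The agent does not get its own kubeconfig, so the easiest check is from a server node (or the Rancher UI) that the new node registered and carries the labels from its config.yaml; a quick check using the node-name from the example config above (adjust to the actual agent name):

# run on a server node
kubectl get nodes --show-labels | grep leap-micro6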
(Re)creating local storage provisioner volumes
Any service using local storage should implement restoring missing data itself, so this section only describes how to create the empty volumes/disks needed for that.
As the local storage provisioner cannot change the size of the volumes, it will select the next larger volume for any claim. For example, a 20 GB claim will select a 29.4 GiB volume, a 30 GB claim a 30.2 GiB volume, etc.
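Which volumes the provisioner has discovered, and their exact capacities, can be listed before sizing a claim; a quick check from anywhere kubectl is configured:

# the CAPACITY column shows the sizes a claim will be matched against
kubectl get pv -o wide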
For Flatcar Linux we can follow the advice on the sig-storage-local-static-provisioner website: mount formatted block storage at /mnt/local-disks/<UUID>. The UUID is used to make sure that mixing up block devices fails instead of exposing data to the wrong host.
- Create a block device for an acdh-clusterX node in vCenter. Note that the size of the block device should be a little larger than the desired even number of GiB (example: for a 20 GiB volume create a 21 GiB disk), as there is a difference in how disk size is calculated
- Format the volume on the respective Flatcar node. Use ext4 or xfs depending on the needs of the service (for example elasticsearch/opensearch recommends ext4)
sudo mkfs.ext4 /dev/sdd
- Reserved blocks for root are not very useful in Kubernetes, so set them to 0
sudo tune2fs -r 0 /dev/disk/by-uuid/<UUID>
- Get the UUID. It is part of the output of mkfs.ext4 above. It is also available, for example, using ls -l /dev/disk/by-uuid/*
- Create a mount unit to mount the filesystem. The filename needs to match the mount point and is encoded with systemd-escape. The mount will automatically create the <UUID> directory in /mnt/local-disks/
sudo cp /etc/systemd/system/var-lib-rancher.mount "/etc/systemd/system/$(systemd-escape --path /mnt/local-disks/<UUID>).mount"
sudo vi /etc/systemd/system/"$(systemd-escape --path /mnt/local-disks/<UUID>).mount"
# change directory name and device name
# [Unit]
# Description=Mount local storage at /mnt/local-disks/<UUID>
# Before=local-fs.target
# [Mount]
# What=/dev/disk/by-uuid/<UUID>
# Where=/mnt/local-disks/<UUID>
# Type=ext4 or xfs
# [Install]
# WantedBy=local-fs.target
sudo systemctl daemon-reload
sudo systemctl enable "$(systemd-escape --path /mnt/local-disks/<UUID>).mount"
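After enabling the mount unit it can be started and checked right away; once the filesystem is mounted under /mnt/local-disks the local storage provisioner should pick it up and create a matching PersistentVolume. A short verification sketch:

sudo systemctl start "$(systemd-escape --path /mnt/local-disks/<UUID>).mount"
df -h /mnt/local-disks/<UUID>
# after a short while a new PV for this disk should appear
kubectl get pv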
Updating RKE2
This is best done using the Rancher UI for cluster updates. If the version there and the version on the nodes get out of sync, all other settings cannot be changed anymore either! But for reference, here is the very simple method of following the stable release channel for RKE2:
curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL=stable sudo -E sh -
sudo systemctl restart rke2-agent
# or
sudo systemctl restart rke2-server
Repeat on each node once the previous one is showing as up and active in Rancher. Start with the management/server nodes, then update the agent nodes.
The channel server at https://update.rke2.io/v1-release/channels shows what version corresponds to stable at the moment. Kubernetes minor versions are also channels. The channel latest refers to the very latest releases available.
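The channel mappings can also be queried from the command line; a sketch, assuming the RKE2 channel server answers channel URLs with a redirect to the mapped release (as its K3s counterpart does):

# prints the release that the stable channel currently resolves to
curl -w '%{url_effective}\n' -L -s -S https://update.rke2.io/v1-release/channels/stable -o /dev/null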
Troubleshooting
Using command line tools to manually delete container images
sudo -s
# as root
export PATH=$PATH:/var/lib/rancher/rke2/bin
export CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock
ctr -n k8s.io i rm $(ctr -n k8s.io i ls -q | grep <image name to delete, regex>)
# or
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
crictl images
crictl rmi <image name to delete>
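To see how much space the images take up and to get rid of everything that is not referenced by a running container, crictl can also be used; a sketch, assuming a reasonably recent crictl (older versions lack the --prune flag):

# still with CONTAINER_RUNTIME_ENDPOINT set as above
crictl imagefsinfo
# removes all images not currently used by a pod/container
crictl rmi --prune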