# RKE2

Docs about RKE2 can be found [here](https://docs.rke2.io/).

## Install RKE2

There are two install types: `agent` for pure worker nodes and `server` for management nodes. There need to be at least three `server` nodes; there can be any number of `agent` nodes.

It is probably best to install the same version of Kubernetes as the cluster the node will be attached to. If needed, the whole cluster can be upgraded after adding the nodes.

```bash
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" INSTALL_RKE2_VERSION=v1.xx.yy+rke2zz sudo -E sh -
sudo mkdir -p /etc/rancher/rke2
sudo vi /etc/rancher/rke2/config.yaml
```

Create a `/etc/rancher/rke2/config.yaml` which will look something like:

```yaml
node-name: leap-micro6
node-external-ip:
  - 10.3.6.55
node-ip:
  - 10.6.16.55
advertise-address: 10.6.16.55
# the following token can be found on 10.6.16.61 using
# sudo cat /var/lib/rancher/rke2/server/node-token
# comment this setting for the very first master node
token: K10<...>::server:<...>
# on all other servers after the initial setup on 10.6.16.61 is completed
server: https://10.6.16.61:9345
cluster-cidr: 10.42.0.0/16
service-cidr: 10.43.0.0/16
cni: calico
disable-kube-proxy: false
etcd-expose-metrics: false
etcd-snapshot-retention: 5
etcd-snapshot-schedule-cron: 0 */5 * * *
kube-controller-manager-arg:
  - cert-dir=/var/lib/rancher/rke2/server/tls/kube-controller-manager
  - secure-port=10257
kube-controller-manager-extra-mount:
  - >-
    /var/lib/rancher/rke2/server/tls/kube-controller-manager:/var/lib/rancher/rke2/server/tls/kube-controller-manager
kube-scheduler-arg:
  - cert-dir=/var/lib/rancher/rke2/server/tls/kube-scheduler
  - secure-port=10259
kube-scheduler-extra-mount:
  - >-
    /var/lib/rancher/rke2/server/tls/kube-scheduler:/var/lib/rancher/rke2/server/tls/kube-scheduler
kubelet-arg:
  - max-pods=250
kubelet-extra-mount:
  - >-
    /lib/modules:/lib/modules
node-label:
  - cattle.io/os=linux
protect-kernel-defaults: false
```

Create a `/etc/rancher/rke2/registries.yaml` to use a local docker.io mirror, which will look something like:

```yaml
mirrors:
  docker.io:
    endpoint:
      - "http://10.6.16.58:5000"
```

### Server

```bash
export PATH=$PATH:/opt/rke2/bin
sudo systemctl enable rke2-server.service
sudo systemctl start rke2-server.service
sudo journalctl -u rke2-server -f
```

### Agent

```bash
export PATH=$PATH:/opt/rke2/bin
sudo systemctl enable rke2-agent.service
sudo systemctl start rke2-agent.service
sudo journalctl -u rke2-agent -f
```
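Once the service is running, it is worth checking that the node actually joined the cluster. A minimal verification sketch for a `server` node (RKE2 writes its admin kubeconfig to `/etc/rancher/rke2/rke2.yaml` and ships `kubectl` in `/var/lib/rancher/rke2/bin`; agent nodes do not get a kubeconfig, so check them from a server):

```bash
# run on a server node
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
# the new node should show up and eventually report Ready
kubectl get nodes -o wide
```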
## (Re)creating local storage provisioner volumes

Since any service using local storage should implement restoring missing data itself, this section only describes how to create the empty volumes/disks for that. Because the local storage provisioner cannot change the size of volumes, it selects the next larger available volume for any claim: a 20 GB claim will, for example, select a 29.4 GiB volume, a 30 GB claim a 30.2 GiB volume, and so on.

For Flatcar Linux we can follow the advice on the [sig-storage-local-static-provisioner](https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner) website: mount formatted block storage below `/mnt/local-disks/`. Mounting by UUID ensures that mixed-up block devices fail to mount instead of exposing data to the wrong host.

* Create a block device for an acdh-clusterX node in vCenter. Note that the block device should be a little larger than the desired even number of GiB (example: for a 20 GiB volume create a 21 GiB disk), as there is a difference in how disk sizes are calculated.
* Format the volume on the respective Flatcar node. Use ext4 or xfs depending on the needs of the service (for example Elasticsearch/OpenSearch recommends ext4).

  ```bash
  sudo mkfs.ext4 /dev/sdd
  ```

* Reserved blocks for root are not very useful in Kubernetes, so set them to 0:

  ```bash
  sudo tune2fs -r 0 /dev/disk/by-uuid/
  ```

* Get the UUID. It is part of the output of `mkfs.ext4` above. It is also available, for example, using `ls -l /dev/disk/by-uuid/*`.
* Create a mount unit to mount the filesystem. The unit's filename needs to match the mount point and is therefore encoded with `systemd-escape`. Enabling it will automatically create the corresponding directory in `/mnt/local-disks/`.

  ```bash
  sudo cp /etc/systemd/system/var-lib-rancher.mount "/etc/systemd/system/$(systemd-escape --path /mnt/local-disks/).mount"
  sudo vi "/etc/systemd/system/$(systemd-escape --path /mnt/local-disks/).mount"
  # change the directory name and device name:
  #
  # [Unit]
  # Description=Mount local storage at /mnt/local-disks/
  # Before=local-fs.target
  #
  # [Mount]
  # What=/dev/disk/by-uuid/
  # Where=/mnt/local-disks/
  # Type=ext4 (or xfs, matching the filesystem created above)
  #
  # [Install]
  # WantedBy=local-fs.target
  sudo systemctl daemon-reload
  sudo systemctl enable "$(systemd-escape --path /mnt/local-disks/).mount"
  ```

## Updating RKE2

This is best done using the Rancher UI for cluster updates. If the version there and the version on the nodes get out of sync, _all other settings cannot be changed anymore either!_ But for reference, here is the very simple method of following the stable release channel for RKE2:

```bash
curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL=stable sudo -E sh -
sudo systemctl restart rke2-agent
# or
sudo systemctl restart rke2-server
```

Repeat on each node, waiting until the previously updated one shows as up and active in Rancher. Start with the management/`server` nodes, then update the `agent` nodes.

[Here](https://update.rke2.io/v1-release/channels) you can see which version corresponds to `stable` at the moment. Kubernetes minor versions are also channels. The channel `latest` refers to the very latest RKE2 releases available.

## Troubleshooting

### Using command line tools to manually delete container images

```bash
sudo -s  # as root
export PATH=$PATH:/var/lib/rancher/rke2/bin
export CONTAINERD_ADDRESS=/run/k3s/containerd/containerd.sock
ctr -n k8s.io i rm $(ctr -n k8s.io i ls -q | grep )
# or
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
crictl images
crictl rmi 
```
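When the goal is just to reclaim disk space rather than to remove one specific image, `crictl` can also prune everything that no running container references. A sketch assuming the same root shell and socket as above (`--prune` requires a reasonably recent `crictl`):

```bash
export PATH=$PATH:/var/lib/rancher/rke2/bin
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
# show current usage of the image filesystem
crictl imagefsinfo
# remove all images not used by any running container
crictl rmi --prune
```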
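To check that the `registries.yaml` mirror from the install section is actually being used, one approach is to pull a test image through the container runtime and watch the request arrive at the mirror. A sketch under the same environment as above (`alpine` is just an arbitrary test image):

```bash
# pull through the CRI, i.e. through containerd and its mirror configuration
crictl pull docker.io/library/alpine:latest
# RKE2's containerd log can help when the pull fails
sudo tail -f /var/lib/rancher/rke2/agent/containerd/containerd.log
```

If the mirror at `10.6.16.58:5000` is reachable and configured correctly, its own registry log should show the request.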