All configs and commands in this post have been verified and tested against the OpenShift 4.5 release. OpenShift 4.5 deploys Kubernetes in a new way, using CoreOS with Ignition. This approach makes sure all nodes in a cluster share the same image, and end users are discouraged from modifying anything at the OS level. Everything (NIC changes, troubleshooting, SSL injection) should be done through OpenShift itself by defining YAML: MachineConfig for OS files, nmstate for NIC changes.


Red Hat KVM

A short memo on how to create proper PXE-bootable KVM instances on RHEL 8.

Create Virtual Port

Several types of port can be used by KVM instances: you can attach a physical interface such as eno1 or bond0 directly, or use a bridge layered on top of it.

Create bond0 based on eno2:

    nmcli con add type team con-name bond0 ifname bond0 config '{"runner": {"name": "activebackup"}}'
    nmcli con add type team-slave con-name bond0-eno2 ifname eno2 master bond0
    nmcli dev dis eno2
    nmcli con up bond0

Create bridge:


Sometimes, if a VMware VM is shut down improperly, its filesystem may become corrupted or report errors on the next boot. The / filesystem then becomes read-only and none of the software on it is usable. To fix this, we can force / to be remounted and force a disk repair. For example, take an Ubuntu server with a disk failure. If we check its mounted filesystems, we'll see that / is read-only:

    # mount
    sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
    proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
    udev on /dev type devtmpfs (rw,nosuid,relatime,size=1988484k,nr_inodes=497121,mode=755)
    devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
    tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=403992k,mode=755)
    /dev/mapper/ubuntu--vg-ubuntu--lv on / type ext4 (ro,relatime,data=ordered)

and if we check its disk layout:
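The read-only check on the `mount` output can also be scripted. This is a minimal sketch, assuming `mount`-style lines in the format shown in this post; `list_ro_mounts` is a hypothetical helper name, and mount points containing spaces are not handled:

```shell
#!/bin/sh
# list_ro_mounts (hypothetical helper): read `mount`-style lines on stdin
# and print the mount point of every filesystem whose options include "ro".
list_ro_mounts() {
  awk '{
    opts = $NF                     # last field, e.g. "(ro,relatime,data=ordered)"
    gsub(/[()]/, "", opts)         # strip the surrounding parentheses
    n = split(opts, flag, ",")
    for (i = 1; i <= n; i++)
      if (flag[i] == "ro") { print $3; break }   # field 3 is the mount point
  }'
}

# Usage on a live system:
#   mount | list_ro_mounts
```

The option list is split on commas and compared exactly, so an option such as `errors=remount-ro` does not produce a false match.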


This method has been tested on a single-master k8s cluster; it may not work on multi-master clusters! Kubernetes requires certs on each node and master so that they can validate each other. If a cert ever expires, you'll see an error like this: Unable to connect to the server: x509: certificate has expired or is not yet valid. To fix the cluster, we first need to verify the cert status:

    $ openssl x509 -noout -text -in /etc/kubernetes/pki/apiserver.
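A quick way to tell whether a given certificate has already expired is to combine the `-enddate` and `-checkend` options of `openssl x509`. This is a minimal sketch, not the procedure from this post: `cert_status` is a hypothetical helper, and the certificate path is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# cert_status (hypothetical helper): print the notAfter date of a PEM
# certificate, then "valid" or "expired". Returns 0 if still valid,
# 1 if expired, 2 if the file cannot be read as a certificate.
cert_status() {
  cert="$1"
  openssl x509 -noout -enddate -in "$cert" || return 2
  # -checkend 0 succeeds only if the certificate has not expired yet
  if openssl x509 -noout -checkend 0 -in "$cert" >/dev/null; then
    echo "valid"
  else
    echo "expired"
    return 1
  fi
}

# Usage (CERT is whichever certificate file you want to inspect):
#   cert_status "$CERT"
```

Running it over every certificate on a node gives a short summary instead of the full `-text` dump.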


Red Hat OpenStack has a built-in Pacemaker that manages the status of a few Docker containers, and it also affects how MariaDB works on OpenStack. When MariaDB fails on Red Hat OpenStack, you will usually see something like this:

    [[email protected] etc]# pcs status
    Cluster name: tripleo_cluster
    Stack: corosync
    Current DC: controller1 (version 1.1.19-8.el7_6.2-c3c624ea3d) - partition with quorum
    Last updated: Thu May 7 21:55:43 2020
    Last change: Thu May 7 21:51:15 2020 by hacluster via crmd on controller2

    12 nodes configured
    36 resources configured

    Online: [ controller1 controller2 controller3 ]
    GuestOnline: [ [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] ]

    Full list of resources:

    Docker container set: rabbitmq-bundle [10.
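When the `pcs status` output is long, it can help to filter it down to the resources that are not running. This is a minimal sketch, assuming unhealthy resources are marked with the words `FAILED` or `Stopped` in the status output; `unhealthy_resources` is a hypothetical helper name:

```shell
#!/bin/sh
# unhealthy_resources (hypothetical helper): read `pcs status` output on
# stdin and print only the lines that contain the word FAILED or Stopped.
unhealthy_resources() {
  grep -E -w 'FAILED|Stopped' || true   # no match is not treated as an error
}

# Usage on a controller node:
#   pcs status | unhealthy_resources
```

An empty result means no resource line carried either marker, which is worth confirming against the full output before assuming the cluster is healthy.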


MaaS Notes

Installation

LXD-based MAAS is so far the best solution. Follow the official guides: lxc install maas and maas installation. The main steps to install:

1. Create a dedicated LXD environment for MAAS, including network and storage pool.
2. Run maas init to create the admin user.
3. Log in at https://{MAAS}:5240/MAAS
4. Set up user public key injection for bare-metal commissioning.
5. Commission nodes and set up networks.
6. Deploy.

Storage Preparation

The volume can be ZFS, LVM, or btrfs:

Create an LVM pool, e.g. /dev/lxc-vg/maas
Run apt install thin-provisioning-tools to install the LVM thin-provisioning driver.


OVS traffic capture

OVS traffic flow illustration (kolla example):

Traffic going out of the cloud via the provider network:
VM --> tap+qbr(linuxbridge)+qvb --> qvo+br-int+int-br-ex --> phy-br-ex+br-ex+br_vlan --> external network

Traffic going to a VXLAN tenant:
VM --> tap+qbr(linuxbridge)+qvb --> qvo+br-int+patch-tun --> patch-int+br-tun+port vxlan# --> remote host VXLAN interface IP

If DVR is not used, all traffic goes from the compute nodes to the neutron nodes, then uses the neutron nodes' port# to go out.
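To capture traffic at each hop, you first need the device names on the compute node. This is a minimal sketch, assuming the usual Neutron naming where each device is a prefix plus the first 11 characters of the port UUID; `port_ifnames` is a hypothetical helper and the UUID below is made up:

```shell
#!/bin/sh
# port_ifnames (hypothetical helper): print the tap/qbr/qvb/qvo device names
# for a Neutron port UUID, assuming "<prefix><first 11 chars of the UUID>".
port_ifnames() {
  short=$(printf '%s' "$1" | cut -c1-11)
  for prefix in tap qbr qvb qvo; do
    printf '%s%s\n' "$prefix" "$short"
  done
}

port_ifnames 3a4b5c6d-7e8f-4a1b-9c2d-0123456789ab

# Capture on one hop, e.g. the VM-facing tap device:
#   tcpdump -nei tap3a4b5c6d-7e
```

Capturing on tap, qvb, and qvo in turn shows at which hop a packet stops appearing.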


To enable the built-in horizontal autoscaling feature, a Metrics Server needs to be installed first so that k8s can pull resource data from it:

    helm install stable/metrics-server -n metric --namespace kube-system -f metric.yml

Metrics Server has a chart in Helm stable, but newer versions of it can behave oddly and show an error such as:

    unable to fetch pod metrics for pod rook-ceph/csi-rbdplugin-qv94k: no metrics known for pod

When this happens, it means you are facing some TLS and network issues.
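One common cause of this kind of error is that Metrics Server cannot verify the kubelet serving certificates or cannot resolve node hostnames. Below is a minimal sketch of a values file for that case. It assumes the chart exposes an `args` list and that skipping kubelet TLS verification is acceptable in your environment (it is not suitable for production). The file contents are an assumption, not the `metric.yml` used in this post:

```shell
#!/bin/sh
# Write a hypothetical metric.yml that passes extra flags to metrics-server.
# --kubelet-insecure-tls skips verification of kubelet serving certificates;
# --kubelet-preferred-address-types makes it reach kubelets by node IP first.
cat > metric.yml <<'EOF'
args:
  - --kubelet-insecure-tls
  - --kubelet-preferred-address-types=InternalIP
EOF
```

The resulting file can be passed to the chart with the `-f` option.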


How to - Ceph - Identify the server drive bay number of a faulty drive

To identify which drive bay a faulty disk is in:

Method 1 - Using iLO and iDRAC

Log in to the iLO or iDRAC interface.
Check for error messages in the iLO or iDRAC.
If iLO (HP), from the main page, go to Information → System Information → Storage → Physical View
If iDRAC (Dell),

Method 2 - Using CLI

SSH to the server (storage node, compute node, controller, etc.


How To Replace Ceph OSD

How to - Ceph - Configure Ceph on a new drive

source: https://ceph.com/geen-categorie/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/

Remove the OSD of the faulty drive

If you are replacing a faulty drive with a new one, you need to remove the OSD of the faulty drive before creating the new OSD.

*Requirement: The faulty SSD must have been replaced with a healthy SSD.

Log in to the Ceph node with the faulty drive.
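A commonly used sequence for removing a failed OSD is: mark it out, remove it from the CRUSH map, delete its key, then remove the OSD itself. Below is a minimal sketch that only prints that sequence for review rather than running it. `osd_removal_plan` is a hypothetical helper, the OSD number is made up, and the exact steps should be checked against your Ceph release:

```shell
#!/bin/sh
# osd_removal_plan (hypothetical helper): print, without executing, the
# commands commonly used to remove OSD number $1 from a Ceph cluster.
osd_removal_plan() {
  id="$1"
  echo "ceph osd out osd.$id"            # stop placing data on the OSD
  echo "ceph osd crush remove osd.$id"   # remove it from the CRUSH map
  echo "ceph auth del osd.$id"           # delete its authentication key
  echo "ceph osd rm osd.$id"             # remove the OSD from the cluster
}

osd_removal_plan 12
```

Printing the plan first makes it easy to confirm the OSD number before anything is changed on the cluster.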



Charles

Love coding and new technologies

Cloud Solution Consultant

Canada