By following the practice introduced in this Markdown document, you will end up with a running cluster. Its major features include:
(1) It is a High-Availability (HA) cluster based on Microk8s, on which apps keep running even if some hosts in the cluster fail.
(2) The cluster is cost-efficient to deploy, even when set up at home.
(3) It constructs a Distributed File System (DFS) by setting up a Storage Cluster running Ceph + Rook on Microk8s. The goal is to reduce cost without losing High Availability for the Storage Cluster.
(4) The app we developed to verify that the DFS works is a full-stack web app. Its web server runs in the cluster and can be visited by its browser app from both the LAN and the Internet (as a Cloud).
The cluster/cloud-based app implements CRUD (create, read, update, delete) operations on files that the browser requests from the server. It is coded with Express.js (Node.js). Following the RESTful API style, the browser app simply asks the server app to transfer one representational state to another: list, upload, download or delete files in the DFS on the server side.
(5) All the resources to create, run and maintain the cluster/cloud are in your own hands: it is bare metal (every node is a physical host rather than a virtual machine) and on-premises (you provide the whole running environment instead of renting it from a service provider). At the very least, you won't have the "excuse" that your operational system was impacted by the outside world :-)
1.1, Prepare hosts
1.2, Install Microk8s
1.4, Enable addon "DNS"
1.7, Set up HA cluster
5.1, Deploy Tools
5.1.1. Deploy Rook Toolbox
5.1.2. Deploy Ceph Dashboard
5.2.1, Verify normal working status
6, To-dos
7, Summary
Microk8s is a "minimal production Kubernetes". Kubernetes is a "container orchestration system for automating software deployment, scaling and management".
According to both theory and our practice, Microk8s plays a critical role as the base of the whole system:
(1) It organizes system and app components as cooperating containers. This mechanism isolates the cluster from the host system (hardware/software) it resides on, which makes it much easier to develop and maintain the cluster on each host. For example, the pod that our test app "hello" runs in behaves like a small self-contained "micro-computer": if needed, we could upgrade or downgrade its base image or change its configuration without touching the host environment at all.
(2) An app deployed on Microk8s as a Kubernetes Service, such as a web app, can be visited from both the LAN (as a cluster) and the Internet (as a Cloud). In other words, Microk8s lays the foundation on which you can build a customized system at either local scale (an enterprise, a building, a house, etc.) or public scale (remote customers).
I prepared 4 PCs to construct the cluster. They are simply hosts I found in my storage room (is that "cost-efficient"? :-)
Their names are like below:
xiaozhong-x570,
xiaozhong-w540,
xiaozhong-giga,
xiaozhong-ThinkPad-T400.
You may use more Hosts depending on your own case.
To access the cluster, another device outside the cluster is needed to run a browser. In my trial, that was one more laptop or a Raspberry Pi 3B+ board on which the browser runs.
On every Host, download our repository from github:
$ git clone https://github.com/huaxiaozhong1/on-premises-high-available-kubernetes-distributed-storage-cluster.git
$ cd on-premises-high-available-kubernetes-distributed-storage-cluster.git
Start to install Microk8s and necessary add-ons.
$ sudo snap install microk8s --channel=1.22/stable --classic
The installation may fail due to network connection issues. In that case, you could download the Microk8s packages first.
$ sudo snap download microk8s --channel=1.22/stable
The version of microk8s running on my cluster is v1.22.9-3+b6876a8b1b090b. You could check it after Microk8s works.
$ microk8s kubectl version
The revision that snap identifies is 3203. You could check it after Microk8s is installed.
$ sudo snap list microk8s
If the download succeeds, you will find 2 new files in the folder: microk8s_3203.assert and microk8s_3203.snap. Run the following commands to install:
$ sudo snap ack microk8s_3203.assert
$ sudo snap install microk8s_3203.snap --classic
Notice:
(1) You may need to configure your firewall to allow communication on cni0 (the Container Network Interface bridge for all pods on a node):
$ sudo ufw allow in on cni0 && sudo ufw allow out on cni0 && sudo ufw default allow routed
(2) In the worst case, the download itself may get stuck. You could then contact me to get the 2 files :-) The snap package is too big to upload to my git repository.
Now, keep checking the status until it reports "microk8s is running".
$ microk8s status
microk8s is running
high-availability: no
datastore master nodes: 192.168.0.100:19001
datastore standby nodes: none
addons:
enabled:
disabled:
ambassador
...
Run the following command to check whether all elements have become ready; in particular, the READY column of each pod should show "1/1".
$ microk8s kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/calico-node-q94vg 1/1 Running 0 3m
...
While installing either the Microk8s cluster or its addons, some dependent packages may be missing, so the cluster won't work. In that case, you need to search for those packages; there are a few ways and tricks to find them on the Internet.
For example, some pods may still not be ready after a long time, as below:
$ microk8s kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/calico-node-q94vg 0/1 Running 0 3m
...
Then you need to run the following command to check "why":
$ microk8s kubectl describe pod/calico-node-q94vg -n kube-system
Name: calico-node-q94vg
Namespace: kube-system
...
... failed to pull image "k8s.gcr.io/pause:3.1": ...
...
Running the following command tells you where to get the Docker image.
$ sudo docker search pause
NAME DESCRIPTION STARS OFFICIAL AUTOMATED
mirrorgooglecontainers/pause-amd64 19
...
Pull the image "pause:3.1" and re-tag it:
$ sudo docker pull mirrorgooglecontainers/pause-amd64:3.1
$ sudo docker tag mirrorgooglecontainers/pause-amd64:3.1 k8s.gcr.io/pause:3.1
After getting the Docker image, let's go through the general procedure to transfer it into Microk8s.
Save the docker image as a tarball.
$ sudo docker save k8s.gcr.io/pause:3.1 > pause.3.1.tar
Then, import the tarball into microk8s:
$ microk8s ctr image import pause.3.1.tar
Now the READY column of pod "calico-node-q94vg" should show "1/1" when you check again:
$ microk8s kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/calico-node-q94vg 1/1 Running 0 3m
...
If the pod still doesn't become ready, stop and start microk8s:
$ microk8s stop
$ microk8s start
Eventually it should be READY as "1/1".
Now we have the basic framework of the Microk8s cluster running. Let's install some necessary addons in steps 1.4 - 1.6.
Notice: if you can't find a way to get such a necessary image from the Internet, you could contact me and I can share it with you :-)
The addon dns is commonly required by other addons.
$ microk8s enable dns
...
$ microk8s status
...
$ microk8s kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
...
kube-system pod/coredns-86f78bb79c-p554n 1/1 Running 0 1m
...
Check all elements' status as we did in step 1.2. When all elements reach READY "1/1", go to step 1.5.
The ingress addon enables traffic to come into your Kubernetes cluster from outside.
$ microk8s enable ingress
As in step 1.2, check that the addon's READY column shows "1/1".
$ microk8s status
...
$ microk8s kubectl get all --all-namespaces
...
Kubernetes itself does not offer an implementation of network load balancers (Services of type LoadBalancer). The implementations of network load balancers that Kubernetes ships with are all glue code that calls out to various IaaS (Infrastructure as a Service) platforms (GCP, AWS, Azure...). If you're not running on such a supported IaaS platform, a LoadBalancer Service will remain in the "pending" state indefinitely after it is created!
MetalLB offers such a network load balancer implementation that integrates with standard network equipment, so that an "external service" on a bare-metal cluster works in much the same way as on GCP, AWS, Azure...
As a result, when you launch an app in the cluster, it can be exposed to the external world as an IP address and an accessible port. The address/port does not correspond to any pair on a physical host; they are assigned and managed by the Microk8s cluster you have just created.
Now check your physical router to find the IP address pool it assigns to connected machines and devices. In my case, I set the pool to range from 192.168.0.100 to 192.168.0.199, so the 4 hosts connected to the router got the IPs 192.168.0.100 - 192.168.0.103.
Then I assign MetalLB an IP address pool of 192.168.0.120 - 192.168.0.127 with the following command:
$ microk8s enable metallb:192.168.0.120-192.168.0.127
Check the addon in the same way as in the previous steps to make sure it works, for example as sketched below.
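One possible check (a hedged sketch; the namespace name metallb-system is what the MetalLB addon used in my version of Microk8s and may differ in yours):
```
# The MetalLB controller pod and one speaker pod per node should reach Running.
$ microk8s kubectl get pods -n metallb-system
```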
Select a node as the master node; in my case it is 192.168.0.100. Run the following command on that host:
$ microk8s add-node
From the node you wish to join to this cluster, run the following:
microk8s join 192.168.0.100:25000/0e15febd53956674e8962a5240f08c3d
...
On all other hosts, join the cluster by running:
$ microk8s join 192.168.0.100:25000/0e15febd53956674e8962a5240f08c3d
...
After adding all 4 nodes into the cluster, check the status:
$ microk8s status
microk8s is running
high-availability: yes
datastore master nodes: 192.168.0.100:19001 192.168.0.101:19001 192.168.0.102:19001
datastore standby nodes: 192.168.0.103:19001
...
It shows that there are now 3 "datastore master nodes" in this HA cluster, plus 1 more node as a "datastore standby node".
As in the steps above, you still need to check every element's state to know whether all components in the cluster are working normally.
$ microk8s kubectl get all --all-namespaces
...
At this point, you can also check all nodes' status by running:
$ microk8s kubectl get no
NAME STATUS ROLES AGE VERSION
xiaozhong-x570 Ready <none> 1d v1.22.9-3+b6876a8b1b090b
xiaozhong-w540 Ready <none> 1d v1.22.9-3+b6876a8b1b090b
xiaozhong-giga Ready <none> 1d v1.22.9-3+b6876a8b1b090b
xiaozhong-ThinkPad-T400 Ready <none> 1d v1.22.9-3+b6876a8b1b090b
In my case, all 4 hosts end up Ready.
Now that the multi-node cluster is set up, how can we be sure that the High Availability functionality actually takes effect?
-- We have developed a simple test app for this purpose.
On my Ubuntu 20.04 machines, docker.io is installed. Let's build a Docker container app, called "hello", to verify that the Microk8s cluster is working.
Run the following commands on every node to create the app on each host. Alternatively, you could build it on one host and scp it to the others (see the sketch after the build steps below).
$ cd on-premises-high-available-kubernetes-distributed-storage-cluster.git/hello
$ sudo docker build -t hello:local .
...
A local Docker image named "hello:local" is created.
Save the image as a tarball:
$ sudo docker save hello:local > ../hello.local.tar
The procedure may need a few minutes.
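If you built the image on only one host, a small loop like the following could distribute the tarball to the other hosts (a hedged sketch; the host names and the user xiaozhong are placeholders from my setup, and it assumes your user is allowed to run microk8s on each host):
```
# Copy the tarball to the other hosts, then import it into Microk8s on each of them.
for h in xiaozhong-w540 xiaozhong-giga xiaozhong-ThinkPad-T400; do
  scp ../hello.local.tar xiaozhong@$h:~/
  ssh xiaozhong@$h "microk8s ctr image import ~/hello.local.tar"
done
```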
We built the tarball of the test app "hello" in step 2.1. Now let's deploy it.
$ microk8s ctr image import ../hello.local.tar
...
$ microk8s kubectl create deployment hello --image=hello:local
...
$ microk8s kubectl scale deployment hello --replicas=4
...
$ microk8s kubectl expose deployment hello --port=8081 --target-port=8080 --type=LoadBalancer --name=hello
...
After these commands finish, check the status of the service exposed above:
$ microk8s kubectl get all --all-namespaces
...
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 11d
default service/hello LoadBalancer 10.152.183.185 192.168.0.120 8081:31305/TCP 1m
...
Now we know that the Kubernetes Service "hello" can be visited via the external IP 192.168.0.120 on port 8081 from devices outside the cluster.
Let's go to such a device. In this practice, the device is connected to the same router as all of the cluster's hosts, but it does not join the Microk8s cluster. Run the following command in a command-line terminal (CLI) on the device:
$ curl http://192.168.0.120:8081
Hello World
The hello app replies to the terminal with "Hello World". You could define a different response by modifying the code in hello/server.js:
res.send('Hello World \n');
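If you do modify server.js, the change only takes effect after the image is rebuilt and re-imported; a hedged sketch of that loop (to be repeated on each node, as in step 2.1, assuming the deployment keeps the name hello):
```
# Rebuild the image, re-import it into Microk8s, then restart the pods so they pick it up.
$ sudo docker build -t hello:local .
$ sudo docker save hello:local > ../hello.local.tar
$ microk8s ctr image import ../hello.local.tar
$ microk8s kubectl rollout restart deployment hello
```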
Let's power off 192.168.0.100, the "master" node (host xiaozhong-x570) that we selected while constructing the cluster.
Then check the cluster status again from any other host that is still working, after xiaozhong-x570 is shut down:
$ microk8s status
microk8s is running
high-availability: yes
datastore master nodes: 192.168.0.101:19001 192.168.0.102:19001 192.168.0.103:19001
datastore standby nodes: none
...
Now we witness that the Microk8s cluster keeps working even though the original "master" node is shut down! This is exactly what High Availability means.
Try to access the app "hello" from the external device via 192.168.0.120 (port 8081) again:
$ curl http://192.168.0.120:8081
Hello World
This verifies that the IP exposed by the cluster keeps working; it is not affected when the original "master" node stops. This is actually the behavior the load balancer brings in: it distributes the load (the app hello, in this case) over all the remaining nodes, so from the customer's point of view the app keeps working when some node fails.
Of the remaining 3 nodes, let's turn off 1 more, so that only 2 nodes are working. Now run the command below again:
$ curl http://192.168.0.120:8081
No response comes back.
Now we are sure that the URL no longer responds to requests. This verifies that the HA feature is lost when the number of working nodes inside the cluster drops below 3.
Power one node back on and it rejoins automatically. The status should look like this once 3 nodes are working in the cluster:
$ microk8s status
microk8s is running
high-availability: yes
datastore master nodes: 192.168.0.101:19001 192.168.0.102:19001 192.168.0.103:19001
datastore standby nodes: none
...
Then send the request again; the response comes back as below:
$ curl http://192.168.0.120:8081
Hello World
The response comes back from our web server app side.
Now we know that the URL responds again once the cluster is back to 3 working nodes.
Turn on the 4th host as well, and wait for the following checks to show all pods running again.
$ microk8s status
microk8s is running
high-availability: yes
datastore master nodes: 192.168.0.101:19001 192.168.0.102:19001 192.168.0.103:19001
datastore standby nodes: 192.168.0.100:19001
...
$ microk8s kubectl get all --all-namespaces
...
Now we know how to keep a Microk8s (Kubernetes) cluster working without pausing. An app can be containerized to run on the platform and, with the support of multiple hosts, achieve High Availability features such as continued service and load balancing. For example, an external Service of type LoadBalancer keeps working when some node fails, as long as the number of remaining nodes in the cluster does not drop below a certain threshold.
But a Kubernetes cluster (e.g., a Microk8s one) does not naturally give us an HA storage system. For example:
if you plan to deploy an app that CRUDs files in the cluster rather than on a specific host;
if you hope the system keeps such CRUD operations working without pausing even if some hosts fail;
-- then much more construction is needed in the system.
-- As a solution, you need to install a full Distributed File System (DFS) in the Microk8s cluster.
The DFS I am going to implement is based on Ceph + Rook.
Ceph is a software-defined storage platform that provides interfaces for object-, block- and file-level storage. In this system, we will utilize one of the interfaces -- DFS.
The storage platform (or Storage Cluster) does not exist just because a Kubernetes cluster is installed. In this system we choose Rook, which eventually provisions our app a set of APIs to work with the DFS. For example, Rook hooks into a standard Kubernetes layer, the PersistentVolumeClaim (PVC), to interact with Ceph's DFS. The resulting volume can then be mounted as a "folder" of a normal file system inside a pod, so that an app can upload files to and download them from that folder.
There is an interesting and useful fact here: through the DFS, our app simply puts files into and gets them from a folder in its pod; the app does not need to care where or how the DFS exists and works.
In the next sections, let's go through the whole procedure to set up such a Storage Cluster and DFS.
Go up to the parent directory (the one at the same level as "on-premises-high-available-kubernetes-distributed-storage-cluster.git") to download the git repository for Rook:
$ git clone --single-branch --branch v1.9.6 https://github.com/rook/rook.git
$ cd rook/deploy/examples
In directory "rook/deploy/examples", execute 2 YAML (Yet Another Markup Language) files to create related Customer Resource Definition (CRD) and Common Resource, which are necessary parts for a typical Kubernetes Cluster.
$ microk8s kubectl create -f crds.yaml
$ microk8s kubectl create -f common.yaml
Next, change 2 lines in operator.yaml, then apply it. This manifest adds all the operators necessary to set up the Storage Cluster inside the Kubernetes cluster.
ROOK_CSI_KUBELET_DIR_PATH: "/var/snap/microk8s/common/var/lib/kubelet"
...
# Whether to start the discovery daemon to watch for raw storage devices on nodes in the cluster.
ROOK_ENABLE_DISCOVERY_DAEMON: "true"
Here ROOK_CSI_KUBELET_DIR_PATH is the directory path for the kubelet. It is /var/snap/microk8s/common/var/lib/kubelet when Microk8s is installed.
ROOK_ENABLE_DISCOVERY_DAEMON is a configuration setting for Rook. If it is set to true, applying operator.yaml creates a discovery daemon (one pod per node/host). One mission of these pods is to find one raw storage device per host, which the OSDs (Object Storage Daemons) created later by cluster.yaml will consume data on.
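A quick way to confirm the edits before applying the manifest (a hedged sketch, nothing more than a text search over your local copy of operator.yaml):
```
# Both lines should appear, uncommented, with the values described above.
$ grep -nE 'ROOK_CSI_KUBELET_DIR_PATH|ROOK_ENABLE_DISCOVERY_DAEMON' operator.yaml
```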
Here is how to prepare the raw storage device.
A raw storage device requires an individual disk partition on each node. You can check whether an empty partition exists:
$ lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
vda
└─vda1 LVM2_member >eSO50t-GkUV-YKTH-WsGq-hNJY-eKNf-3i07IB
├─ubuntu--vg-root ext4 c2366f76-6e21-4f10-a8f3-6776212e2fe4 /
└─ubuntu--vg-swap_1 swap 9492a3dc-ad75-47cd-9596-678e8cf17ff9 [SWAP]
vdb
If the FSTYPE attribute of a partition is empty, the partition can be used as the raw storage device in the next section.
If a host has no empty disk partition (for example, the whole disk is already occupied by file systems), you have to find a way to free some space for the raw storage device, either by adding one more physical hard disk or by shrinking the current partitions with tools such as resize2fs.
If there is a partition that is not empty but can be cleaned up (e.g., its FSTYPE is set to ext4 or something else), you can use disk-clean.sh from my repository to clean it up by erasing all data in the partition.
You will find a line like this in the shell script:
DISK="/dev/vdb"
Reassign DISK to the name of the partition on which you plan to create the DFS.
Notice: Be careful! There is no way to restore data erased in this way, so be 100% sure that all data in the partition is disposable before you execute the script.
Then:
$ sudo sh disk-clean.sh
...
After you succeed in preparing a raw storage device on every node, go on --
It's time to apply the operator.yaml that we prepared in Section 3.2:
$ microk8s kubectl create -f operator.yaml
configmap/rook-ceph-operator-config created
deployment.apps/rook-ceph-operator created
$ microk8s kubectl get all -n rook-ceph
NAME READY STATUS RESTARTS AGE
pod/rook-ceph-operator-757546f8c7-6jl8q 1/1 Running 0 54s
pod/rook-discover-4n6np 1/1 Running 0 39s
pod/rook-discover-f7xr4 1/1 Running 0 39s
pod/rook-discover-7n8ss 1/1 Running 0 39s
pod/rook-discover-8sf7q 1/1 Running 0 39s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/rook-discover 4 4 4 4 4 <none> 40s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/rook-ceph-operator 1/1 1 1 54s
NAME DESIRED CURRENT READY AGE
replicaset.apps/rook-ceph-operator-757546f8c7 1 1 1 54s
Now we can see that the namespace "rook-ceph" has been created, and its first members include 4 discovery pods, which will find the empty partitions we prepared in Section 3.3.
Now it's time to create the Storage Cluster by applying cluster.yaml.
After the manifest is applied, 4 OSD pods will be running. These Ceph OSD daemons store most of Ceph's data. Usually each OSD works with a single storage device -- in our case, the one that the discovery pod found after operator.yaml was applied.
While launching, the OSD daemon calls a tool named ceph-volume, which determines a strategy (a "storage backend", BlueStore in our case) for how to consume data on the storage device.
You could run the following command again after OSDs are created:
$ lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
vda
└─vda1 LVM2_member >eSO50t-GkUV-YKTH-WsGq-hNJY-eKNf-3i07IB
├─ubuntu--vg-root ext4 c2366f76-6e21-4f10-a8f3-6776212e2fe4 /
└─ubuntu--vg-swap_1 swap 9492a3dc-ad75-47cd-9596-678e8cf17ff9 [SWAP]
vdb ceph_bluestore
This tells us that vdb has been formatted with the filesystem type ceph_bluestore, which the OSD daemon accepts and will consume data on.
Along with the OSDs, 2 other types of Ceph daemons are worth noticing: the Ceph MON daemon (monitoring the status of the Ceph distributed file system) and the Ceph MGR daemon (providing management interfaces and additional/external management for the cluster). Their counts are set in cluster.yaml:
mon:
...
count: 3
...
mgr:
...
count: 2
Run:
$ microk8s kubectl create -f cluster.yaml
cephcluster.ceph.rook.io/rook-ceph created
...
These 2 values don't need to be modified if 4 nodes make up our cluster :-)
The procedure takes a while to complete. You can keep checking:
$ microk8s kubectl -n rook-ceph get pod
...
rook-ceph-osd-1-5fdcb9fd-8th5r 1/1 Running 2 (3h35m ago) 7d9h
rook-ceph-osd-0-79c64598c8-csxcv 1/1 Running 6 (3h35m ago) 9d
rook-ceph-osd-3-59797b77c9-5hzld 1/1 Running 3 (3h33m ago) 7d23h
rook-ceph-osd-2-6d75df4b44-2pqcx 1/1 Running 5 (5d10h ago) 9d
rook-ceph-mon-f-69668bdf7f-rlnz5 1/1 Running 3 (3h52m ago) 7d10h
rook-ceph-mon-g-5c75bcb898-mx4ms 1/1 Running 4 (3h54m ago) 7d23h
rook-ceph-mon-e-fcb6f98bd-llfbj 1/1 Running 5 (5d11h ago) 8d
rook-ceph-mgr-b-8655f8d98f-tlcx6 2/2 Running 5 (3h55m ago) 7d10h
rook-ceph-mgr-a-7849b47fd8-w9m6v 2/2 Running 11 (5d11h ago) 9d
...
until all pods in the namespace rook-ceph reach the Running status as above.
At this point, you can see that 3 kinds of critical pods (daemons) are working as we configured.
4 OSD pods are working, one per node; there are 4 OSDs in total since our Storage Cluster is composed of 4 nodes (hosts) :-)
You can also see that 3 MON (monitor) pods and 2 MGR (manager) pods have been created, as we set in cluster.yaml.
We will verify whether and how these pods work together to form an HA Storage Cluster in later sections.
On top of the OSDs we created, we can create the DFS by applying:
$ microk8s kubectl create -f filesystem.yaml
cephfilesystem.ceph.rook.io/myfs created
...
Now a few new pods have been created:
$ microk8s kubectl get pod -n rook-ceph
...
csi-cephfsplugin-b8pgr 3/3 Running 18 (7h7m ago) 9d
csi-cephfsplugin-kbrkl 3/3 Running 21 (7h7m ago) 9d
csi-cephfsplugin-2vk4g 3/3 Running 21 (7h8m ago) 9d
csi-cephfsplugin-f47n2 3/3 Running 18 (7h7m ago) 9d
rook-ceph-mds-myfs-a-6db97f6fc9-4sbwm 1/1 Running 7 (7h6m ago) 9d
rook-ceph-mds-myfs-b-54554c7574-84c7h 1/1 Running 7 (7h3m ago) 8d
...
One CSI (Container Storage Interface) pod (named csi-cephfsplugin-xxx) runs on each node. These pods provide the standard Kubernetes interface through which each orchestrated container calls the DFS.
Here the name of DFS is "myfs", as we define in filesystem.yaml:
kind: CephFilesystem
metadata:
name: myfs
Another 2 pods are the MDS (Metadata Server) daemons of the Ceph DFS: 1 active pod and 1 standby pod, as filesystem.yaml sets:
metadataServer:
...
activeCount: 1
...
activeStandby: true
In the coming sections, we will see how these critical pods work together to provide real storage usage.
Right after creating the DFS, you can verify it by mounting it directly:
$ microk8s kubectl create -f direct-mount.yaml
deployment.apps/rook-direct-mount created
$ microk8s kubectl -n rook-ceph get deploy | grep direct
rook-direct-mount 1/1 1 1 1h
$ microk8s kubectl -n rook-ceph exec -it deploy/rook-direct-mount -- bash
[root@xiaozhong-giga /]#
Now that we are logged into the deployment (actually an app) "rook-direct-mount" as root, let's mount "myfs":
# mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
# my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')
# mount -t ceph -o mds_namespace=myfs,name=admin,secret=$my_secret $mon_endpoints:/ /mnt
# ls -R /mnt
/mnt:
Notice: the mount command has an -o (--options) parameter mds_namespace=myfs, meaning that the operation mounts the DFS "myfs".
The results are: 1) the DFS myfs has been mounted successfully; 2) the mounted folder is empty. Clearly, more steps are required before users' data has a place to go in and out :-)
So far we have created a Storage Cluster called rook-ceph. Its DFS (called myfs) can be mounted and visited as root from a pod inside the cluster.
However, in this way we are not yet able to CRUD files in the DFS from an app (e.g., a browser) outside the cluster.
To do so, Kubernetes thankfully provides a standard mechanism, following which an app can use the DFS as a folder of a normal file system.
Let's go.
Create a StorageClass (the Kubernetes convention for defining "a class" of storage) by applying:
$ microk8s kubectl create -f csi/cephfs/storageclass.yaml
storageclass.storage.k8s.io/rook-cephfs created
Pay attention to these lines in storageclass.yaml:
...
kind: StorageClass
metadata:
name: rook-cephfs
...
# CephFS filesystem name into which the volume shall be created
fsName: myfs
...
The command creates a StorageClass called "rook-cephfs", through which myfs can be mounted as a volume.
In the Kubernetes storage-management mechanism, a PersistentVolume (PV) represents a piece of storage that has been provisioned using a StorageClass. A PersistentVolumeClaim (PVC) is a request for storage, through which the volume can be mounted at application level.
Now, if we go back to the app "rook-direct-mount", the folder /mnt shows some changes:
# ls -Rl /mnt
/mnt:
total 0
drwxr-xr-x 4 root root 2 Jun 29 11:03 volumes
/mnt/volumes:
total 0
drwx------ 2 root root 0 Jun 29 11:03 _deleting
drwxr-xr-x 2 root root 0 Jun 29 11:03 csi
/mnt/volumes/_deleting:
total 0
/mnt/volumes/csi:
total 0
Clearly the number of directories has grown to 4, but the real folder on which an app could consume data hasn't appeared yet.
Right after applying storageclass.yaml, set accessModes in pvc.yaml:
...
spec:
accessModes:
- ReadWriteMany
...
storageClassName: rook-cephfs
Apply the manifest:
$ microk8s kubectl create -f pvc.yaml
persistentvolumeclaim/cephfs-pvc created
A PVC, named cephfs-pvc in our case, is created. It refers to the StorageClass "rook-cephfs", so its volume lands inside myfs, which the app "rook-direct-mount" has mounted.
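One quick way to confirm that the claim has been satisfied (a hedged sketch; it assumes pvc.yaml created the claim in the default namespace, as the upstream example does):
```
# The PVC should show STATUS "Bound" to an automatically provisioned PV
# whose STORAGECLASS is rook-cephfs.
$ microk8s kubectl get pvc cephfs-pvc
$ microk8s kubectl get pv
```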
Check /mnt in the app again:
# ls -Rl /mnt
/mnt:
total 0
drwxr-xr-x 4 root root 2 Jun 29 11:03 volumes
/mnt/volumes:
total 0
drwx------ 2 root root 0 Jun 29 11:03 _deleting
drwxr-xr-x 2 root root 0 Jun 29 11:03 csi
/mnt/volumes/_deleting:
total 0
/mnt/volumes/csi:
total 0
[root@xiaozhong-x570 /]# ls -Rl /mnt
/mnt:
total 0
drwxr-xr-x 4 root root 3 Jun 29 11:19 volumes
/mnt/volumes:
total 0
-rwxr-xr-x 1 root root 0 Jun 29 11:19 _csi:csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7.meta
drwx------ 2 root root 0 Jun 29 11:03 _deleting
drwxr-xr-x 3 root root 1 Jun 29 11:19 csi
/mnt/volumes/_deleting:
total 0
/mnt/volumes/csi:
total 0
drwxrwxrwx 3 root root 2 Jun 29 11:19 csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7
/mnt/volumes/csi/csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7:
total 0
drwxrwxrwx 2 root root 0 Jun 29 11:19 042607ab-28f2-42d1-9073-5550b521f662
/mnt/volumes/csi/csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7/042607ab-28f2-42d1-9073-5550b521f662:
total 0
More folders, 5 levels deep, have emerged. The leaf directory is where the DFS will store users' data.
So far, thanks to the chain of standard Kubernetes interfaces -- StorageClass -> PV -> PVC -> DFS -- a normal app is able to consume data in the Ceph DFS.
In Sections 3.6 and 3.7 we tested the DFS by logging into the app "rook-direct-mount", which is not a normal way to use it. In this section, the DFS is presented through a remote web server, and any browser can CRUD files on that web server. Only once we have such a way can we say that any customer is able to consume data in the DFS.
The app we developed for this is called "files".
The app "files" is based on Express.js, the de facto standard web framework for Node.js; together they form a full-stack environment for web app development. We can think of it as creating 2 applications at once: the web server app and the web browser app.
Go back to the directory "on-premises-high-available-kubernetes-distributed-storage-cluster.git", then enter the sub-folder:
$ cd express-js-files
Under this folder, we can develop and deploy "files", the app that CRUDs files in the DFS.
The app, built from the code inside this directory, produces a Docker container image that Microk8s can import. The image includes not only the application itself, but also its dependent environment and components, such as Node.js and Express.js, along with the Node.js modules the app requires, like multer for file handling.
All of this software, together with the deployment procedure, creates a set of Microk8s (Kubernetes) components such as pods, deployments and services. Notice these lines in files-deploy.yaml:
spec:
containers:
...
volumeMounts:
- name: hello-persistent-storage
mountPath: /usr/src/app/uploads
volumes:
- name: hello-persistent-storage
persistentVolumeClaim:
claimName: cephfs-pvc
Here the manifest expresses a loose coupling: it mounts cephfs-pvc (the claimName of the PVC) at /usr/src/app/uploads (the directory in the pod where files are CRUDed). It provides a remarkably simple way to open an interface between the DFS and a directory inside the pod.
It is even more interesting to look at where, in the app "files", the real data consumption happens. See the lines in express-js-files/src/controller/file.controller.js:
// upload a file.
var storage = multer.diskStorage({
destination: function (req, file, callback) {
const directoryPath = __basedir + "/uploads/";
callback(null, directoryPath);
},
...
}
});
This is the I/O interface through which the uploaded file is stored into directoryPath. The interface is designed so that:
(1) The app only calls one API (multer.diskStorage) to handle file uploading. It does not need to know the processing mechanism on the storage side -- where or how the storage process runs, or even what the storage medium is (a hostPath disk, an iSCSI device, a DFS cluster, or a cloud bucket on GCP, S3 or Azure...).
(2) The app's behavior is not impacted if the storage medium changes, for example from a local DFS cluster to a GCP bucket.
(3) After multer.diskStorage is called, the app simply waits for the callback to arrive asynchronously, then executes the next statement.
Obviously, the architecture connecting the app and the DFS is loosely coupled, much like a pair of microservices, even though the DFS is not part of our app's design.
In the next section, let's build and try out the app.
In the directory express-js-files, build the Docker container image for the app:
$ sudo docker build -t files:local .
$ sudo docker save files:local > ../files.local.tar
Import the image into Microk8s.
$ microk8s ctr image import ../files.local.tar
Deploy the Express.js (Node.js) app from the Docker container image as a service of the Microk8s (Kubernetes) cluster.
$ microk8s kubectl create -f files-deploy.yaml
$ microk8s kubectl create -f files-service.yaml
Check the results.
$ microk8s kubectl get all | grep files
pod/files-deployment-85d8b99b9-5xccq 1/1 Running 16 19d
pod/files-deployment-85d8b99b9-mjnqx 1/1 Running 16 (82m ago) 19d
pod/files-deployment-85d8b99b9-67tcw 1/1 Running 14 (83m ago) 15d
pod/files-deployment-85d8b99b9-qbhfx 1/1 Running 16 (82m ago) 19d
service/files LoadBalancer 10.152.183.172 192.168.0.121 18080:30529/TCP 19d
deployment.apps/files-deployment 4/4 4 4 19d
replicaset.apps/files-deployment-85d8b99b9 4 4 4 19d
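As an optional spot-check that the PVC really is mounted inside the app's pods (a hedged sketch; it assumes the df utility is available in the container image):
```
# Inside any "files" pod, the uploads directory should appear as a CephFS mount backed by cephfs-pvc.
$ microk8s kubectl exec deploy/files-deployment -- df -h /usr/src/app/uploads
```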
From these messages we know that the URL
http://192.168.0.121:18080/
is the public IP address and port that a user can call from outside to access the app. The address/port does not belong to any specific pod, node or host; it is "virtual".
After the address and port, a path can be appended. The server app uses that path to run a specific operation, as the Express.js router defines.
In our app, the operations behind these paths are:
(1) http://192.168.0.121:18080/upload: upload a local file to the server.
When the request is sent to the server, a page like this is displayed in the browser (Firefox):
Press the "Browse" button; in the pop-up dialog box, select
daisy.jpg from directory on-premises-high-available-kubernetes-distributed-storage-cluster.git/pictures,
then upload the local file to the server by pressing the "Upload File" button.
As soon as the server receives and stores the file, a message is displayed in your browser:
"File is uploaded"
(2) Is the file really uploaded to the DFS?
Let's verify via the app "rook-direct-mount" that we launched in Section 3.6, to see whether the local file (daisy.jpg) has been uploaded to the DFS.
# ls -Rl /mnt
/mnt:
total 0
drwxr-xr-x 4 root root 3 Jun 29 11:19 volumes
/mnt/volumes:
total 0
-rwxr-xr-x 1 root root 0 Jun 29 11:19 _csi:csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7.meta
drwx------ 2 root root 0 Jun 29 11:03 _deleting
drwxr-xr-x 3 root root 1 Jun 29 11:19 csi
/mnt/volumes/_deleting:
total 0
/mnt/volumes/csi:
total 0
drwxrwxrwx 3 root root 2 Jun 29 11:19 csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7
/mnt/volumes/csi/csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7:
total 0
drwxrwxrwx 2 root root 1 Jun 30 11:17 042607ab-28f2-42d1-9073-5550b521f662
/mnt/volumes/csi/csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7/042607ab-28f2-42d1-9073-5550b521f662:
total 280
-rw-r--r-- 1 root root 286328 Jun 30 11:17 kubernetes.png
[root@xiaozhong-x570 /]# ls -Rl /mnt
/mnt:
total 0
drwxr-xr-x 4 root root 3 Jun 29 11:19 volumes
/mnt/volumes:
total 0
-rwxr-xr-x 1 root root 0 Jun 29 11:19 _csi:csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7.meta
drwx------ 2 root root 0 Jun 29 11:03 _deleting
drwxr-xr-x 3 root root 1 Jun 29 11:19 csi
/mnt/volumes/_deleting:
total 0
/mnt/volumes/csi:
total 0
drwxrwxrwx 3 root root 2 Jun 29 11:19 csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7
/mnt/volumes/csi/csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7:
total 0
drwxrwxrwx 2 root root 1 Jun 30 11:32 042607ab-28f2-42d1-9073-5550b521f662
/mnt/volumes/csi/csi-vol-5f923333-f79d-11ec-98ef-92b27d010ee7/042607ab-28f2-42d1-9073-5550b521f662:
total 23
-rw-r--r-- 1 root root 22782 Jun 30 11:32 daisy.jpg
-- the last line shows that the uploaded file really is stored in the DFS, at the leaf directory under the /mnt we mounted earlier.
(3) http://192.168.0.121:18080/files: list and download the files that exist on the server.
If you are using the Firefox browser, a page like this is displayed when the above request is sent to the server:
Move the mouse onto a URL, right-click, select "Save link as", then in the pop-up dialog box choose a local directory to download the file from the server.
If you are using the Chrome browser (on a PC, for example) or the Chromium browser (on a Raspberry Pi, for example), a different view may be displayed:
In that case, just select a URL (the blue part in the screenshot), copy it and paste it into the address bar of a new tab. Requesting that URL downloads the file from the server to a local directory.
(4) http://192.168.0.121:18080/delete: delete a file stored on the server.
When the request is sent to the server, a page like this is displayed in the browser:
In Firefox, select the URL corresponding to the file you want to delete on the server and click it. The file is quickly deleted on the server side, which then responds with the message:
"The file has been deleted."
Notice: if you use Chrome/Chromium, you can also delete the file by following steps similar to those in Step 4.2.(3).
Now we have a test app -- a tool to run CRUD operations on files in the DFS. All the operations can be done from browsers outside the cluster, and the same round trip can also be exercised from a command line, as sketched below.
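A hedged command-line sketch of the same operations (the multipart field name "file" is an assumption about the app's upload route; check express-js-files/src for the real field and URL formats):
```
# List the files currently stored in the DFS (returns whatever the /files route serves).
$ curl http://192.168.0.121:18080/files

# Upload a local file; the form field name "file" may differ in the actual app.
$ curl -F "file=@daisy.jpg" http://192.168.0.121:18080/upload
```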
The test app "files", which we developed at Section 4, can be used to verify if the Storage Cluster works or not. Actually we can monitor the Cluster's status via more tools. On the pactice, we will focus on 2 of them:
CLI tool -- Rook ToolBox, and
GUI (Graphical User Interface) tool -- Ceph Dashboard.
Change "working directory" to rook/deploy/examples, modify 1 line in toolbox.yaml:
replicas: 4
Then run the commands below:
$ microk8s kubectl create -f toolbox.yaml
$ microk8s kubectl get all --all-namespaces| grep tool
rook-ceph pod/rook-ceph-tools-555c879675-4pggx 1/1 Running 12 (7h13m ago) 13d
rook-ceph pod/rook-ceph-tools-555c879675-nbgww 1/1 Running 9 (7h12m ago) 12d
rook-ceph pod/rook-ceph-tools-555c879675-qdq8j 1/1 Running 12 (7h12m ago) 13d
rook-ceph pod/rook-ceph-tools-555c879675-sx5gw 1/1 Running 10 (7h12m ago) 12d
rook-ceph deployment.apps/rook-ceph-tools 4/4 4 4 13d
rook-ceph replicaset.apps/rook-ceph-tools-555c879675 4 4 4 13d
$ microk8s kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
[rook@rook-ceph-tools-555c879675-4pggx /]$
By exec'ing into the deployment (app), we have now entered one of its pods.
Type CLI commands in the pod to see how the tool works:
[rook@rook-ceph-tools-555c879675-4pggx /]$ ceph status
cluster:
id: 594241e6-e8af-4260-bb80-13826d6c6ac8
health: HEALTH_OK
services:
mon: 3 daemons, quorum e,f,g (age 7h)
mgr: b(active, since 7h), standbys: a
mds: 1/1 daemons up, 1 hot standby
osd: 4 osds: 4 up (since 7h), 4 in (since 13d)
data:
volumes: 1/1 healthy
pools: 3 pools, 65 pgs
objects: 31 objects, 465 KiB
usage: 89 MiB used, 57 GiB / 57 GiB avail
pgs: 65 active+clean
io:
client: 853 B/s rd, 1 op/s rd, 0 op/s wr
The command reports 4 OSDs, 3 MON daemons, 2 MGR daemons and 2 MDS daemons at work, which exactly matches what we set in the previous sections:
Section 3.2 for OSD setting;
Section 3.4 for MON and MGR settings;
Section 3.5 for MDS setting.
These messages tell us that the Rook Toolbox has been created successfully.
Run command:
$ microk8s kubectl -n rook-ceph get service
...
rook-ceph-mgr-dashboard ClusterIP 10.152.183.116 <none> 8443/TCP 3d5h
...
The service ("rook-ceph-mgr-dashboard") is called Ceph Dashboard, which aims to timely report status of the Storage Cluster in GUI mode.
On browser, type url:
https://10.152.183.116:8443
A login page is displayed in the browser:
Get the admin password as follows:
$ microk8s kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
Fz$~';6n.UUy2?WZiR1T
Paste the password into the "password" field in the browser. A page is displayed as soon as you press the "Login" button:
If you look through the status information, some of it is the same as what we got from the Rook Toolbox.
So now we can be quite confident that the Ceph Dashboard is ready to use.
Run:
$ microk8s status
microk8s is running
high-availability: yes
datastore master nodes: 192.168.0.100:19001 192.168.0.101:19001 192.168.0.102:19001
datastore standby nodes: 192.168.0.103:19001
...
This means all nodes are working well, and so is the Microk8s cluster.
You can also check the Storage Cluster's status via the Rook Toolbox:
$ microk8s kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
[rook@rook-ceph-tools-555c879675-4pggx /]$ ceph status
cluster:
id: 594241e6-e8af-4260-bb80-13826d6c6ac8
health: HEALTH_OK
services:
mon: 3 daemons, quorum e,f,g (age 26m)
mgr: b(active, since 32m), standbys: a
mds: 1/1 daemons up, 1 hot standby
osd: 4 osds: 4 up (since 27m), 4 in (since 2w)
...
[rook@rook-ceph-tools-555c879675-4pggx /]$ ceph mgr stat
{
"epoch": 223,
"available": true,
"active_name": "b",
"num_standby": 1
}
[rook@rook-ceph-tools-555c879675-4pggx /]$ ceph mds stat
myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
Together with the information we got in Section 5.1.1, all the status output shows that the Storage Cluster is healthy and its main daemons are working well: 4 OSDs with one daemon per node, 3 MONs as we set, 1 MGR (mgr.b) active and the other (mgr.a) standby, 1 MDS (myfs-b) up and 1 more (myfs-a) as hot standby.
Let's go back to the browser page we opened above, to check the Storage Cluster's status via the Ceph Dashboard.
On the page, select Cluster -> Hosts; the browser displays:
From this page, we can see which daemons are working on which host.
For example, on xiaozhong-w540 there are an MDS daemon (mds.myfs-b), an MGR daemon (mgr.b), a MON daemon (mon.b) and an OSD (osd.0) running.
We can monitor the status of each daemon type in other ways too. For example, select Cluster -> OSDs; the page displays:
It shows that osd.0 has joined the cluster, resides on xiaozhong-w540, and occupies 11.2 GB of disk space...
As in Section 4.2, the test app "files" can also be used to verify that all CRUD operations still work at this point.
Now let's power xiaozhong-w540 off to simulate a host failure. As soon as it is down, the OSDs page of the dashboard displays:
It shows that the host xiaozhong-w540 is still in the cluster but is shut down. Now only 3 OSD nodes are working in the cluster.
If you run the test app "files" now, all the CRUD operations against the server still work. They are not affected by losing one node of the Storage Cluster, as long as 3 OSDs remain working in the cluster. This is the High Availability feature at work for the Storage Cluster.
Let's also check the working status of the Microk8s cluster:
$ microk8s status
microk8s is running
high-availability: yes
datastore master nodes: 192.168.0.100:19001 192.168.0.102:19001 192.168.0.103:19001
datastore standby nodes: none
...
The result is the same as what we got in Section 2.3: the Microk8s cluster keeps working when the node count drops from 4 to 3, because it is a High Availability Kubernetes cluster.
But we can still find some differences in the Storage Cluster's status:
Go to Cluster -> Monitors; the browser displays:
It tells us that mon.k is not in quorum now; that monitor daemon is not effective at the moment, leaving 2 MON daemons working. That number still meets the requirement of Ceph's distributed consensus algorithm (Paxos): "Ceph requires a majority of monitors to be active to establish a quorum (thus establishing consensus)." One way to see the quorum from the command line is sketched below.
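A hedged sketch of checking the quorum directly from the Rook Toolbox (ceph mon stat is a standard Ceph command; the monitor names will differ from cluster to cluster):
```
# From inside the toolbox: show which monitors exist and which of them are in quorum.
$ microk8s kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mon stat
```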
So, from the new status after one node shut down, we know that 2 daemons stopped working: mon.k and osd.0.
What does the status report from the Rook Toolbox say?
[rook@rook-ceph-tools-555c879675-vm6wv /]$ ceph status
cluster:
id: 4e90088b-c732-443e-8c9b-c60b51419f27
health: HEALTH_WARN
1/3 mons down, quorum a,l
1 osds down
1 host (1 osds) down
Degraded data redundancy: 16/90 objects degraded (17.778%), 11 pgs degraded, 34 pgs undersized
services:
mon: 3 daemons, quorum a,l (age 4m), out of quorum: k
mgr: a(active, since 48m), standbys: b
mds: 1/1 daemons up, 1 hot standby
osd: 4 osds: 3 up (since 4m), 4 in (since 4d)
...
[rook@rook-ceph-tools-555c879675-vm6wv /]$ ceph mds stat
myfs:1 {0=myfs-a=up:active} 1 up:standby-replay
Apart from the 1 MON and 1 OSD that are down, the MGR and MDS daemons lose nothing: the Storage Cluster still keeps 2 MGR and 2 MDS daemons working. The only change is that the originally active daemons, mgr.b and mds.myfs-b, which ran on xiaozhong-w540, have handed over the active roles to mgr.a and mds.myfs-a.
Look at the dashboard again by selecting Cluster -> Hosts:
More clearly, the original standby daemons, mgr.a and mds.myfs-a, now acting as active, keep running on the same host: xiaozhong-x570.
The originally active daemons, mgr.b and mds.myfs-b, now play the standby role and have moved to xiaozhong-giga after a short interval.
Interestingly, mon.k and osd.0 are still marked as residing on xiaozhong-w540 even though the host has been powered off. I'm not 100% sure whether this is the behavior intended by the Ceph Dashboard.
Conclusion: when a host goes down, its MGR and MDS daemons move to another host and keep working; its OSD is not marked as moved, and its MON is not immediately marked as moved.
Now let's power on xiaozhong-w540 again.
The Microk8s cluster comes back to the "3 master nodes and 1 standby node" status as soon as the host stabilizes.
You can check the cluster status:
$ microk8s status
microk8s is running
high-availability: yes
datastore master nodes: 192.168.0.100:19001 192.168.0.102:19001 192.168.0.103:19001
datastore standby nodes: 192.168.0.101:19001
Check Cluster -> Hosts on the Ceph Dashboard; it shows:
Nothing has changed compared to when the host xiaozhong-w540 was down.
Check Cluster -> Monitors:
It shows that mon.k is working again and is back in quorum.
If you open Cluster -> OSDs, the status of osd.0 comes back from down to up; it restores to normal working status automatically.
A fully High Available app is expected to be implemented next. Its goal is to publish a web app running on our Microk8s/Storage Cluster to the Internet, by creating an Internet URL (including a domain name) that maps to the web app.
Once that is done, "files" (for instance) can be visited by remote customers (via browser) over the Internet as SaaS (Software as a Service), a typical service model of Cloud Computing.
In that sense, as said, you would be able to provide a variety of Cloud Services on the Internet without pausing, based on just this Microk8s/Storage Cluster.
Now we have created a cost-efficient cluster system in which multiple machines work cooperatively.
It possesses container orchestration supported by Microk8s (Kubernetes), a Distributed File System supported by Ceph, storage orchestration by Rook, and High Availability provided by all of them together.
All these functionalities are verified and monitored with tools (Rook Toolbox, Ceph Dashboard) and the test apps we developed (hello and files).