# 0. This guide is for upgrading from Rook v1.9.x to Rook v1.10.x.
## Starting versions
### rook: v1.9.4
### ceph: v16.2.9
### cephcsi: v3.6.1
# 1. Health Verification
## 1.1 Pods all Running
### In a healthy Rook cluster, all pods in the Rook namespace should be in the Running (or Completed) state and have few, if any, pod restarts.
export ROOK_OPERATOR_NAMESPACE=rook-ceph
export ROOK_CLUSTER_NAMESPACE=rook-ceph
kubectl -n $ROOK_CLUSTER_NAMESPACE get pods
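### To surface only problem pods, a standard kubectl field selector can filter out the healthy ones (a minimal sketch):
kubectl -n $ROOK_CLUSTER_NAMESPACE get pods --field-selector status.phase!=Running,status.phase!=Succeeded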
## 1.2 Status Output
### The Rook toolbox contains the Ceph tools that can give you status details of the cluster with the ceph status command. Let's look at an output sample and review some of the details:
TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
# cluster:
#   id:     a3f4d647-9538-4aff-9fd1-b845873c3fe9
#   health: HEALTH_OK
#
# services:
#   mon: 3 daemons, quorum b,c,a
#   mgr: a(active)
#   mds: myfs-1/1/1 up {0=myfs-a=up:active}, 1 up:standby-replay
#   osd: 6 osds: 6 up, 6 in
#   rgw: 1 daemon active
#
# data:
#   pools:   9 pools, 900 pgs
#   objects: 67 objects, 11 KiB
#   usage:   6.1 GiB used, 54 GiB / 60 GiB avail
#   pgs:     900 active+clean
#
# io:
#   client:   7.4 KiB/s rd, 681 B/s wr, 11 op/s rd, 4 op/s wr
#   recovery: 164 B/s, 1 objects/s
### In the output above, note the following indications that the cluster is in a healthy state:
### Cluster health: The overall cluster status is HEALTH_OK and there are no warning or error status messages displayed.
### Monitors (mon): All of the monitors are included in the quorum list.
### Manager (mgr): The Ceph manager is in the active state.
### OSDs (osd): All OSDs are up and in.
### Placement groups (pgs): All PGs are in the active+clean state.
### (If applicable) Ceph filesystem metadata servers (mds): All MDS daemons are active for each filesystem.
### (If applicable) Ceph object store RADOS gateways (rgw): All daemons are active.
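### If the cluster instead reports HEALTH_WARN or HEALTH_ERR, the specific issues should be listed and resolved before upgrading; ceph health detail (a standard Ceph command, run via the toolbox) shows them:
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph health detail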
## 1.3 Container Versions
### The container version running in a specific pod in the Rook cluster can be verified in its pod spec output. For example, for the monitor pod mon-b, we can verify its running container version with the commands below:
POD_NAME=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -o custom-columns=name:.metadata.name --no-headers | grep rook-ceph-mon-b)
kubectl -n $ROOK_CLUSTER_NAMESPACE get pod ${POD_NAME} -o jsonpath='{.spec.containers[0].image}'
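### The output is the Ceph image the monitor is running; given the starting versions above, something like:
# quay.io/ceph/ceph:v16.2.9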
### The status and container versions for all Rook pods can be collected all at once with the following commands:
kubectl -n $ROOK_OPERATOR_NAMESPACE get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.status.phase}{"\t\t"}{.spec.containers[0].image}{"\t"}{.spec.initContainers[0].image}{"\n"}{end}' && \
kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.status.phase}{"\t\t"}{.spec.containers[0].image}{"\t"}{.spec.initContainers[0].image}{"\n"}{end}'
### The rook-version label exists on Ceph resources. A summary of the resource controllers can be gathered with the commands below, which report the requested, updated, and currently available replicas for various Rook resources, in addition to the version of Rook for resources managed by Rook. Note that the operator and toolbox deployments do not have a rook-version label set.
kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
kubectl -n $ROOK_CLUSTER_NAMESPACE get jobs -o jsonpath='{range .items[*]}{.metadata.name}{" \tsucceeded: "}{.status.succeeded}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
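### A sample line from the deployments query above, given the starting versions of this guide:
# rook-ceph-mon-a 	req/upd/avl: 1/1/1 	rook-version=v1.9.4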
## 1.4 Rook Volume Health
### Any pod that is using a Rook volume should also remain healthy:
### The pod should be in the Running state with few, if any, restarts
### There should be no errors in its logs
### The pod should still be able to read and write to the attached Rook volume.
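### A quick sketch for spot-checking volumes: list PersistentVolumes provisioned by the Rook CSI drivers and confirm they are Bound (the driver names are prefixed with the operator namespace, rook-ceph by default):
kubectl get pv -o custom-columns=NAME:.metadata.name,DRIVER:.spec.csi.driver,PHASE:.status.phase | grep rook-ceph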
# 2. Rook Upgrades
## 2.1 Breaking changes in v1.10
### Support for Ceph Octopus (15.2.x) was removed. If you are running v15, you must upgrade to Ceph Pacific (v16) or Quincy (v17) before upgrading to Rook v1.10.
### The minimum supported version of Ceph-CSI is v3.6.0. You must update to at least this version of Ceph-CSI before, or at the same time as, updating the Rook operator image to v1.10.
### Before upgrading to K8s 1.25, ensure that you are running at least Rook v1.9.10, or v1.10.x. If you upgrade to K8s 1.25 before upgrading to v1.9.10 or newer, the Helm chart may be blocked from upgrading to newer versions of Rook. See #10826 for a possible workaround.
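### Before starting, the currently requested Ceph image and the running Ceph-CSI plugin image can be checked against these requirements (a sketch; app=csi-rbdplugin is the label Rook applies to its RBD plugin pods):
kubectl -n $ROOK_CLUSTER_NAMESPACE get cephcluster $ROOK_CLUSTER_NAMESPACE -o jsonpath='{.spec.cephVersion.image}{"\n"}'
kubectl -n $ROOK_OPERATOR_NAMESPACE get pod -l app=csi-rbdplugin -o jsonpath='{.items[0].spec.containers[*].image}{"\n"}'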
## 2.2 Patch Release Upgrades
### Unless otherwise noted due to extenuating requirements, upgrading from one patch release of Rook to another is as simple as updating the common resources and the image of the Rook operator. For example, when Rook v1.10.12 is released, updating from v1.10.0 is just a matter of running the following:
git clone --single-branch --depth=1 --branch v1.10.12 https://github.com/rook/rook.git
cd rook/deploy/examples
kubectl apply -f common.yaml -f crds.yaml
# Alternatively, the operator image alone can be updated in place:
# kubectl -n $ROOK_OPERATOR_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.10.12
# Edit operator.yaml: override the CSI sidecar images with a mirror registry (useful when the upstream registries are unreachable):
ROOK_CSI_REGISTRAR_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.7.0"
ROOK_CSI_RESIZER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-resizer:v1.7.0"
ROOK_CSI_PROVISIONER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-provisioner:v3.4.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-snapshotter:v6.2.1"
ROOK_CSI_ATTACHER_IMAGE: "registry.cn-hangzhou.aliyuncs.com/google_containers/csi-attacher:v4.1.0"
kubectl apply -f operator.yaml
kubectl apply -f toolbox.yaml
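### After applying, confirm the operator deployment picked up the new image before waiting on the daemons:
kubectl -n $ROOK_OPERATOR_NAMESPACE get deployment rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
# rook/ceph:v1.10.12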
## 2.3 Wait for the upgrade to complete
watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
# This cluster is not yet finished:
# rook-version=v1.9.13
# rook-version=v1.10.12
# This cluster is finished:
# rook-version=v1.10.12
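### To block until the rollout converges instead of watching, a small loop over the same label query works (a sketch; adjust the interval to taste):
while [ "$(kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.labels.rook-version}{"\n"}{end}' | sort -u | wc -l)" -gt 1 ]; do sleep 15; done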
## 2.4 Verify the updated cluster
TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
# 3. Ceph Upgrades
## 3.1 Supported Versions
### Rook v1.10 supports the following Ceph versions:
### Ceph Quincy v17.2.0 or newer
### Ceph Pacific v16.2.0 or newer
## 3.2 Update the Ceph daemons
### The upgrade will be automated by the Rook operator after you update the desired Ceph image in the cluster CRD (spec.cephVersion.image).
ROOK_CLUSTER_NAMESPACE=rook-ceph
NEW_CEPH_IMAGE='quay.io/ceph/ceph:v17.2.5-20221017'
kubectl -n $ROOK_CLUSTER_NAMESPACE patch CephCluster $ROOK_CLUSTER_NAMESPACE --type=merge -p "{\"spec\": {\"cephVersion\": {\"image\": \"$NEW_CEPH_IMAGE\"}}}"
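### The operator reacts to the change with a rolling update of the daemons; progress is also reflected in the CephCluster status fields:
kubectl -n $ROOK_CLUSTER_NAMESPACE get cephcluster $ROOK_CLUSTER_NAMESPACE -o jsonpath='{.status.phase}{"\t"}{.status.ceph.health}{"\n"}'
# Progressing	HEALTH_OK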
## 3.3 Wait for the pod updates
### As with upgrading Rook, you must now wait for the upgrade to complete. Status can be determined in a similar way to the Rook upgrade as well.
watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'
### Confirm the upgrade is completed when the versions are all on the desired Ceph version.
kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq
# This cluster is not yet finished:
# ceph-version=v16.2.9-0
# ceph-version=v17.2.5-0
# This cluster is finished:
# ceph-version=v17.2.5-0
## 3.4 Verify cluster health
TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
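### All daemons should now report the same release; ceph versions (a standard Ceph command) makes any stragglers obvious. Sample converged output for the image above, with the git hash elided:
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph versions
# {
#     "mon": { "ceph version 17.2.5 (...) quincy (stable)": 3 },
#     "mgr": { "ceph version 17.2.5 (...) quincy (stable)": 1 },
#     "osd": { "ceph version 17.2.5 (...) quincy (stable)": 6 },
#     "mds": { "ceph version 17.2.5 (...) quincy (stable)": 2 },
#     "rgw": { "ceph version 17.2.5 (...) quincy (stable)": 1 },
#     "overall": { "ceph version 17.2.5 (...) quincy (stable)": 13 }
# }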