Handling expired kubeadm certificates

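Contents

  1. Disclaimer

  2. Overview

  3. Renew the API server and other certificates

    1. Exporting the kubeadm configuration file

  4. Renew the kubelet certificate on the master node

  5. Renew certificates on the other nodes

  6. References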

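Disclaimer

These notes are for test environments only. They are not guaranteed to fully repair a cluster, and some of the steps were worked out by trial and error and may not be the correct approach.

Overview

I frequently run Kubernetes tests, so my environment contains many clusters. Some of them stay powered off for long periods, and the certificates, which are valid for 1 year by default, expire in the meantime. Once they have expired, the kubelet, the API server, and other components fail to start. The repair process is recorded below.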

Renew the API server and other certificates

Log in to the master node and renew all certificates in one step with kubeadm.
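The exact command used at this step is not preserved in this record. The following is a minimal sketch, assuming kubeadm v1.20 or later (where certificate management lives under `kubeadm certs`; the cluster here runs v1.22.11) and a root shell on the master. The `run` helper, the `renew_control_plane_certs` function, and the `DRY_RUN` variable are illustrative names, not part of kubeadm.

```shell
# Assumption: kubeadm v1.20+ and a root shell on the master node.
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

renew_control_plane_certs() {
  run kubeadm certs check-expiration   # show the current expiry dates
  run kubeadm certs renew all          # renew every kubeadm-managed certificate
  run kubeadm certs check-expiration   # confirm the new expiry dates
}
```

Calling `renew_control_plane_certs` with `DRY_RUN=1` set prints the three commands for review; without it they are executed in order.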

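Then restart docker. Normally the API server and the other control-plane pods start up again after that.

The certificate embedded in admin.conf has now been renewed by kubeadm. Part of its contents: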

Subject Name=
====Organization=system:masters
====Common Name=kubernetes-admin

Issuer Name=
====Common Name=kubernetes

Serial Number=5755935322539516670
Version=3
Signature Algorithm=SHA-256 with RSA Encryption ( 1.2.840.113549.1.1.11 )
====Parameters=None

Not Valid Before=Sunday, August 14, 2022 at 18:47:31 China Standard Time
Not Valid After=Saturday, November 9, 2024 at 11:29:52 China Standard Time
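The validity fields above can also be checked from the shell. This is a minimal sketch, assuming the kubeconfig embeds the certificate under `client-certificate-data` (as a kubeadm-generated admin.conf normally does) and that `openssl` is installed for the final decoding step. The helper name `kubeconfig_client_cert` is made up for illustration.

```shell
# Print the PEM client certificate embedded in a kubeconfig file.
# Assumption: the file has a "client-certificate-data: <base64>" line.
kubeconfig_client_cert() {
  awk '$1 == "client-certificate-data:" { print $2; exit }' "$1" | base64 -d
}
```

For example, `kubeconfig_client_cert /etc/kubernetes/admin.conf | openssl x509 -noout -subject -issuer -dates` prints the subject, issuer, and validity window.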

Running kubectl --kubeconfig /etc/kubernetes/admin.conf get node now lists all nodes. Copy this config into the user's home directory for convenience:

cp /etc/kubernetes/admin.conf ~/.kube/config
kubectl get node

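Exporting the kubeadm configuration file

A kubeadm deployment usually starts from a kubeadm config file prepared for initialization, which holds the cluster's custom settings. For example: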

# cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.39.0.0/16,2001::/64
  serviceSubnet: 10.96.0.0/16,2002::/110
controllerManager:
  extraArgs:
    "node-cidr-mask-size-ipv4": "25"
    "node-cidr-mask-size-ipv6": "80"
imageRepository: "registry.cn-hangzhou.aliyuncs.com/google_containers"
clusterName: "rhel-k8scluster"
kubernetesVersion: "v1.22.11"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "10.29.8.171"
  bindPort: 6443
nodeRegistration:
  kubeletExtraArgs:
    node-ip: 10.29.8.171,2000::171

If the cluster was deployed entirely with default settings, there may be no such file at hand. Once the API server is available again, export kubeadm-config.yaml from the following ConfigMap:

kubectl get cm kubeadm-config -n kube-system -o json | jq -r '.data.ClusterConfiguration' > kubeadm-config.yaml
# Only the contents of data.ClusterConfiguration need to be kept in the export
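If the ConfigMap key is missing, jq writes the literal text null into the file, so a quick sanity check before using the export may help. This is a sketch only; the function name `check_kubeadm_config` is hypothetical, and it assumes the exported YAML carries a top-level `kind: ClusterConfiguration` line.

```shell
# Sanity-check an exported kubeadm config file before passing it to kubeadm.
# Assumption: a valid export contains a top-level "kind: ClusterConfiguration".
check_kubeadm_config() {
  file="${1:-kubeadm-config.yaml}"
  if [ ! -s "$file" ]; then
    echo "empty or missing: $file"
    return 1
  fi
  if ! grep -q '^kind: ClusterConfiguration' "$file"; then
    echo "no ClusterConfiguration found in $file"
    return 1
  fi
  echo "ok: $file"
}
```

Run `check_kubeadm_config kubeadm-config.yaml` right after the export; a non-zero exit status means the file should not be used.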

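Renew the kubelet certificate on the master node

After that, running systemctl restart kubelet produced the errors below, which indicate that the kubelet client certificate has expired: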

Nov 09 22:05:57 master01 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 09 22:05:57 master01 kubelet[2712]: Flag --network-plugin has been deprecated, will be removed along with dockershim.
Nov 09 22:05:57 master01 kubelet[2712]: Flag --network-plugin has been deprecated, will be removed along with dockershim.
Nov 09 22:05:57 master01 systemd[1]: Started Kubernetes systemd probe.
Nov 09 22:05:57 master01 kubelet[2712]: I1109 22:05:57.184996    2712 server.go:440] "Kubelet version" kubeletVersion="v1.22.11"
Nov 09 22:05:57 master01 kubelet[2712]: I1109 22:05:57.185198    2712 server.go:868] "Client rotation is on, will bootstrap in background"
Nov 09 22:05:57 master01 kubelet[2712]: E1109 22:05:57.186388    2712 bootstrap.go:265] part of the existing bootstrap client certificate in /etc/kubernetes/kubelet.conf is expired: 2023-08-14 10:47:33 +0000 UTC
Nov 09 22:05:57 master01 kubelet[2712]: E1109 22:05:57.186411    2712 server.go:294] "Failed to run kubelet" err="failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory"
Nov 09 22:05:57 master01 systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 09 22:05:57 master01 systemd[1]: Unit kubelet.service entered failed state.
Nov 09 22:05:57 master01 systemd[1]: kubelet.service failed.
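To pull just the expiry timestamp out of such a log, a small filter is enough. This sketch assumes the message wording shown above ("... is expired: <timestamp>"), which may differ in other kubelet versions; `kubelet_cert_expiry` is an illustrative name.

```shell
# Read kubelet log lines on stdin and print the reported certificate expiry time.
# Assumption: the log uses the "is expired: <timestamp>" wording seen above.
kubelet_cert_expiry() {
  sed -n 's/.*client certificate in .* is expired: \(.*\)$/\1/p'
}
```

A typical invocation would be `journalctl -u kubelet --no-pager | kubelet_cert_expiry`.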

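/etc/kubernetes/kubelet.conf specifies the CA certificate as well as the client certificate and key used by the kubelet. The CA certificate is valid for 10 years and has not expired, but the kubelet certificate has expired and needs to be repaired.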

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeU1EZ3hOREV3TkRjek1Wb1hEVE15TURneE1URXdORGN6TVZvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTVJOCi9Nb1hnK2dCNTBZa0IvOXNlaWlPSjdOaXRETmRyS2thK2E0NUo3amg3OThuZjdSN2ZMYlBjTkxyZFgyN3lUT2wKZ1dKMTdCN1B3alNIWXZXejVGbGMyS3NBelFHaXlyOXZ3cDhUcnZVQUdRSkFaUllJMmcrYTFjWnZGSXl5SkdYdQpoSkIxcUQrN2ZPYWVCWkFYZ00wb0NvZjEvVVNIU1BUWGpVRjY4a09lUFUyS2RvSUthZWE5eXF5a3oxN1M0MWZhClFNS0hmVHZHbDJNMUg0SGU4WEhQMUVoV1dDai9MMEs0SDRIaENJYVV4MmdBZ3UwZy9oSDUyRFQ3anRnZG5KQWQKUUczdWxWOVdFeFVrSjlnNUZISEVCR1lDVmlKUkFGR011RVQyMmlFYlI3VWx0RDlaOGlBQ0x3ZURGOHJUdmN1bgpOSUlocXMwaTVCMkFlYitLZUZrQ0F3RUFBYU5aTUZjd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZNVzJlZzgzM0I5SXI1b2I2cnF2T2Jqd2VIek1NQlVHQTFVZEVRUU8KTUF5Q0NtdDFZbVZ5Ym1WMFpYTXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBQWw3RjR4bVo3cFkzeGJWODZaYQpTUnBkR054Z1Jka3VXNkFCOUhReHhlbFhDUUUwZmFoU3FCQTZ2blk2NEJwbFBOYURsdmNHNVFPZXNENVhaTXVCCjRzRGUvMjFYUU4rL215TEFaUmcyaDJLZ3l4ZUZuUkJqTWplOFRkQXVTWGxSN3hOVkdYSDhLU1VxOXc3dFN4QWUKUGRIWEozYTZiUkRhc2NhdWNmZFIvUy9INGZaa0RPS1picGl1UUk3RGRTQkEyVVJTeHBjL25lRnl2eVJ6djNGbQpiN1FCZ1RmdEpRU3NHQXArR2NCU29uaUVZYzdJSFJzaGk3ZWNLV3hiU2swZi9ZL1Z1eFptUUxYMWp0cVpuSFNwCjBmZXoyNW1MYWZ2Ri9Ba04wTkNhZldCb3Z3VFg3RWk3SGRHV0tsRGVXMDJTVkVidzdWR0h5UUZ0UWJPVDBYSXcKbVhjPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    server: https://10.29.8.171:6443
  name: rhel-k8scluster
contexts:
- context:
    cluster: rhel-k8scluster
    user: system:node:master01
  name: system:node:master01@rhel-k8scluster
current-context: system:node:master01@rhel-k8scluster
kind: Config
preferences: {}
users:
- name: system:node:master01
  user:
    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem

First back up the original files:

mkdir ~/bak
cp /etc/kubernetes/kubelet.conf ~/bak
cp /var/lib/kubelet/pki/kubelet-client* ~/bak

Generate a new kubelet.conf with kubeadm. The generated file embeds a newly issued kubelet client certificate:

kubeadm kubeconfig user --org system:nodes --client-name system:node:master01 --config=kubeadm-config.yaml > /etc/kubernetes/kubelet.conf

# Restart the kubelet service
systemctl restart kubelet

After that, the master node shows as Ready:

[root@master01 ~]# kubectl get node
NAME       STATUS                        ROLES                  AGE    VERSION
master01   Ready                         control-plane,master   452d   v1.22.11

Renew certificates on the other nodes

Next, in the same way, create kubelet config files for the worker nodes on the master node (the node that holds the CA root certificate):

kubeadm kubeconfig user --org system:nodes --client-name system:node:worker01 --config=kubeadm-config.yaml > kubelet-worker01.conf
kubeadm kubeconfig user --org system:nodes --client-name system:node:worker02 --config=kubeadm-config.yaml > kubelet-worker02.conf
... ...

Once they are created, scp each file to its worker node as /etc/kubernetes/kubelet.conf, then restart the kubelet service on that node:

scp kubelet-worker01.conf :/etc/kubernetes/kubelet.conf
scp kubelet-worker02.conf :/etc/kubernetes/kubelet.conf
... ...

systemctl restart kubelet
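With many workers, the per-node commands can be generated instead of typed. This is a sketch under stated assumptions: each node name is also an SSH-reachable hostname, login is as root, and kubeadm-config.yaml is in the current directory. None of these are confirmed by the steps above, and `worker_commands` is a hypothetical helper. It only prints commands and runs nothing itself.

```shell
# Print the generate / copy / restart commands for each worker node name.
# Assumptions: node name == SSH hostname, root login, kubeadm-config.yaml in cwd.
worker_commands() {
  for node in "$@"; do
    printf 'kubeadm kubeconfig user --org system:nodes --client-name system:node:%s --config=kubeadm-config.yaml > kubelet-%s.conf\n' "$node" "$node"
    printf 'scp kubelet-%s.conf root@%s:/etc/kubernetes/kubelet.conf\n' "$node" "$node"
    printf 'ssh root@%s systemctl restart kubelet\n' "$node"
  done
}
```

For example, `worker_commands worker01 worker02` prints six lines. Review them first, then pipe the output to `sh` on the master if they look right.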

Finally, all nodes are back to normal:

[root@master01 ~]# kubectl get node
NAME       STATUS                     ROLES                  AGE    VERSION
master01   Ready                      control-plane,master   452d   v1.22.11
worker01   Ready                      <none>                 452d   v1.22.11
worker02   Ready                      <none>                 452d   v1.22.11
worker03   Ready                      <none>                 452d   v1.22.11
worker04   Ready                      <none>                 452d   v1.22.11
worker05   Ready                      <none>                 255d   v1.22.11
worker06   Ready                      <none>                 253d   v1.22.11
worker07   Ready                      <none>                 168d   v1.22.11

References

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#kubelet-client-cert