I encountered a bizarre error when using Quay to pull a Docker image from a private repository. You can find the complete log below:
(Quay ASCII-art logo) by Red Hat
Build, Store, and Distribute your Containers
Startup timestamp:
Fri Oct 7 21:08:49 UTC 2022
Running all default registry services without migration
Running init script '/quay-registry/conf/init/certs_create.sh'
Generating a RSA private key
............................................................................................................................................++++
.................................................................++++
writing new private key to 'mitm-key.pem'
-----
Running init script '/quay-registry/conf/init/certs_install.sh'
Installing extra certificates found in /quay-registry/conf/stack/extra_ca_certs directory
Running init script '/quay-registry/conf/init/copy_config_files.sh'
Running init script '/quay-registry/conf/init/d_validate_config_bundle.sh'
Validating Configuration
plpgsql
pg_trgm
...
...
| DistributedStorage | Could not connect to storage local_us. Error: Get "https://s3.openshift-storage.svc.cluster.local/quay-datastore-b84f9a69-e025-4a53-950e-75077ee64430/?location=": net/http: timeout awaiting response headers
Quay seems unable to connect to the NooBaa data storage for some reason. That component lives in the openshift-storage namespace, so let's see whether any pods in that project are not running:
oc get pods -o custom-columns="POD:metadata.name,STATE:status.containerStatuses[*].state.waiting.reason" -n openshift-storage| grep -v "<none>"
Output:
POD STATE
csi-rbdplugin-provisioner-5cdf488d6f-h84bq CrashLoopBackOff
csi-rbdplugin-provisioner-5cdf488d6f-mlxrw CrashLoopBackOff
noobaa-db-pg-0 0/1 Init:0/2 0 24m
Some of the pods are in an unhealthy state. In particular, the NooBaa database pod, which Quay is trying to connect to as object storage, seems to be down. Describing the pod (the command is shown below for reference) surfaces the following events:
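A describe call along these lines, with the pod name taken from the listing above, reproduces those events:
oc describe pod noobaa-db-pg-0 -n openshift-storage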
Warning FailedMount 17m (x2 over 21m) kubelet Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[kube-api-access-jvkcl noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume db]: timed out waiting for the condition
Warning FailedMount 7m17s (x5 over 19m) kubelet Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[db kube-api-access-jvkcl noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume]: timed out waiting for the condition
Warning FailedAttachVolume 3m37s (x9 over 21m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-bd18086f-de8f-469f-96e3-39c3566cb811" : Attach timeout for volume 0001-0011-openshift-storage-0000000000000001-74044298-fed1-11ec-86e3-0a580a830015
Warning FailedMount 3m10s kubelet Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume db kube-api-access-jvkcl]: timed out waiting for the condition
Warning FailedMount 67s (x3 over 15m) kubelet Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[noobaa-postgres-config-volume db kube-api-access-jvkcl noobaa-postgres-initdb-sh-volume]: timed out waiting for the condition
We can see that there is a problem with our storage controller, specifically with the csi-rbdplugin-provisioner, which is responsible for provisioning the block-device persistent volumes that NooBaa uses.
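As a quick sanity check (my own sketch, not part of the original session), the PV named in the FailedAttachVolume event can be queried for its CSI driver, which should come back as the Ceph RBD driver:
oc get pv pvc-bd18086f-de8f-469f-96e3-39c3566cb811 -o jsonpath='{.spec.csi.driver}{"\n"}'
# expected to print something like openshift-storage.rbd.csi.ceph.com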
So I checked the logs of the csi-rbdplugin-provisioner pod. (Note: I am using kubetail here, which tails all of the pod's containers at once; check its repository if you want the tool.)
kubetail csi-rbdplugin-provisioner-5cdf488d6f-h84bq
Will tail 6 logs...
csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-provisioner
csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-resizer
csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-attacher
csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-snapshotter
csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-rbdplugin
csi-rbdplugin-provisioner-5cdf488d6f-h84bq liveness-prometheus
[csi-rbdplugin-provisioner-5cdf488d6f-h84bq liveness-prometheus] W1010 15:14:43.023744 1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-snapshotter] W1010 15:14:42.278799 1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-provisioner] W1010 15:14:41.173525 1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-attacher] W1010 15:14:41.853546 1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-resizer] W1010 15:14:41.496197 1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
All of the sidecar containers report that they are still trying to connect to unix:///csi/csi-provisioner.sock. For some reason, the csi-rbdplugin container is not able to create this socket.
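To confirm the socket really is missing, one could list the shared socket directory from one of the still-running sidecars (a sketch, assuming ls is available in that sidecar image):
oc exec -n openshift-storage csi-rbdplugin-provisioner-5cdf488d6f-h84bq -c csi-provisioner -- ls -l /csi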
Is it possible that the container cannot create this socket because of a missing SCC permission? To test this, I granted the privileged SCC to the service account rook-csi-rbd-provisioner-sa:
oc adm policy add-scc-to-user privileged system:serviceaccount:openshift-storage:rook-csi-rbd-provisioner-sa
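Once the pod restarts, the SCC it was actually admitted under is recorded in the openshift.io/scc annotation, so a quick check could look like this (the pod name is a placeholder):
oc get pod <csi-rbdplugin-provisioner-pod> -n openshift-storage -o yaml | grep 'openshift.io/scc'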
Afterward, I set a security context with privileged: true at the pod level so that all containers would have enough privileges. For some reason, this introduced a new issue, and I saw the following error:
Warning FailedScheduling 3m23s (x1 over 4m24s) default-scheduler 0/13 nodes are available: 10 node(s) didn't match pod affinity/anti-affinity rules, 10 node(s) didn't match pod anti-affinity rules, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
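I did not chase this one down, but the anti-affinity rules the scheduler is complaining about live on the pod template and could be inspected like this (a sketch, not from the original session):
oc get deployment csi-rbdplugin-provisioner -n openshift-storage -o yaml | grep -A 20 'affinity:'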
Not sure why, but instead of debugging this new error, I decided to delete the csi-rbdplugin-provisioner deployment entirely and reinstall it with the privileged flag disabled (a sketch of that step follows). On the recreated pod I then saw the following error message:
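My rough sequence, assuming the rook-ceph operator recreates the deployment on its own once it is deleted (otherwise you would reapply the original manifest):
oc delete deployment csi-rbdplugin-provisioner -n openshift-storage
oc get pods -n openshift-storage | grep csi-rbdplugin-provisioner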
kubetail csi-rbdplugin-provisioner-67f9478588-t7g82
Will tail 6 logs...
csi-rbdplugin-provisioner-67f9478588-t7g82 csi-provisioner
csi-rbdplugin-provisioner-67f9478588-t7g82 csi-resizer
csi-rbdplugin-provisioner-67f9478588-t7g82 csi-attacher
csi-rbdplugin-provisioner-67f9478588-t7g82 csi-snapshotter
csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin
csi-rbdplugin-provisioner-67f9478588-t7g82 liveness-prometheus
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] I1010 16:01:51.368844 1 cephcsi.go:131] Driver version: release-4.8 and Git version: ad563f5bebb2efd5f64dee472e441bbe918fa101
[csi-rbdplugin-provisioner-67f9478588-t7g82 liveness-prometheus] W1010 16:01:54.557645 1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-attacher] W1010 16:01:53.606806 1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] I1010 16:01:51.369091 1 cephcsi.go:149] Initial PID limit is set to 1024
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] E1010 16:01:51.369149 1 cephcsi.go:153] Failed to set new PID limit to -1: open /sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod09c3db5b_d5e6_4d85_aea0_298e59a91356.slice/crio-b62add795b28df7c9d62b93b65362c51c0d2b873d3d26631f29435d2c7f0b458.scope/pids.max: permission denied
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-resizer] W1010 16:01:53.341694 1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] I1010 16:01:51.369165 1 cephcsi.go:176] Starting driver type: rbd with name: openshift-storage.rbd.csi.ceph.com
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] F1010 16:01:51.369210 1 driver.go:107] failed to write ceph configuration file (open /etc/ceph/ceph.conf: permission denied)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] goroutine 1 [running]:
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog/v2.stacks(0xc000010001, 0xc000416270, 0x83, 0xc7)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog/v2.(*loggingT).output(0x2b3c140, 0xc000000003, 0x0, 0x0, 0xc0002d5f10, 0x2170e00, 0x9, 0x6b, 0x41a900)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/vendor/k8s.io/klog/v2/klog.go:975 +0x191
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog/v2.(*loggingT).printDepth(0x2b3c140, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1, 0xc000712c60, 0x1, 0x1)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/vendor/k8s.io/klog/v2/klog.go:732 +0x16f
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog/v2.FatalDepth(...)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1488
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] github.com/ceph/ceph-csi/internal/util.FatalLogMsg(0x1b5df18, 0x2c, 0xc000615d10, 0x1, 0x1)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-snapshotter] W1010 16:01:53.883701 1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/internal/util/log.go:58 +0x118
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] github.com/ceph/ceph-csi/internal/rbd.(*Driver).Run(0xc000615f18, 0x2b3c040)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/internal/rbd/driver.go:107 +0xa5
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] main.main()
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/cmd/cephcsi.go:182 +0x345
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-provisioner] W1010 16:01:53.086761 1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] goroutine 6 [chan receive]:
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog/v2.(*loggingT).flushDaemon(0x2b3c140)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1169 +0x8b
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] created by k8s.io/klog/v2.init.0
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/vendor/k8s.io/klog/v2/klog.go:417 +0xdf
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] goroutine 99 [chan receive]:
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog.(*loggingT).flushDaemon(0x2b3bf60)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/vendor/k8s.io/klog/klog.go:1010 +0x8b
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] created by k8s.io/klog.init.0
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] /remote-source/app/vendor/k8s.io/klog/klog.go:411 +0xd8
The log shows the csi-rbdplugin container dying on a permission error: it cannot write /etc/ceph/ceph.conf, which is why the CSI socket never gets created. I resolved the problem by setting the privileged flag on this one container only:
securityContext:
  privileged: true
image: registry.redhat.io/ocs4/cephcsi-rhel8@sha256:502b5da53fae7dd22081717dc317e4978f93866b3c297bac36823571835320f3
imagePullPolicy: IfNotPresent
name: csi-rbdplugin
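One way to apply this without hand-editing the deployment is a strategic merge patch that targets the container by name. Treat it as a sketch: the OCS/rook-ceph operator may reconcile the deployment and revert manual changes like this.
oc patch deployment csi-rbdplugin-provisioner -n openshift-storage --type=strategic -p '{"spec":{"template":{"spec":{"containers":[{"name":"csi-rbdplugin","securityContext":{"privileged":true}}]}}}}'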