Recently, I encountered a problem where one of the nodes in my OpenShift cluster could not pull container images. In this blog post, I will explain the error messages observed in the pod logs and share my steps to resolve the issue.
Problem Description
On the affected node, I consistently saw the following error for the pod:
5gc/amf:v1" already present on machine
Warning Failed 2m6s kubelet Error: failed to mount container k8s_ctcs_amf01poc-ctcs-amf_6cbb6705-f713-49bb-a9dc-d6a829ef5337_0(3ec5b69efcd18b0c8ac21152726f4f81bf0e9f58d3fc69b7b72b7b67458b34df): error creating overlay mount to /var/lib/containers/storage/overlay/fb605ed2556f50195e9ed85f9abe2b1308b7e9ac71353f1c45c130b5c9ba5107/merged, mount_data=",lowerdir=/var/lib/containers/storage/overlay/l/BNAOYHSQC3DBUMIKXL5RG6TRES:/var/lib/containers/storage/overlay/l/X6DEYA4UQOLMQU3E432US5WBW7:/var/lib/containers/storage/overlay/l/KLFLBAYNHJW7RCPDDKPQ3WAMO4:/var/lib/containers/storage/overlay/l/EXRWVKXB3QYILGYYOK2XLLJQPV:/var/lib/containers/storage/overlay/l/5MXTZP552GAY6CN7QGIPDQ66W6:/var/lib/containers/storage/overlay/l/J7VG3DG2HX23LJUYKCFRDBROUG:/var/lib/containers/storage/overlay/l/KKAKBJ3E7XAFZUYTWT4PFWKM5D:/var/lib/containers/storage/overlay/l/MH5IWXUCPAJBMCKBNF2ACCCHOW:/var/lib/containers/storage/overlay/l/GBVY73SRIBZ6GUHTGRMMGAJRKN:/var/lib/containers/storage/overlay/l/NSMONE25VILA6J4QBA4OKX7ZRG:/var/lib/containers/storage/overlay/l/W5BM5N2OJDNZZAQPCQJHYORGLJ:/var/lib/containers/storage/overlay/l/6O5N6DXUPJXBJHCJNVTYLHI5CR:/var/lib/containers/storage/overlay/l/2WLVV4WFJQJSX3VCRVILPKKAXI:/var/lib/containers/storage/overlay/l/L5IX6H62SCVRSSBNJB5XWFFJAV:/var/lib/containers/storage/overlay/l/ABW6A5XUCR64XZLGOUDACNPCVB,upperdir=/var/lib/containers/storage/overlay/fb605ed2556f50195e9ed85f9abe2b1308b7e9ac71353f1c45c130b5c9ba5107/diff,workdir=/var/lib/containers/storage/overlay/fb605ed2556f50195e9ed85f9abe2b1308b7e9ac71353f1c45c130b5c9ba5107/work,context=\"system_u:object_r:container_file_t:s0:c29,c19\"": no such file or directory
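The "no such file or directory" at the end of the message suggests that one of the paths in mount_data no longer exists on the node. A minimal sketch for checking that, assuming you paste the mount_data string from the event into a shell on the node: it splits the lowerdir option on colons and reports which entries are missing. The helper name check_lowerdirs is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# check_lowerdirs MOUNT_DATA
# Hypothetical helper: print "ok" or "missing" for every lowerdir entry
# found in an overlay mount_data string.
check_lowerdirs() {
  local lower
  case "$1" in
    *lowerdir=*) ;;
    *) echo "no lowerdir= option found" >&2; return 2 ;;
  esac
  # Keep only the value of lowerdir= (it ends at the next comma).
  lower="${1#*lowerdir=}"
  lower="${lower%%,*}"
  # Entries are colon-separated; a dangling symlink counts as missing.
  printf '%s\n' "$lower" | tr ':' '\n' | while IFS= read -r path; do
    if [ -e "$path" ]; then
      echo "ok      $path"
    else
      echo "missing $path"
    fi
  done
}
```

Call it as check_lowerdirs 'PASTED_MOUNT_DATA', with the string in single quotes. Any entry reported as missing points at damaged container storage on the node rather than at the registry.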
Solution:
To troubleshoot and resolve this issue, I followed these steps:
I first tried to pull the image manually with the podman pull command and hit the same error. This suggested that the problem was not limited to the automated image pulling in the OpenShift cluster, but was related to the image itself or to the node where the pull occurred.
Preventing pod scheduling on the affected node:
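I used the command
oc adm drain NODENAME --ignore-daemonsets --delete-local-data --disable-eviction --force
to drain the problematic node, which marks it unschedulable and removes the pods running on it. (Newer oc releases call the --delete-local-data flag --delete-emptydir-data.)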
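Before touching the node, it can help to confirm that the drain took effect. The sketch below assumes a logged-in oc client. The function names node_is_cordoned and pods_left_on_node are made up for illustration, while .spec.unschedulable is the node field that cordoning sets.

```shell
#!/usr/bin/env bash
# node_is_cordoned NODE
# Hypothetical helper: succeeds if the node is marked unschedulable.
node_is_cordoned() {
  [ "$(oc get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}

# pods_left_on_node NODE
# Hypothetical helper: lists pods still bound to the node.
# DaemonSet pods are expected to remain because of --ignore-daemonsets.
pods_left_on_node() {
  oc get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}
```

For example, node_is_cordoned NODENAME && pods_left_on_node NODENAME shows what is still running on the node once it is cordoned.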
Stopping services on the affected node:
I SSHed into the node and issued the following commands to stop the necessary services:
systemctl stop crio
to stop the CRI-O service.
systemctl stop kubelet
to stop the kubelet service.
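A small sketch of the same step with a check that each service really stopped, assuming it runs as root on the node. The function name stop_and_verify is hypothetical; systemctl is-active --quiet returns a non-zero status for a unit that is not running.

```shell
#!/usr/bin/env bash
# stop_and_verify UNIT...
# Hypothetical helper: stop each unit in the order given and fail
# if systemd still reports it as active afterwards.
stop_and_verify() {
  local unit
  for unit in "$@"; do
    systemctl stop "$unit" || return 1
    if systemctl is-active --quiet "$unit"; then
      echo "$unit is still active" >&2
      return 1
    fi
    echo "$unit stopped"
  done
}
```

Called as stop_and_verify crio kubelet, it follows the same order as the commands above.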
Resetting the Podman storage:
I attempted to reset the Podman storage using the command
podman system reset -f
However, I encountered an error indicating that certain images were still in use by containers.
Manually deleting the container data:
I resolved this issue by manually deleting the /var/lib/containers/ directory, which contained the container data. In my case, I had to remove the entire directory, including the "overlay" subdirectory.
When attempting to delete the "overlay" folder, I encountered a "Device or resource busy" error. To address this, I used the command
umount overlay
to unmount the overlay filesystem. Then, I executed
crio wipe -f
to wipe CRI-O's container and image storage.
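A "Device or resource busy" error on the overlay directory usually means something is still mounted at or below it. The sketch below lists such mount points from /proc/mounts, deepest first, so they can be unmounted before the directory is deleted. The function name mounts_under is hypothetical, and the sketch assumes the mount paths contain no spaces (/proc/mounts escapes those).

```shell
#!/usr/bin/env bash
# mounts_under DIR [MOUNTS_FILE]
# Hypothetical helper: print mount points at or below DIR, deepest first.
# MOUNTS_FILE defaults to /proc/mounts.
mounts_under() {
  local dir mounts_file
  dir="${1%/}"
  mounts_file="${2:-/proc/mounts}"
  # Field 2 of each line is the mount point. Match DIR itself or anything
  # under DIR/, then reverse-sort so children come before their parents.
  awk -v d="$dir" '$2 == d || index($2, d "/") == 1 { print $2 }' "$mounts_file" \
    | LC_ALL=C sort -r
}

# Usage on the node, as root (commented out so that sourcing this file
# unmounts nothing):
# mounts_under /var/lib/containers/storage | while IFS= read -r mp; do
#   umount "$mp"
# done
```

Unmounting in this order avoids trying to unmount a parent while a container's merged directory is still mounted beneath it.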
This fixed the issue.
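The steps above leave CRI-O and the kubelet stopped and the node cordoned. The post does not describe how the node was brought back, so the following is only an assumption about the natural follow-up: start both services again and uncordon the node. The helpers run, restore_services and uncordon_node are hypothetical, and the sketch defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Assumed follow-up, not taken from the original steps.
# With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# On the node: start the services that were stopped earlier.
restore_services() {
  run systemctl start crio
  run systemctl start kubelet
}

# From a machine with a logged-in oc client: allow scheduling again.
uncordon_node() {
  run oc adm uncordon "$1"
}
```

Setting DRY_RUN=0 makes the same functions execute the commands instead of printing them.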