Common problems with IBM z/OS Container Platform

Use this information to help diagnose common problems that are found when you use IBM® z/OS® Container Platform (zOSCP).

Source image is rejected

  • If you try to pull an image and the source image is rejected, you might see an error message containing the text Running image <image name> is rejected by policy. For example:
    Error: Source image rejected: Running image docker://icr.io/zoscp/zos:latest is rejected by policy.

    Podman for IBM z/OS (Podman), cri-o for IBM z/OS (cri-o), and Skopeo for IBM z/OS (Skopeo) require a default trust policy that defines which images are acceptable to pull.

    To define a trust policy that trusts images from the IBM Cloud Container Registry, issue the following command:
    podman image trust set -t accept icr.io
    For more information, see Pushing to and pulling from a container registry.

    See Trusting external container image registries for an example of how to establish trust so that the image from your internal registry is accepted.
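The command above populates /etc/containers/policy.json. The fragment below is an illustrative sketch of the resulting file, written to a temporary path rather than the live location; the file that the command generates on your system may differ:

```shell
# Write an illustrative policy.json to a temporary path (not the live
# /etc/containers/policy.json) that rejects images by default but
# accepts anything pulled from icr.io.
policy=$(mktemp)
cat > "$policy" <<'EOF'
{
  "default": [{"type": "reject"}],
  "transports": {
    "docker": {
      "icr.io": [{"type": "insecureAcceptAnything"}]
    }
  }
}
EOF
# Confirm the fragment is valid JSON before copying it anywhere.
python3 -m json.tool "$policy" > /dev/null && echo "policy.json is valid JSON"
```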

x509 certificate signed by unknown authority

  • If you try to securely connect to an image registry with Skopeo, without setting up the location of an x509 root CA certificate, you may encounter an error message containing the text certificate signed by unknown authority. For example:
    $ skopeo inspect docker://icr.io/zoscp/ibm-semeru-runtimes:certified-17-jdk-zos
    FATA[0000] Error parsing image name "docker://icr.io/zoscp/ibm-semeru-runtimes:certified-17-jdk-zos": pinging container registry icr.io: 
    Get "https://icr.io/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority
    To resolve the problem, you need to set up TLS. For more information, see Set up TLS to securely connect to image registries.
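One way to make the registry's root CA certificate available is the per-registry certs.d layout used by the containers tools. In this sketch the certificate contents and the scratch directory are placeholders; on a live system the directory would be /etc/containers/certs.d/<registry>:

```shell
# Stage a CA certificate in the per-registry certs.d layout.
# CERTS_D and the certificate body are placeholders; on a live system
# the path would be /etc/containers/certs.d/<registry>/ca.crt.
CERTS_D=$(mktemp -d)
mkdir -p "$CERTS_D/icr.io"
# In practice, copy your registry's real root CA certificate here.
printf '%s\n' '-----BEGIN CERTIFICATE-----' '...' '-----END CERTIFICATE-----' \
  > "$CERTS_D/icr.io/ca.crt"
ls "$CERTS_D/icr.io"
# Skopeo can also be pointed at a certificate directory explicitly:
#   skopeo inspect --cert-dir "$CERTS_D/icr.io" docker://icr.io/zoscp/zos:latest
```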

JVMSHRC245E Error mapping shared class cache file

  • If you see an Error mapping shared class cache file error when using Podman, for example:

    JVMSHRC245E Error mapping shared class cache file
    JVMSHRC336E Port layer error code = -155
    JVMSHRC337E Platform error message: EDC5132I Not enough memory.
    JVMSHRC840E Failed to start up the shared cache.
    JVMSHRC686I Failed to startup shared class cache.
    Continue without using it as -Xshareclasses:nonfatal is specified

    This error does not prevent Java™ from running, but the lack of a shared class cache might degrade runtime performance.

    You need to ensure that your system has the SMFLIMxx PARMLIB updates. This is a requirement for the ibm-semeru-runtimes:certified-17-jdk-zos container image in order to support caches that are mapped above the 2 GB address range. The maximum size of these caches is limited by the MAXSHARE value within the SMFLIMxx PARMLIB member. For more information, see Container image storage requirements.

EDC5133I No space left on device

  • If you see a No space left on device error when trying to remove containers using Podman, for example:

    $ podman --log-level debug rm -a
    ...
    DEBU[0001] Using tmp dir /var/run/libpod
    ...
    Error: Unable to write container exited event: "write /var/run/libpod/events/events.log: EDC5133I No space left on device."
    It may mean that the events.log file in the tmp directory has filled the file system. To store temporary files elsewhere, specify a new path under a TFS for tmp_dir in the /etc/containers/containers.conf file:
    # Directory for temporary files. Must be tmpfs (wiped after reboot)
    #
    #tmp_dir = "/var/run/libpod"
    
    To ensure that the new value for tmp_dir takes effect and is not overridden, you need to remove the db.sql file under the current graph root.
    $ podman info | grep graphRoot:
      graphRoot: /SYSTEM/var/lib/containers/storage
    $ rm /SYSTEM/var/lib/containers/storage/db.sql
    

    For more information, see Storage requirements.

  • If you get an error similar to the following when using Podman:
    $ podman rm -af
    Error: cleaning up storage: removing container 2e777fc138d8ef84022bc62a44572150963a502a5775d3bf362d6a3d979eeb12 
    root filesystem: write /var/lib/podman/storage/ufs-layers/.tmp-layers.json1865200036: EDC5133I No space left on device.
    It may mean that the ZFS file system (mounted as /var/lib/podman) for Podman has become unstable because it reached its capacity.
    $ cd /var/lib/podman
    $ df -Pkv 
    Filesystem 1024-blocks Used Available Capacity Mounted on
    OMVSSPA.SVT.SA.VAR.LIB.PODMAN.ZFS 4193280 4193193 87 100% /J7D/var/lib/podman
    ZFS, Read/Write, Device:220, ACLS=Y
    File System Owner : J7D Automove=U Client=N
    Filetag : T=off codeset=0
    Aggregate Name : OMVSSPA.SVT.SA.VAR.LIB.PODMAN.ZFS

    To resolve this error, increase the size of the ZFS by using bpxwmigf. Avoid unmounting file systems that are used by zOSCP, because doing so can cause problems for zOSCP programs.
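To catch this condition early, you can scan df output for file systems that are close to capacity. The following sketch parses captured df -Pk-style output; the 90% threshold and the sample data are arbitrary examples:

```shell
# Flag file systems at or above 90% capacity from `df -Pk`-style output.
# The sample input is hard-coded here; on a real system you would pipe
# `df -Pk` into the same awk program. The 90% threshold is an example.
df_output='Filesystem 1024-blocks Used Available Capacity Mounted-on
OMVSSPA.SVT.SA.VAR.LIB.PODMAN.ZFS 4193280 4193193 87 100% /J7D/var/lib/podman
OMVSSPA.SVT.SA.HOME.ZFS 4193280 1000000 3193280 24% /J7D/u'
full=$(printf '%s\n' "$df_output" | \
  awk 'NR > 1 { sub("%", "", $5); if ($5 + 0 >= 90) print $1, $5 "% full at", $6 }')
echo "$full"
```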

Container image exists in local storage but may be corrupted

  • If you try to list images and you get:
    $ podman images
    ERRO[0000] Image ed6a9832ff9b exists in local storage but may be corrupted (remove the image to resolve the issue): layer not known
    ERRO[0000] retrieving label for image "f05ce1cae7ee45e60c7d8902dd337c1ead07fd69daa7d76ed58d56477916a937": you may need to remove the image to resolve the error: layer not known
    
    1. You need to remove the failing images:
      $ podman rmi  ed6a9832ff9b f05ce1cae7ee45e60c7d8902dd337c1ead07fd69daa7d76ed58d56477916a937
      WARN[0000] Failed to determine if an image is a parent: layer not known, ignoring the error
      WARN[0000] Failed to determine parent of image: layer not known, ignoring the error
      WARN[0000] Failed to determine if an image is a parent: layer not known, ignoring the error
      WARN[0000] Failed to determine parent of image: layer not known, ignoring the error
      Untagged: localhost/test:latest
      Deleted: ed6a9832ff9bf91e5988186dcd0f209dd9afa27cd5ff18eefe94d141c1806075
      Deleted: f05ce1cae7ee45e60c7d8902dd337c1ead07fd69daa7d76ed58d56477916a937
    2. A middleware system programmer may have removed the IBM-provided images from the shared space in the internal registry. If you need the local images to be rebuilt, ask your middleware system programmer to pull the base images again.
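If many images are flagged, the image IDs can be pulled out of the error text and passed to podman rmi in one step. The sketch below parses the captured error messages from the example above; on a real system you would capture the output of podman images 2>&1 instead:

```shell
# Extract the image IDs named in "may be corrupted" errors so that they
# can be passed to `podman rmi`. The error text is captured sample
# output; on a real system, capture `podman images 2>&1` instead.
errors='ERRO[0000] Image ed6a9832ff9b exists in local storage but may be corrupted (remove the image to resolve the issue): layer not known
ERRO[0000] retrieving label for image "f05ce1cae7ee45e60c7d8902dd337c1ead07fd69daa7d76ed58d56477916a937": you may need to remove the image to resolve the error: layer not known'
ids=$(printf '%s\n' "$errors" | sed -n \
  -e 's/.*Image \([0-9a-f]*\) exists in local storage.*/\1/p' \
  -e 's/.*for image "\([0-9a-f]*\)".*/\1/p')
echo $ids
# Then: podman rmi $ids
```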

Instructions fail to set permissions for the target directory when building a Containerfile

  • When you use Podman to build a Containerfile that contains ADD or COPY instructions with the --chmod option specified, the instruction fails to set permissions for the target directory.

    The ADD/COPY instruction uses the contents of the source directory only to populate the target directory. If the target directory does not exist, it is created, but only the newly copied contents respect the permissions that are specified on --chmod; the directory itself does not. The following example shows the impact of using the --chmod option; neither of the target directories in the example respects the --chmod option:

    host $ mkdir source-dir
    
    host $ touch source-dir/1.txt source-dir/2.txt
    
    host $ cat Containerfile
    FROM zos:latest
    COPY source-dir /target-dir
    COPY --chmod=777 source-dir /target-dir-777
    
    host $ podman build -f Containerfile -t test-chmod
    STEP 1/3: FROM zos:latest
    STEP 2/3: COPY source-dir /target-dir
    --> 4d4146942ed
    STEP 3/3: COPY --chmod=777 source-dir /target-dir-777
    COMMIT test-chmod
    --> 77964efafaa
    Successfully tagged localhost/test-chmod:latest
    77964efafaa5fcbf203ff32045266b88f0e800fba162f930df37dc20e958074b
    
    host $ podman run --rm -i -t --entrypoint=/bin/sh test-chmod
    $ ls -l
    total 152
    drwxr-xr-x   2 BPXROOT  OMVS        8192 Sep 14 14:41 bin
    drwxr-xr-t   2 ZUSER1   TSOUSER    20480 Nov 27 14:47 dev
    drwxr-xr-x   2 ZUSER1   TSOUSER     8192 Nov 27 14:47 etc
    drwxrwxrwx   9 ZUSER1   TSOUSER        0 Nov 27 14:47 proc
    drwxr-xr-x   2 ZUSER1   TSOUSER     8192 Nov 27 14:47 run
    drwxr-xr-x   2 ZUSER1   TSOUSER     8192 Nov 27 14:41 target-dir
    drwxr-xr-x   2 ZUSER1   TSOUSER     8192 Nov 27 14:41 target-dir-777
    drwxrwxrwt   2 ZUSER1   TSOUSER     8192 Nov 27 14:47 tmp
    drwxr-xr-x   4 BPXROOT  OMVS        8192 Sep 14 14:41 usr
    
    $ ls -al target-dir*
    target-dir:
    total 32
    drwxr-xr-x   2 ZUSER1   TSOUSER     8192 Nov 27 14:41 .
    drwxr-xr-x   5 ZUSER1   TSOUSER     8192 Nov 27 14:41 ..
    -rw-------   1 ZUSER1   TSOUSER        0 Nov 27 14:39 1.txt
    -rw-------   1 ZUSER1   TSOUSER        0 Nov 27 14:39 2.txt
    
    target-dir-777:
    total 32
    drwxr-xr-x   2 ZUSER1   TSOUSER     8192 Nov 27 14:41 .
    drwxr-xr-x   5 ZUSER1   TSOUSER     8192 Nov 27 14:41 ..
    -rwxrwxrwx   1 ZUSER1   TSOUSER        0 Nov 27 14:39 1.txt
    -rwxrwxrwx   1 ZUSER1   TSOUSER        0 Nov 27 14:39 2.txt
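If the target directory itself must carry the requested permissions, one workaround is an explicit RUN chmod after the COPY. The following sketch builds such a Containerfile in a scratch file; it assumes the same zos:latest base image and source-dir as the example above:

```shell
# Build a scratch Containerfile that sets the target directory's own
# permissions with an explicit RUN chmod after the COPY.
cf=$(mktemp)
cat > "$cf" <<'EOF'
FROM zos:latest
COPY --chmod=777 source-dir /target-dir-777
# COPY --chmod only affects the copied contents, so set the directory
# permissions explicitly.
RUN chmod 777 /target-dir-777
EOF
cat "$cf"
```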

Errors when using Podman to pull an image

  • If you try to pull an image from your internal registry without the correct authority, you might see the following output:
    ZUSER1:/u/user1 #>podman pull <internal-registry-location>/ibm-semeru-runtimes:certified-17-jdk-zos --tls-verify=false
    Trying to pull <internal-registry-location>/ibm-semeru-runtimes:certified-17-jdk-zos...
    Getting image source signatures
    Copying blob 2f2baac5a799 done  
    Copying blob db4a75b7aa56 skipped: already exists  
    Error: writing blob: adding layer with blob "sha256:2f2baac5a7999d01d89bfd6408cc832c68f9dcc703c2a51a77e604a6f6005c04": 
    lsetxattr /u/user2/.local/share/containers/storage/ufs/9d31a5094916825ba740cc9766e2cf003012cfe305fa6c9c28678585811f916c/diff/usr/lpp/java/J8.0_64/bin/appletviewer: 
    EDC5139I Operation not permitted.
    A middleware system programmer must pull the IBM-provided images to a shared space in the internal registry. If you need an image to be pulled, speak to your middleware system programmer. For more information about user ID authorization, see User ID requirements.

EDC5111I Permission denied

  • If you get an error similar to the following when using Podman:
    $ podman system info
    Error: creating runtime static files directory: mkdir /containers-storage-user-257: EDC5111I Permission denied
    This may be because the sample storage.conf references $TMPDIR and the TMPDIR environment variable is not set. Update /etc/containers/storage.conf to replace $TMPDIR with /tmp, for example:
    rootless_storage_path = "/tmp/containers-storage-user-$UID"
    Alternatively, set TMPDIR=/tmp before running Podman.

  • If you get an error similar to the following when trusting an external container registry:
    $ podman image trust set --type accept icr.io 
    Error: open /etc/containers/policy.json: EDC5111I Permission denied
    Ensure that you are trusting the external container registry using the IMGADMIN user ID. For more information, see Trusting external container image registries.
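The $TMPDIR replacement described above can be scripted. This sketch edits a scratch copy of storage.conf rather than the live /etc/containers/storage.conf:

```shell
# Work on a scratch copy; on a live system, edit
# /etc/containers/storage.conf itself.
conf=$(mktemp)
printf '%s\n' 'rootless_storage_path = "$TMPDIR/containers-storage-user-$UID"' > "$conf"
# Replace the leading $TMPDIR reference with a fixed /tmp path.
sed -i 's|\$TMPDIR|/tmp|' "$conf"
cat "$conf"
```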

Errors when using Podman to build an image

  • When a new user is connected to the PODMAN group and tries to build an image using Podman, the user might see the following error on the command line:
    $ podman build -t javatest .
    STEP 1/1: FROM ibm-semeru-runtimes:certified-17-jdk-zos
    ERRO[0000] Unmounting /tmp/containers-storage-user-245/ufs/e5f9485bfbeb0cc7d96d928563a26f568a9c5aaa2c44b0ef5a31e3c44af01ead/merged: EDC5121I Invalid argument. (errno2=0xC943014A)
    Error: mounting new container: mounting build container "81038e283a9590e9a6297c70a260bf6063f8496de9d312c904e6aef9baba7530": creating ufs mount to /tmp/containers-storage-user-245/ufs/e5f9485bfbeb0cc7d96d928563a26f568a9c5aaa2c44b0ef5a31e3c44af01ead/merged, mount_data="lowerdir=/var/share/containers/storage/ufs/l/CBRLXMBXVD34SYZCW75U2QJQLU:/var/share/containers/storage/ufs/l/YG3I6MGZIU5RABD6IG3FOKJN72,upperdir=/tmp/containers-storage-user-245/ufs/e5f9485bfbeb0cc7d96d928563a26f568a9c5aaa2c44b0ef5a31e3c44af01ead/diff,workdir=/tmp/containers-storage-user-245/ufs/e5f9485bfbeb0cc7d96d928563a26f568a9c5aaa2c44b0ef5a31e3c44af01ead/work,metacopy=off,supercopy=on": EDC5139I Operation not permitted. (errno2=0x119B00B0)

    During this failure, message ICH408I appears in the z/OS system log stating that the user has 'INSUFFICIENT AUTHORITY TO MOUNTSETUID'.

    To resolve this error, the user that was recently added to the PODMAN group needs to log out and log in again.

Errors when compiling a Java application within a container

  • If, when compiling a Java application within a container, you get an error code JVMJ9GC020E, this might be due to the address space limits inside the container:
    STEP 3/4: RUN mkdir /app && javac -encoding iso8859-1 -d /app src/hello/HelloWorld.java
    JVMJ9GC020E -Xms too large for heap
    JVMJ9VM015W Initialization error for library j9gc29(2): Failed to initialize
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.
    Error: building at STEP "RUN mkdir /app && javac -encoding iso8859-1 -d /app src/hello/HelloWorld.java": while running runtime: exit status 1

    By default, the MAXASSIZE value is set in the BPXPRM00 member for zOSCP. To resolve this error, increase the MAXASSIZE value. For more information, see MAXASSIZE in the IBM z/OS documentation.
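In BPXPRMxx, MAXASSIZE is set with a single statement. The value below (2147483647 bytes, the statement's maximum) is an example only; size it for your workload:

```
MAXASSIZE(2147483647)
```

The value can also be changed dynamically with the SETOMVS MAXASSIZE=nnnn operator command, without re-IPLing.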

Encoding issues when using ssh

  • If you encounter text encoding or conversion issues when using ssh, the following commands may help resolve them for subsequent commands:
    export _BPXK_AUTOCVT=ON
    chtag -tc1047 /proc/self/fd/0 /proc/self/fd/1 /proc/self/fd/2
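To make these settings persistent for future ssh sessions, they can be appended to the user's shell profile. In this sketch a temporary file stands in for ~/.profile:

```shell
# Append the conversion settings to a scratch profile; on a live system
# this would be the user's ~/.profile so that every ssh session picks
# them up.
profile=$(mktemp)
cat >> "$profile" <<'EOF'
export _BPXK_AUTOCVT=ON
chtag -tc1047 /proc/self/fd/0 /proc/self/fd/1 /proc/self/fd/2
EOF
cat "$profile"
```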

Errors when using Podman in a CINET environment

If you are running Podman in a CINET environment and have not properly established stack affinity, you may see one of the following errors:
  • $ podman run <image>
    Error: OCI runtime error: runc: time="2024-02-08T16:44:43Z" level=fatal msg="nsexec-1[188]: 
    failed to unshare remaining namespaces: EDC5121I Invalid argument. 
    (errno2=0x12C206DA)
    "time="2024-02-08T16:44:43Z" level=fatal msg="nsexec-0[187]: failed to sync with stage-1: 
    next state: EDC5137I Inappropriate I/O control operation. (errno2=0x05FC0119)
    "time="2024-02-08T16:44:43Z" level=error msg="runc create failed: unable to start container 
    process: can't get final child's PID from pipe: EOF"
  • $ podman run <image>
    WARN[0000] Failed to load cached network config: network zos_hybrid_network not found in 
    CNI cache, falling back to loading network zos_hybrid_network from disk WARN[0000] 1 error 
    occurred: * plugin type="zos-cni" failed (delete): cni plugin zos-cni failed: 
    {"cniVersion": "0.4.0","code": 3,"msg": "Container unknown or does not exist.","details": 
    "No DVIPA for Container found or unexpected internal error. Contact IBM, errno 157, errnojr 766c7307."} 
    Error: plugin type="zos-cni" failed (add): cni plugin zos-cni failed: {"cniVersion": 
    "0.4.0","code": 103,"msg": "Incorrect Network Configuration.","details": "VIPARANGE not defined 
    for ZCONTAINER, errno 121, errnojr 766c7303."}

To resolve the issue, ensure _BPXK_SETIBMOPT_TRANSPORT is configured in /etc/containers/containers.conf. For more information, see Configuring the container runtime using a z/OSMF workflow.

For more information on z/OS UNIX Common INET (CINET) and zOSCP, see Considerations for z/OS UNIX Common INET (CINET).
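A quick way to confirm whether stack affinity is configured is to search containers.conf for the variable. In this sketch a scratch copy stands in for /etc/containers/containers.conf, and the env entry under [engine] with stack name TCPIP is an assumed example; consult the z/OSMF workflow documentation for the exact setting on your system:

```shell
# Check whether _BPXK_SETIBMOPT_TRANSPORT appears in containers.conf.
# A scratch copy stands in for /etc/containers/containers.conf, and the
# stack name TCPIP is a placeholder for your CINET stack.
conf=$(mktemp)
printf '%s\n' '[engine]' 'env = ["_BPXK_SETIBMOPT_TRANSPORT=TCPIP"]' > "$conf"
grep -n '_BPXK_SETIBMOPT_TRANSPORT' "$conf" || echo 'stack affinity not configured'
```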

Errors when starting an application within a container

  • If a container application fails to bind to a given port with a Permission denied message, the port may be reserved elsewhere by the system. If an existing PORT statement reserves the requested port, or a PORTRANGE statement includes it, you need to add a PORT statement that permits applications within a container to bind to that port. This can be accomplished by using the reserved job name BCZ-CNTR on the PORT statement.

    For more information, see Network Support for IBM z/OS Container Platform.

  • If common INET is configured in your BPXPRMxx PARMLIB member, also verify that the INADDRANYPORT/INADDRANYCOUNT range does not include the requested port.
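A PROFILE.TCPIP PORT statement that uses the reserved job name might look like the following sketch, where port 8080 is an arbitrary example:

```
PORT
   8080 TCP BCZ-CNTR        ; permit container applications to bind to port 8080
```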

Avoid unmounting file systems that are used by zOSCP

The first time Podman is run, it starts a podman pause process in a new mount namespace. Podman runs in the private mount namespace that the podman pause process is associated with, and is isolated from mounts in the global mount namespace. Any changes that are made to the global mount namespace, for example mounting, are not visible to Podman container processes or to the podman pause process.

However, unmounts are propagated to the podman pause mount namespace. Unmounting file systems used by zOSCP (which include user home directories) can cause problems for zOSCP programs. In many cases (such as increasing the size of a ZFS), the unmount can be avoided by using bpxwmigf.

For example, if you want to increase the size of a home directory, use bpxwmigf to avoid the unmount. When bpxwmigf is used, the mount update is reflected in all mount namespaces, and no Podman actions are needed. When an unmount/mount sequence is used instead, it must be preceded by the Podman command that removes podman pause; the unmount can then be done, and podman pause is restored after the mount completes. You can run podman system migrate to stop both the running containers and the podman pause process, which allows Podman to run in a new mount namespace that reflects the changes.

After running podman system migrate, you should verify that there is no active Podman process. If there is, you can run podman system migrate again to stop it.
$ ps -ef | grep podman
ROOTLESS        913          1  - 22:48:06 ?         0:00 podman
If you run Podman after unmounting and mounting the home directory, you might see one of the following error outputs:
  • $ podman images
    Error: overriding network config directory: creating CNI config file : "open /u/rootless/.config/cni/net.d/10-zoscni.conflist: EDC5129I No such file or directory."
  • $ podman system migrate
    Error: creating runtime static files directory: mkdir /u/k8sauto/.local: EDC5134I Function not implemented.
To resolve this error, you need to manually kill both your Podman container processes and the podman pause process. This then allows Podman to run in a new mount namespace.
$ ps -ef | grep -e podman -e conmon
ROOTLESS   16777363          1  - 02:08:39 ?         0:00 podman
ROOTLESS   16777365          1  - 02:08:39 ?         0:00 /usr/lpp/IBM/zoscp/bin/conmon --api-version 1 -c 71adcfa3f78a1a24aa861b0dc424d6575e8
$ kill 16777363 16777365
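The process lookup and kill can be combined by extracting the PIDs with awk. The sample ps output below is captured from the example above; on a real system you would pipe ps -ef into the same pipeline:

```shell
# Extract the PIDs of podman and conmon processes from `ps -ef`-style
# output so that they can be killed in one step. The sample input is
# hard-coded; on a real system, pipe `ps -ef` into the same awk program.
ps_output='ROOTLESS   16777363          1  - 02:08:39 ?         0:00 podman
ROOTLESS   16777365          1  - 02:08:39 ?         0:00 /usr/lpp/IBM/zoscp/bin/conmon --api-version 1'
pids=$(printf '%s\n' "$ps_output" | awk '/podman|conmon/ { print $2 }')
echo $pids
# Then: kill $pids
```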

kubeadmz join failure

  • When a kubeadmz join command fails for a worker node, the command ends with this message:
    BCZ2219E kubeadmz failed to join the z/OS worker node to the Kubernetes cluster
    Check the ~/.kube/kubeadmz.log to determine the reason for failure.
    • If you see the following message in the log:
      Calling 'wnadm' directly is not supported, will not continue.
      You need to check that your session has the _BPXK_AUTOCVT=ON environment variable set (run export _BPXK_AUTOCVT=ON).
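Before retrying the join, you can check the setting in the current session with a short test like the following sketch:

```shell
# Report whether the current session has _BPXK_AUTOCVT set to ON,
# which kubeadmz requires.
if [ "${_BPXK_AUTOCVT:-}" = "ON" ]; then
  msg="_BPXK_AUTOCVT is ON"
else
  msg="_BPXK_AUTOCVT is not ON; run: export _BPXK_AUTOCVT=ON"
fi
echo "$msg"
```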