Mirantis patches containerd to address race condition

Mirantis recently released Mirantis Container Runtime (MCR) 23.0.10, which included the new upstream containerd v1.6.30. Shortly after that patch release, continued testing and internal usage showed that Mirantis customers on Linux risked being affected by an upstream issue in this version of containerd (Windows users are unaffected). To remedy the situation, Mirantis has built and published containerd version 1.6.30~rc.2. This build closely tracks the upstream version but removes the offending change; we chose this approach to give Mirantis customers maximum stability and features while removing the risk.

Because the issue affects only containerd, you do not need to deploy a new version of MCR to benefit from the fix. New installations of, or upgrades to, MCR 23.0.10 (or any other MCR version) that consume the fixed containerd are unaffected and need no corrective action.

Symptoms of the issue

In the upstream version of containerd 1.6.30, a race condition can cause arbitrary docker exec commands to become unresponsive. The race is more likely to manifest as the number of concurrent execs into a single container grows, whether those execs come from docker exec commands or from container health checks. Larger clusters performing a greater number of operations are especially at risk.
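To get a rough sense of whether concurrent execs are hanging on a test node, a small probe like the one below can help. It is only a sketch under stated assumptions: the function name probe_execs, the counts, and the timeouts are illustrative and not part of any Mirantis tooling, and it relies on GNU coreutils timeout being installed. It starts several copies of a command in parallel, bounds each with a timeout, and reports how many had to be killed.

```shell
# Hypothetical helper: run COUNT copies of a command concurrently, each
# bounded by TIMEOUT_SECONDS, and print how many were killed for overrunning.
# Usage: probe_execs COUNT TIMEOUT_SECONDS COMMAND [ARGS...]
probe_execs() {
  count=$1
  limit=$2
  shift 2
  pids=""
  hung=0
  i=0
  while [ "$i" -lt "$count" ]; do
    # -k 5: send SIGKILL if the command ignores SIGTERM for 5 more seconds.
    timeout -k 5 "$limit" "$@" >/dev/null 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done
  for pid in $pids; do
    rc=0
    wait "$pid" || rc=$?
    # GNU timeout exits 124 after a SIGTERM timeout and 137 after SIGKILL.
    if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
      hung=$((hung + 1))
    fi
  done
  echo "$hung"
}

# Example (container name borrowed from the ps listing in this post):
# probe_execs 20 30 docker exec nginx-1 true
```

A non-zero result only indicates that some commands exceeded the chosen timeout; it does not by itself prove that this race condition is the cause.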

While a hanging docker exec could manifest in a variety of ways, a simple way to determine if there are affected processes active on a given node is to use the ps command:

ubuntu@host:~ $ ps aux | grep "docker exec"
ubuntu    2815926    0.0   0.1   1623668 25080 pts/0   Sl   05:02   0:00 docker exec nginx-1 true
ubuntu    2815961    0.0   0.1   1697592 24836 pts/0   Sl   05:02   0:00 docker exec nginx-1 true
ubuntu    2816106    0.0   0.1   1623860 25012 pts/0   Sl   05:02   0:00 docker exec nginx-1 true
ubuntu    2816255    0.0   0.1   1623604 24760 pts/0   Sl   05:02   0:00 docker exec nginx-1 true
ubuntu    2816363    0.0   0.1   1697656 24932 pts/0   Sl   05:02   0:00 docker exec nginx-1 true
ubuntu    2816912    0.0   0.1   1697400 25000 pts/0   Sl   05:02   0:00 docker exec nginx-1 true
ubuntu    2817906    0.0   0.1   1697336 24676 pts/0   Sl   05:02   0:00 docker exec nginx-1 true
ubuntu    2817908    0.0   0.1   1623860 24348 pts/0   Sl   05:02   0:00 docker exec nginx-1 true

You are likely experiencing this issue if the output displays either of the following conditions:

  • Any number of unexpected docker exec commands in sleep (Sl) state that do not change over time

  • A set of docker exec commands with the same or older start time
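The check above can be made less manual by filtering on elapsed time instead of eyeballing start times. The following is a minimal sketch, not an official diagnostic: the function name stale_execs and the 300-second threshold are assumptions, and it expects the etimes (elapsed seconds) column that procps-ng ps provides on most Linux distributions.

```shell
# Hypothetical helper: read "PID ELAPSED_SECONDS STAT COMMAND..." lines on
# stdin and print docker exec clients that have run for at least MIN_AGE
# seconds. The STAT column is accepted but not used for filtering.
stale_execs() {
  min_age=${1:-300}
  awk -v min="$min_age" '
    # Match "docker" or a path ending in /docker, followed by "exec".
    $4 ~ /(^|\/)docker$/ && $5 == "exec" && $2 + 0 >= min + 0 {
      cmd = $4
      for (i = 5; i <= NF; i++) cmd = cmd " " $i
      print $1, $2, cmd
    }'
}

# On a node (the trailing "=" suppresses the ps column headers):
# ps -eo pid=,etimes=,stat=,args= | stale_execs 300
```

Because it matches on the command fields rather than grepping the whole line, the filter does not report its own grep or awk process. Long-running interactive sessions started on purpose will still show up, so treat the output as a list of candidates to inspect.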

As previously mentioned, depending on the use case, the symptoms may appear in a variety of ways. Generally speaking, if operations suddenly and unexpectedly begin to report timeouts after changing your version of containerd, then this issue may be the root cause.

Determination of susceptibility

If you are using Mirantis Launchpad and/or the MCR install.sh script and have NOT updated to MCR 23.0.10 (or have done so on or after March 28, 2024), then the probability that you are impacted is low.  However, rather than risk experiencing the symptoms of this issue, proactively verifying that you are not running an affected version of containerd is a straightforward task.

The version of containerd that shipped with MCR 23.0.10 contained the affected code, and an installation of MCR 23.0.10 performed before March 28, 2024 is likely to have installed this version of containerd. If you used a customized installation method to install MCR, an earlier version of MCR may also be running with the unpatched containerd.

To verify whether your environment is affected, check each node for the version of containerd in use:

ubuntu@host:~$ docker version
Client: Mirantis Container Runtime
 Version:           23.0.10-rc1
 API version:       1.42
 Go version:        go1.21.8m1 X:boringcrypto
 Git commit:        8d04317
 Built:             Wed Mar 13 21:51:54 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Mirantis Container Runtime (Unlicensed - not for production workloads)
 Engine:
  Version:          23.0.10-rc1
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.21.8m1 X:boringcrypto
  Git commit:       2eb2075
  Built:            Wed Mar 13 21:51:54 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.30-rc.1
  GitCommit:        934d1942d1fe36f99a4f7e65bf80db09754f0c76
 runc:
  Version:          1.1.12-m1
  GitCommit:        8ac7905
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

If the reported version of containerd is 1.6.30-rc.1, then the node could be impacted by this race condition in containerd, depending on operating circumstances.
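When many nodes need checking, the same comparison can be scripted. This is a sketch that assumes the plain-text layout of docker version shown above; the function names containerd_version and is_affected are illustrative. It compares only against the 1.6.30-rc.1 string named in this post and makes no claim about other builds.

```shell
# Hypothetical helper: extract the containerd version from `docker version`
# text supplied on stdin.
containerd_version() {
  awk '
    /^ *containerd:/ { in_section = 1; next }   # start of the containerd block
    in_section && /Version:/ { print $2; exit } # first Version: line inside it
  '
}

# Hypothetical helper: succeed (exit 0) when the version string is the build
# this post identifies as affected.
is_affected() {
  [ "$1" = "1.6.30-rc.1" ]
}

# On a node:
# v=$(docker version | containerd_version)
# if is_affected "$v"; then
#   echo "containerd $v: affected build"
# else
#   echo "containerd $v: not the affected build named in this post"
# fi
```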

Remediation of the issue

If you determine that your environment is running the affected version of containerd, then the solution is simply to upgrade your containerd package and restart the process. A restart of containerd and the docker service is required regardless of the mechanism used to apply the fix.

Using Launchpad

Note: If you are using install.sh for airgapped installations or otherwise caching the script, be sure to use the latest version before updating or installing new instances of MCR to ensure success.

launchpad apply --force-upgrade

With this command, the --force-upgrade flag is required to ensure that MCR 23.0.10 is reapplied with the new containerd package, despite this MCR version already being installed on the target system.

Using Red Hat Package Manager (RHEL, Oracle Linux, Rocky Linux)

sudo yum install -y containerd.io

Using Debian Package Manager (Ubuntu)

sudo apt-get update

sudo apt-get install -y containerd.io=1.6.30~rc.2-1

Using SUSE Package Manager

sudo zypper refresh

sudo zypper install -y containerd.io-1.6.30-2.2.rc.2.1

Restart components (containerd & engine)

sudo systemctl restart docker containerd
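For mixed fleets, the package-manager steps above can be collected into one dry-run helper that prints the commands for a given distribution without executing them. This is a sketch only: the function name remediation_commands is hypothetical, and the mapping from /etc/os-release ID values (rhel, ol, rocky, ubuntu, sles) to the sections above is an assumption you should confirm for your own images.

```shell
# Hypothetical helper: print the upgrade and restart commands from this post
# for a given os-release ID. Nothing is executed.
remediation_commands() {
  case "$1" in
    rhel|ol|rocky)
      echo "sudo yum install -y containerd.io" ;;
    ubuntu)
      echo "sudo apt-get update"
      echo "sudo apt-get install -y containerd.io=1.6.30~rc.2-1" ;;
    sles)
      echo "sudo zypper refresh"
      echo "sudo zypper install -y containerd.io-1.6.30-2.2.rc.2.1" ;;
    *)
      echo "unsupported distribution ID: $1" >&2
      return 1 ;;
  esac
  # The restart is required regardless of how the package was upgraded.
  echo "sudo systemctl restart docker containerd"
}

# On a node (ID is defined by /etc/os-release):
# . /etc/os-release && remediation_commands "$ID"
```

Printing rather than running the commands keeps the helper safe to use while reviewing a rollout plan; the output can be inspected before anything is applied.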

Upon successful update, docker version will report containerd version 1.6.30-rc.2:

$ docker version
Client: Mirantis Container Runtime
 Version:           23.0.10
 API version:       1.42
 Go version:        go1.21.8m1 X:boringcrypto
 Git commit:        8d04317
 Built:             Wed Mar 20 17:59:33 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Mirantis Container Runtime (Unlicensed - not for production workloads)
 Engine:
  Version:          23.0.10
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.21.8m1 X:boringcrypto
  Git commit:       2eb2075
  Built:            Wed Mar 20 17:55:41 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.30-rc.2
  GitCommit:        502191142248816d148ad6b5f4455afac05e8092
 runc:
  Version:          1.1.12-m1
  GitCommit:        8ac7905
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Impact on Mirantis Kubernetes Engine (MKE) users

The use of MCR with MKE does not fundamentally affect the implications of this problem. MKE users can follow the same steps to determine if they are impacted and apply remediation (if necessary) as other consumers of MCR; no additional steps are required.
