Fixing a kubelet memory leak in Kubernetes 1.36

(heyoncall.com)

60 points | by compumike 18 hours ago

6 comments

  • compumike 18 hours ago
    Author here! If you're running a Kubernetes cluster, I recommend you check `kubectl version` and see if you're running "Server Version: v1.36.[0,1,2]". If so, you may want to use the one-liner at the end of the article to check your "process_resident_memory_bytes" on each node, and consider restarting kubelet as a temporary workaround to tame the memory leak until v1.36.3 is released.
    • CamouflagedKiwi 3 hours ago
      Nice find.

      Can't help but feel this is one of the subtle traps hidden beneath the advice that contexts aren't supposed to be stored. I know it's not always that easy, of course.

      • compumike 3 hours ago
        Thanks. I know there's a `go vet` tool that's run as part of Kubernetes CI, and one of its checks is:

          lostcancel: check cancel func returned by context.WithCancel is called
        
        I'm not 100% sure why `go vet` didn't catch this issue, but storing the cancelFn in the struct is probably part of the reason. Any Go experts know if that's the case?
        • cyberax 2 hours ago
          The cancel function escapes the function body, so static analysis can't detect it. There's another lint for that (containedctx), but I think it's off in K8s.

          This is a serious tripping point with Go. There's no way to express: "this is a root context that I _want_ to store and only use to create derived contexts". Goroutines are also a source of problems, you can't easily say "I'm passing the ownership of this context to a goroutine".

          • compumike 1 hour ago
            It does seem like a serious tripping point.

            I took a quick look at "containedctx" and it seems like for this case, it would almost be backwards: it would flag the (not-memory-leaking) struct-stored "status.ctx", but wouldn't flag when there is a stored "status.cancelFn" only (which resulted in the memory leak).

            Could Go implement a runtime leaked-context detector, like the data race detector? https://go.dev/doc/articles/race_detector

            • cyberax 1 hour ago
              This actually is possible now. Contexts are now garbage-collectible, even if cancel() is not called.

              In this case, the cancel() function was preventing the collection. But I think it can be changed to hold a weak reference instead. The overhead is too large to run it normally, but it should be OK for something like the race detector.

            • keynha 2 hours ago
              [flagged]
        • rirze 4 hours ago
          Very cool. It's often daunting to contribute to such a well-established and recognizable project, but this is exactly how it should work.
          • jeffbee 34 minutes ago
            Given that this happens on the main loop, one wonders how this escaped release quality checks. Is there another contributing cause that narrows the impact?
            • fsuts 3 hours ago
              Not all heroes wear capes! Well done
              • __turbobrew__ 3 hours ago
                A good reason to health check the kubelet process and restart it when the checks fail.
                • compumike 2 hours ago
                  What kind of health checks? In my case, the kubelet process was staying alive and responsive to queries, I believe due to:

                    # cat /proc/$(pgrep kubelet)/oom_score_adj
                    -999
                    
                    (from OOMScoreAdjust=-999 in /etc/systemd/system/kubelet.service)  
                  
                  With this score, the Linux OOM killer wouldn't touch it, but any of my Pods were fair game.
                  • nijave 1 hour ago
                    At the metrics level, you can compare old vs new release. Have been bitten before by resource requirements dramatically change (regardless of whether it's a bug or functionality change)