
On multi-node Kubernetes, the default ReadWriteOnce persistence settings do not work without pod affinity

Open · keskival opened this issue 3 years ago · 4 comments

Steps to reproduce the problem

  1. Install Mastodon from the Helm chart to a multi-node Kubernetes cluster with an NFS storage class.
  2. If the mastodon-web and mastodon-sidekiq-all-queues pods end up on different nodes, some of them hang indefinitely in the ContainerCreating state.

They are waiting to mount the system and assets persistent volumes. With the ReadWriteOnce access mode, these volumes can only be mounted on a single node at a time.
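To make the failure mode concrete, here is a simplified Python model of the ReadWriteOnce rule: a volume gets attached to whichever node first mounts it, and pods on any other node that need it cannot start. The function blocked_pods and its arguments are hypothetical illustrations, not Kubernetes or Helm APIs, and the model ignores scheduling, detach timing and retries.

```python
# Simplified model of the ReadWriteOnce (RWO) mount rule. Hypothetical helper,
# not part of Kubernetes or the Mastodon chart: an RWO volume is mounted
# read-write by one node at a time, but pods on that same node may share it.


def blocked_pods(pod_nodes, pod_volumes, access_modes):
    """Return the pods that would stay stuck in ContainerCreating.

    pod_nodes:    pod name -> node the pod was scheduled on
    pod_volumes:  pod name -> list of PVC names the pod mounts
    access_modes: PVC name -> "ReadWriteOnce" or "ReadWriteMany"

    Pods are processed in insertion order; the first pod to mount an RWO
    volume decides which node that volume is attached to.
    """
    attached_to = {}  # RWO volume -> node currently holding it
    blocked = []
    for pod, node in pod_nodes.items():
        can_start = True
        for pvc in pod_volumes.get(pod, []):
            if access_modes[pvc] != "ReadWriteOnce":
                continue  # RWX volumes can be mounted from any node
            holder = attached_to.setdefault(pvc, node)
            if holder != node:
                can_start = False  # volume is held by another node
        if not can_start:
            blocked.append(pod)
    return blocked


pods = {"mastodon-web": "node-a", "mastodon-sidekiq-all-queues": "node-b"}
volumes = {pod: ["assets", "system"] for pod in pods}
rwo = {"assets": "ReadWriteOnce", "system": "ReadWriteOnce"}
print(blocked_pods(pods, volumes, rwo))  # ['mastodon-sidekiq-all-queues']
```

With both pods on the same node, or with both volumes set to ReadWriteMany, the same call returns an empty list.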

Expected behaviour

Everything should work with roughly the default settings.

Actual behaviour

The pods hang in the ContainerCreating state, and the cause is difficult to diagnose.

Detailed description

The default settings do not work on multi-node clusters. One of the following is needed: a clearer comment in the chart warning users to set pod affinities, ReadWriteMany as the default access mode, or a default pod affinity that schedules these two kinds of pods onto the same node.
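As an illustration of the ReadWriteMany option, the sketch below prints a Helm values override that switches both volumes to RWX. The key names mastodon.persistence.assets.accessMode and mastodon.persistence.system.accessMode are an assumption about the chart's values.yaml, not confirmed by this issue, so check them against the chart version you install.

```python
# Sketch: emit a Helm values override that switches both Mastodon volumes
# to ReadWriteMany. ASSUMPTION: the chart reads these exact key names;
# verify them against the values.yaml of the chart version you use.
import json


def rwx_overrides():
    """Return a values override dict for the assets and system volumes."""
    return {
        "mastodon": {
            "persistence": {
                # RWX lets mastodon-web and Sidekiq mount these volumes
                # from different nodes, if the storage class supports it.
                "assets": {"accessMode": "ReadWriteMany"},
                "system": {"accessMode": "ReadWriteMany"},
            }
        }
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be passed to helm with -f.
    print(json.dumps(rwx_overrides(), indent=2))
```

For example, redirect the output to a file such as values-rwx.json and pass it with -f to helm install or helm upgrade (both file names are hypothetical). Kubernetes generally rejects changes to the access modes of an existing PersistentVolumeClaim, so this is easiest to apply on a fresh install.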

Specifications

Mastodon: edge
OS: Ubuntu
Kubernetes: MicroK8s
Nodes: 2+

keskival · Dec 06 '22

The same problem also affects the mastodon-db-migrate Job, for which values.yaml does not seem to offer a separate place to set nodeAffinity.

However, for that Job the Helm chart already includes logic to set a podAffinity that co-locates it with pods labelled app.kubernetes.io/part-of=rails: https://github.com/mastodon/mastodon/blob/ed07f10ca8d4e65ec58958f300a8bb7c762ccbbd/chart/templates/job-db-migrate.yaml#L22-L35

A similar setting should be added to the Sidekiq and mastodon-web Deployments so that they are co-located with each other when ReadWriteOnce is set.
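A co-location rule of this kind could look like the Python sketch below, which builds a standard Kubernetes podAffinity stanza requiring a pod to run on the same node (topologyKey kubernetes.io/hostname) as pods labelled app.kubernetes.io/part-of=rails. The helper name rails_pod_affinity is hypothetical and the code is not copied from the chart's job-db-migrate.yaml. The resulting object belongs under a pod template's spec.affinity; how the chart would expose it through values.yaml is not assumed here.

```python
# Sketch: build a Kubernetes podAffinity stanza that co-locates a pod with
# pods labelled app.kubernetes.io/part-of=<part_of>. Hypothetical helper;
# the output belongs under a pod template's spec.affinity.
import json


def rails_pod_affinity(part_of="rails"):
    """Return an affinity object requiring the same node as matching pods."""
    return {
        "podAffinity": {
            # "required" is a hard scheduling constraint, unlike "preferred".
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {
                        "matchExpressions": [
                            {
                                "key": "app.kubernetes.io/part-of",
                                "operator": "In",
                                "values": [part_of],
                            }
                        ]
                    },
                    # Same hostname means same node, so RWO volumes can be shared.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this can be pasted into a manifest or values file.
    print(json.dumps({"affinity": rails_pod_affinity()}, indent=2))
```

The required variant is used on purpose: preferredDuringSchedulingIgnoredDuringExecution is only a hint to the scheduler, so it would not guarantee that the ReadWriteOnce mounts succeed.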

keskival · Dec 07 '22

Added an in-progress PR here: https://github.com/mastodon/chart/pull/13

keskival · Dec 10 '22

Hi, have you tried setting the persistence access mode to ReadWriteMany? I ask because I'm setting up a single-node cluster for now but will move to multiple nodes later, and I'd like to avoid this pitfall. I don't know whether ReadWriteMany allows multiple Sidekiq and Rails pods to run without being on the same node, which is what went wrong for you.

WilyWildWilly · Dec 22 '23

ReadWriteMany works, but of course it requires a storage class that supports it. Alternatively, you can force the pods to co-locate, which largely defeats the point of having a multi-node cluster in the first place.

keskival · Jan 16 '24