Dockerize app for k8s
I was trying to build the app and deploy it to k8s, but I am running into a CrashLoopBackOff issue.
➤ Mike Morearty commented:
Someone outside of Asana created this issue in the (publicly visible, open source) bazels3cache repo. I don't really understand what they are asking for. Anyone have any clues?
Thanks @mmorearty!
Hi there @chenrui333. Can you briefly describe the steps you took to run this in your k8s app/cluster, and post the logs from the crash-looping container here (`kubectl logs POD_NAME -c CONTAINER_NAME --previous`)?
Also, just to manage expectations: this is an open source version of a tool we use extensively at Asana, but since our use case is not necessarily the same as yours, we might not have an immediate answer, and our bandwidth for support is extremely limited. That said, if you are willing to provide us with as much debugging information as you can so we can understand your issue, we will try to help.
Thanks for the quick reply.
Here is my Dockerfile:
FROM node:14
# where the bazel cache goes
ARG BUCKET_NAME="bazel-cache"
ENV CACHE_BUCKET=$BUCKET_NAME
RUN npm install -g bazels3cache
EXPOSE 7777
CMD ["bazels3cache", "--bucket=$CACHE_BUCKET"]
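(One thing worth flagging in this Dockerfile, though it may or may not be related to the crash loop: the exec form of `CMD` does not invoke a shell, so `$CACHE_BUCKET` is never expanded and the literal string `--bucket=$CACHE_BUCKET` is passed to bazels3cache. The shell form would expand it. A sketch, untested against bazels3cache itself:)

```dockerfile
# Shell form runs the command through /bin/sh -c, so $CACHE_BUCKET
# is expanded at container start; exec form would pass it literally.
CMD bazels3cache --bucket=$CACHE_BUCKET
```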
Here is what I got from the container:
$ kubectl -n bazel-cache logs pod/bazel-cache-74c8b7876c-wx9mj
bazels3cache: started server at http://localhost:7777/, logging to /root/.bazels3cache.log
> Also just to manage expectations. This is an open source version of a tool we use extensively at Asana […]
Totally understand. I think resolving this issue would help drive wider adoption of, and improvements to, this utility.
I think the logs you posted are from the current run, which does not seem to have crashed yet.
Can you try doing
$ kubectl -n bazel-cache get pods
We would expect to see a restart count higher than 0. If that's true, then try running
$ kubectl -n bazel-cache logs POD_NAME -c CONTAINER_NAME --previous
to get the logs from the previous run (which we now know crashed).
Finally also include the output of
$ kubectl -n bazel-cache describe pod POD_NAME
Here is all the requested output :)
$ kubectl -n bazel-cache get pods
NAME                           READY   STATUS             RESTARTS   AGE
bazel-cache-66ff494f8f-4ghsv   0/1     CrashLoopBackOff   32         140m
$ kubectl -n bazel-cache logs bazel-cache-66ff494f8f-4ghsv -c bazel-cache --previous
bazels3cache: started server at http://localhost:7777/, logging to /root/.bazels3cache.log
$ kubectl -n bazel-cache describe pod bazel-cache-66ff494f8f-4ghsv
Name:           bazel-cache-66ff494f8f-4ghsv
Namespace:      bazel-cache
Priority:       0
Node:           ip-x.x.x.x.ec2.internal/x.x.x.x
Start Time:     Wed, 16 Dec 2020 13:07:23 -0500
Labels:         name=bazel-cache
                pod-template-hash=66ff494f8f
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Running
IP:             x.x.x.x
IPs:            <none>
Controlled By:  ReplicaSet/bazel-cache-66ff494f8f
Containers:
  bazel-cache:
    Container ID:   docker://406fcdd14a3e966369b2cf53ed3d2ded6959e230c477bf6aa66b36b105ba034f
    Image:          xxx.dkr.ecr.us-east-1.amazonaws.com/bazel-cache:5d8948b68092b91663642053e7d29a15cf699b5a
    Image ID:       docker-pullable://xxx.dkr.ecr.us-east-1.amazonaws.com/bazel-cache@sha256:d375e1dbee921a663f856c1c58566a1e6af5c59df8a0f445a7d3409152e078b2
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Completed
      Exit Code:   0
      Started:     Wed, 16 Dec 2020 15:27:13 -0500
      Finished:    Wed, 16 Dec 2020 15:27:14 -0500
    Ready:          False
    Restart Count:  32
    Environment:
      AWS_ACCESS_KEY_ID:      <set to the key 'aws_access_key_id' in secret 'bazel-cache'>      Optional: false
      AWS_SECRET_ACCESS_KEY:  <set to the key 'aws_secret_access_key' in secret 'bazel-cache'>  Optional: false
      AWS_DEFAULT_REGION:     us-east-1
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qc787 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-qc787:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qc787
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                   From                                   Message
  ----     ------   ----                  ----                                   -------
  Warning  BackOff  92s (x640 over 141m)  kubelet, ip-10-181-2-174.ec2.internal  Back-off restarting failed container
@spiliopoulos any idea?
I would guess this is because bazels3cache runs as a daemon: the command you run completes quickly, leaving another process running in the background. Since that command is the container's main process, Kubernetes sees the container exit (cleanly, which matches the `Reason: Completed / Exit Code: 0` in your describe output) and keeps restarting it.
Possible fixes:
- (Better, more work) Add a flag to the implementation to run in the foreground
- (Hacky, less work) Replace your invocation with something like `bash -c "bazels3cache && sleep infinity"` so the pod keeps running even after the daemon has started.
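(Concretely, applied to the Dockerfile posted earlier in the thread, the hack would look something like the following sketch. The shell form means no explicit `bash -c` wrapper is needed, and `$CACHE_BUCKET` is expanded:)

```dockerfile
# Hack: keep PID 1 alive after the daemonizing launcher returns,
# so Kubernetes does not treat the container as exited.
CMD bazels3cache --bucket=$CACHE_BUCKET && sleep infinity
```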
yeah, the second sounds about right, let me try that out. Thanks @theospears!
Just for clarity: the second solution, as Theo noted, is just a hack. There is a danger that the daemonized process will fail, but because of the `sleep infinity` clause, your pod would never terminate, so Kubernetes would never restart it.
I think you should go with Theo's suggestion #1. The good news is that I believe there is a very simple way to implement this yourself: try removing https://github.com/Asana/bazels3cache/blob/1977e699b2c9c0e30e04b8b1eb8fdadff4e8853a/src/index.ts#L71-L94, leaving only the else clause, which should launch the cache in non-daemon mode.
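(For illustration only, here is a hypothetical TypeScript sketch of the branch structure being described; the names are invented and this is not the actual `src/index.ts`:)

```typescript
// Hypothetical sketch (illustrative names, not the real bazels3cache
// source): the linked block forks a detached copy of the process when
// run in daemon mode and lets the parent exit; keeping only the else
// branch runs the server in the foreground, so PID 1 stays alive.
function startCache(daemonize: boolean): string {
  if (daemonize) {
    // Original behavior (the block the comment suggests removing):
    // spawn a detached child, then exit the parent. In a container the
    // parent is PID 1, so the container "completes" and kubelet
    // restarts it: CrashLoopBackOff.
    return "parent exited after forking daemon";
  }
  // Suggested behavior: run the cache server in this process.
  return "server running in foreground";
}

// In a container you would always want the foreground path:
console.log(startCache(false)); // prints "server running in foreground"
```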
Sounds great! Thanks @spiliopoulos!
I tried removing that part of the code and publishing it as a separate package, but unfortunately it still does not work in the `docker run` case. (This time, there is not even any server-start output.)
Here is the commit ref
I actually tried both ideas; neither works for me.
Yeah, here is the `ps` printout for suggestion #2:
root@e301c00636bc:/# ps -ef
UID        PID  PPID  C STIME TTY      TIME     CMD
root         1     0  0 18:45 ?        00:00:00 sleep infinity
root        14     1  0 18:45 ?        00:00:00 /usr/local/bin/node /usr/local/lib/node_modules/bazelcache/dist/index.js --daemon -
root        25     0  0 18:47 pts/0    00:00:00 bash
root        32    25  0 18:47 pts/0    00:00:00 ps -ef