Dockerize app for k8s
I was trying to build the app and deploy it to k8s, but I am running into a CrashLoopBackOff issue.
➤ Mike Morearty commented:
Someone outside of Asana created this issue in the (publicly visible, open source) bazels3cache repo. I don't really understand what they are asking for. Anyone have any clues?
Thanks @mmorearty!
Hi there @chenrui333. Can you briefly describe the steps you took to run this in your k8s app/cluster, and post the logs from the crash-looping container here (`kubectl logs POD_NAME -c CONTAINER_NAME --previous`)?
Also, just to manage expectations: this is an open source version of a tool we use extensively at Asana, but since our use case is not necessarily the same as yours, we might not have an immediate answer, and our bandwidth for support is extremely limited. That said, if you are willing to provide us with as much debugging information as you can so we can understand your issue, we will try to help.
Thanks for the quick reply.
Here is my Dockerfile:
FROM node:14
# where the bazel cache goes
ARG BUCKET_NAME="bazel-cache"
ENV CACHE_BUCKET=$BUCKET_NAME
RUN npm install -g bazels3cache
EXPOSE 7777
CMD ["bazels3cache", "--bucket=$CACHE_BUCKET"]
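(One thing worth flagging in this Dockerfile, though it may or may not be related to the crash loop: the exec form of `CMD` does not invoke a shell, so `$CACHE_BUCKET` is never expanded and the literal string `--bucket=$CACHE_BUCKET` is passed to bazels3cache. The shell form would expand it. A sketch, untested against bazels3cache itself:)

```dockerfile
# Shell form runs the command through /bin/sh -c, so $CACHE_BUCKET
# is expanded at container start; exec form would pass it literally.
CMD bazels3cache --bucket=$CACHE_BUCKET
```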
Here is what I got from the container:
$ kubectl -n bazel-cache logs pod/bazel-cache-74c8b7876c-wx9mj
bazels3cache: started server at http://localhost:7777/, logging to /root/.bazels3cache.log
> Also just to manage expectations. This is an open source version of a tool we use extensively at Asana […]
Totally understand. I think resolving this issue would help drive wider adoption of, and improvements to, this utility.
I think the logs you posted are from the current run, which does not seem to have crashed yet.
Can you try doing
$ kubectl -n bazel-cache get pods
We would expect to see a restart count higher than 0. If that's true, then try running
$ kubectl -n bazel-cache logs POD_NAME -c CONTAINER_NAME --previous
to get the logs from the previous run (which we now know crashed).
Finally also include the output of
$ kubectl -n bazel-cache describe pod POD_NAME
Here is all the requested output :)
$ kubectl -n bazel-cache get pods
NAME                           READY   STATUS             RESTARTS   AGE
bazel-cache-66ff494f8f-4ghsv   0/1     CrashLoopBackOff   32         140m
$ kubectl -n bazel-cache logs bazel-cache-66ff494f8f-4ghsv -c bazel-cache --previous
bazels3cache: started server at http://localhost:7777/, logging to /root/.bazels3cache.log
$ kubectl -n bazel-cache describe pod bazel-cache-66ff494f8f-4ghsv
Name:           bazel-cache-66ff494f8f-4ghsv
Namespace:      bazel-cache
Priority:       0
Node:           ip-x.x.x.x.ec2.internal/x.x.x.x
Start Time:     Wed, 16 Dec 2020 13:07:23 -0500
Labels:         name=bazel-cache
                pod-template-hash=66ff494f8f
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Running
IP:             x.x.x.x
IPs:            <none>
Controlled By:  ReplicaSet/bazel-cache-66ff494f8f
Containers:
  bazel-cache:
    Container ID:   docker://406fcdd14a3e966369b2cf53ed3d2ded6959e230c477bf6aa66b36b105ba034f
    Image:          xxx.dkr.ecr.us-east-1.amazonaws.com/bazel-cache:5d8948b68092b91663642053e7d29a15cf699b5a
    Image ID:       docker-pullable://xxx.dkr.ecr.us-east-1.amazonaws.com/bazel-cache@sha256:d375e1dbee921a663f856c1c58566a1e6af5c59df8a0f445a7d3409152e078b2
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Completed
      Exit Code:   0
      Started:     Wed, 16 Dec 2020 15:27:13 -0500
      Finished:    Wed, 16 Dec 2020 15:27:14 -0500
    Ready:          False
    Restart Count:  32
    Environment:
      AWS_ACCESS_KEY_ID:      <set to the key 'aws_access_key_id' in secret 'bazel-cache'>      Optional: false
      AWS_SECRET_ACCESS_KEY:  <set to the key 'aws_secret_access_key' in secret 'bazel-cache'>  Optional: false
      AWS_DEFAULT_REGION:     us-east-1
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qc787 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-qc787:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qc787
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                   From                                   Message
  ----     ------   ----                  ----                                   -------
  Warning  BackOff  92s (x640 over 141m)  kubelet, ip-10-181-2-174.ec2.internal  Back-off restarting failed container
@spiliopoulos any idea?
I would guess this is because bazels3cache runs as a daemon: the command you run completes quickly, leaving another process running in the background. Since that command is the container's main process, Kubernetes sees the container exit (cleanly, which matches the `Reason: Completed / Exit Code: 0` in your describe output) and keeps restarting it.
Possible fixes:
- (Better, more work) Add a flag to the implementation to run in the foreground
- (Hacky, less work) Replace your invocation with something like `bash -c "bazels3cache && sleep infinity"` so the pod keeps running even after the daemon has started.
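(Concretely, applied to the Dockerfile posted earlier in the thread, the hack would look something like the following sketch. The shell form means no explicit `bash -c` wrapper is needed, and `$CACHE_BUCKET` is expanded:)

```dockerfile
# Hack: keep PID 1 alive after the daemonizing launcher returns,
# so Kubernetes does not treat the container as exited.
CMD bazels3cache --bucket=$CACHE_BUCKET && sleep infinity
```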
yeah, the second sounds about right, let me try that out. Thanks @theospears!
Just for clarity: the second solution, as Theo noted, is just a hack. There is a danger that the daemonized process will fail, but because of the `sleep infinity` clause, your pod would never terminate, so Kubernetes would never restart it.
I think you should go with Theo's suggestion #1. The good news is that I believe there is a very simple way to implement this yourself: try removing https://github.com/Asana/bazels3cache/blob/1977e699b2c9c0e30e04b8b1eb8fdadff4e8853a/src/index.ts#L71-L94, leaving only the else clause, which should launch the cache in non-daemon mode.
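(For illustration only, here is a hypothetical TypeScript sketch of the branch structure being described; the names are invented and this is not the actual `src/index.ts`:)

```typescript
// Hypothetical sketch (illustrative names, not the real bazels3cache
// source): the linked block forks a detached copy of the process when
// run in daemon mode and lets the parent exit; keeping only the else
// branch runs the server in the foreground, so PID 1 stays alive.
function startCache(daemonize: boolean): string {
  if (daemonize) {
    // Original behavior (the block the comment suggests removing):
    // spawn a detached child, then exit the parent. In a container the
    // parent is PID 1, so the container "completes" and kubelet
    // restarts it: CrashLoopBackOff.
    return "parent exited after forking daemon";
  }
  // Suggested behavior: run the cache server in this process.
  return "server running in foreground";
}

// In a container you would always want the foreground path:
console.log(startCache(false)); // prints "server running in foreground"
```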
Sounds great! Thanks @spiliopoulos!
I tried removing that part of the code and publishing it as a separate package, but unfortunately it still does not work in the `docker run` case. (This time, there is not even any server-start output.)
Here is the commit ref
I actually tried both ideas; neither works for me.
Yeah, here is the `ps` printout for suggestion #2:
root@e301c00636bc:/# ps -ef
UID        PID  PPID  C STIME TTY      TIME     CMD
root         1     0  0 18:45 ?        00:00:00 sleep infinity
root        14     1  0 18:45 ?        00:00:00 /usr/local/bin/node /usr/local/lib/node_modules/bazelcache/dist/index.js --daemon -
root        25     0  0 18:47 pts/0    00:00:00 bash
root        32    25  0 18:47 pts/0    00:00:00 ps -ef