user-data MIME ordering makes registration via ECS_CLUSTER in ecs.config fragile

Open diablodale opened this issue 6 years ago • 1 comments

Summary

Instances are not registered with ECS because the mandatory user-data part that writes "ECS_CLUSTER=xxxx" is placed at the end of the MIME multipart. That part never writes to /etc/ecs/ecs.config if an earlier extra user-data part reboots the machine, fails, etc.

Setup

Ubuntu 18.04 ecs-cli version 1.16.0 (5c02c34)

Repro

Create a file ud.sh with content

#!/bin/bash
echo "Do things"
reboot now

Run the following commands, replacing XXXXXX with your keypair and XXXXXXX with your instance role.

ecs-cli configure --config-name myconfig --cluster mycluster --default-launch-type EC2 --region us-east-1
ecs-cli up --port 22 --cluster-config myconfig --keypair XXXXXX --instance-role XXXXXXX --size 1 --instance-type t3.nano --extra-user-data ./ud.sh

Result

No container instance is registered in ECS in the mycluster cluster.

Expected

Container instance registered in the mycluster cluster

Notes

ssh to the container instance (you can get the public IP from the EC2 admin tool). No /etc/ecs/ecs.config file exists on the instance, because nothing wrote to it.

In the CloudFormation stack that ecs-cli creates, the multipart can be seen as a parameter. Notice the ordering. The cluster name is in the last part, which is likely never reached because the reboot in the earlier part happens first and (it seems) cloud-init doesn't continue to the next part after the reboot.

Content-Type: multipart/mixed; boundary="f73a1fb433fa342d6259a574ab6836391382c4593c16359b6c02c14eee71"
MIME-Version: 1.0

--f73a1fb433fa342d6259a574ab6836391382c4593c16359b6c02c14eee71
Content-Type: text/text/plain; charset="utf-8"
Mime-Version: 1.0

#!/bin/bash
echo "Do things"
reboot now
--f73a1fb433fa342d6259a574ab6836391382c4593c16359b6c02c14eee71
Content-Type: text/text/x-shellscript; charset="utf-8"
Mime-Version: 1.0

#!/bin/bash
echo ECS_CLUSTER=mycluster >> /etc/ecs/ecs.config
--f73a1fb433fa342d6259a574ab6836391382c4593c16359b6c02c14eee71--

I recommend the ecs-cli code be reordered so that this mandatory user-data is the first part that is run. Here is the general location in the code: https://github.com/aws/amazon-ecs-cli/blob/78c4de9d3fb4ebbe5bea6d8a78cbdf5269f309ee/ecs-cli/modules/cli/cluster/userdata/user_data.go#L152
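
To illustrate the ordering I have in mind, here is a rough shell sketch that emits the multipart with the registration part first and the extra script second. It is not the ecs-cli implementation (that lives in the Go file linked above). The function name build_user_data and the boundary string are made up for this example, and the sketch uses cloud-init's text/x-shellscript type for both parts rather than the types shown in the stack parameter.

```shell
#!/bin/bash
# Sketch only: build a two-part cloud-init MIME multipart in which the
# mandatory ECS_CLUSTER part comes BEFORE the user's extra script.
# build_user_data and BOUNDARY are hypothetical names, not ecs-cli code.

BOUNDARY="example-boundary-0001"

build_user_data() {
  local cluster="$1" extra_script="$2"

  # Top-level multipart headers, followed by a blank line.
  printf 'Content-Type: multipart/mixed; boundary="%s"\n' "$BOUNDARY"
  printf 'MIME-Version: 1.0\n\n'

  # Part 1: mandatory registration, so it runs before anything else.
  printf -- '--%s\n' "$BOUNDARY"
  printf 'Content-Type: text/x-shellscript; charset="utf-8"\n'
  printf 'Mime-Version: 1.0\n\n'
  printf '#!/bin/bash\n'
  printf 'echo ECS_CLUSTER=%s >> /etc/ecs/ecs.config\n' "$cluster"

  # Part 2: the extra user-data, which is free to reboot or fail.
  printf -- '--%s\n' "$BOUNDARY"
  printf 'Content-Type: text/x-shellscript; charset="utf-8"\n'
  printf 'Mime-Version: 1.0\n\n'
  cat "$extra_script"

  # Closing boundary.
  printf -- '\n--%s--\n' "$BOUNDARY"
}
```

Calling build_user_data mycluster ./ud.sh prints a multipart in which the ECS_CLUSTER line appears before the reboot, so a reboot in ud.sh can no longer prevent /etc/ecs/ecs.config from being written.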

Workarounds

Give birth to the perfect userdata with no possibility of failures, reboots, etc. ;-)
If you want to reboot in user-data, defer the reboot until cloud-init has finished all parts, using a technique similar to: setsid bash -c "cloud-init status --wait; shutdown --reboot now" &
Find another way to postpone reboots until after this last mandatory multipart completes. You must be successful in managing this race condition.
Using extra user-data, duplicate the ecs-cli code that writes ECS_CLUSTER and ECS_CONTAINER_INSTANCE_TAGS. You will likely have to hardcode both.
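
A minimal sketch of an extra user-data script combining the workarounds above: it writes the ECS settings itself and defers the reboot until cloud-init has finished. The function names and the ECS_CONFIG_PATH override are my own inventions for illustration. The tags value is written verbatim and is assumed to be the JSON object the ECS agent expects. This assumes the AMI's cloud-init supports "cloud-init status --wait".

```shell
#!/bin/bash
# Sketch only: helper functions for an --extra-user-data script.
# write_ecs_config, reboot_after_cloud_init and ECS_CONFIG_PATH are
# hypothetical names; the cluster name and tags have to be hardcoded
# by the caller.

# Overridable so the function can be tried outside a real instance.
ECS_CONFIG_PATH="${ECS_CONFIG_PATH:-/etc/ecs/ecs.config}"

write_ecs_config() {
  local cluster="$1" tags="${2:-}"
  mkdir -p "$(dirname "$ECS_CONFIG_PATH")"
  # Same kind of line that the ecs-cli part appends at the end of the multipart.
  echo "ECS_CLUSTER=${cluster}" >> "$ECS_CONFIG_PATH"
  if [ -n "$tags" ]; then
    echo "ECS_CONTAINER_INSTANCE_TAGS=${tags}" >> "$ECS_CONFIG_PATH"
  fi
}

reboot_after_cloud_init() {
  # Detach from this script, wait for cloud-init to finish every part,
  # and only then reboot.
  setsid bash -c "cloud-init status --wait; shutdown --reboot now" &
}

# A real ud.sh would end with calls such as:
#   write_ecs_config mycluster
#   echo "Do things"
#   reboot_after_cloud_init
```

If the ecs-cli part is also reached, ECS_CLUSTER ends up in the file twice with the same value, which should be harmless.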

Sep 05 '19 01:09 diablodale

Thanks for reaching out @diablodale, let me poke around with this :)

Sep 05 '19 18:09 kohidave