user-data MIME ordering makes registration via ECS_CLUSTER in ecs.config fragile
Summary
Container instances are not registered with ECS because the mandatory user-data part that writes ECS_CLUSTER=xxxx is placed last in the MIME multipart. If the extra user-data reboots the machine (or otherwise stops cloud-init before it reaches that last part), ECS_CLUSTER is never written to /etc/ecs/ecs.config.
Setup
Ubuntu 18.04, ecs-cli version 1.16.0 (5c02c34)
Repro
Create a file ud.sh with the following content:
#!/bin/bash
echo "Do things"
reboot now
Run the following commands, replacing XXXXXX with your key pair and instance role.
ecs-cli configure --config-name myconfig --cluster mycluster --default-launch-type EC2 --region us-east-1
ecs-cli up --port 22 --cluster-config myconfig --keypair XXXXXX --instance-role XXXXXXX --size 1 --instance-type t3.nano --extra-user-data ./ud.sh
Result
No container instance is registered in the mycluster cluster.
Expected
A container instance is registered in the mycluster cluster.
Notes
SSH to the container instance (you can get the public IP from the EC2 console). No /etc/ecs/ecs.config file exists on the instance, because nothing wrote to it.
In the CloudFormation stack created by ecs-cli, the multipart user-data appears as a parameter. Notice the ordering: the part that writes the cluster name is last. It is likely never reached, because the reboot happens first and (it seems) cloud-init does not continue to the remaining parts after the reboot.
Content-Type: multipart/mixed; boundary="f73a1fb433fa342d6259a574ab6836391382c4593c16359b6c02c14eee71"
MIME-Version: 1.0

--f73a1fb433fa342d6259a574ab6836391382c4593c16359b6c02c14eee71
Content-Type: text/text/plain; charset="utf-8"
Mime-Version: 1.0

#!/bin/bash
echo "Do things"
reboot now

--f73a1fb433fa342d6259a574ab6836391382c4593c16359b6c02c14eee71
Content-Type: text/text/x-shellscript; charset="utf-8"
Mime-Version: 1.0

#!/bin/bash
echo ECS_CLUSTER=mycluster >> /etc/ecs/ecs.config

--f73a1fb433fa342d6259a574ab6836391382c4593c16359b6c02c14eee71--
I recommend reordering the ecs-cli code so that this mandatory user-data is the first part that runs. The general location in the code is https://github.com/aws/amazon-ecs-cli/blob/78c4de9d3fb4ebbe5bea6d8a78cbdf5269f309ee/ecs-cli/modules/cli/cluster/userdata/user_data.go#L152
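A minimal bash sketch of the recommended ordering, for illustration only: it is not the ecs-cli Go code. The function name build_user_data and the fixed boundary string are hypothetical, and both parts use the standard cloud-init type text/x-shellscript rather than the types ecs-cli emitted above. Assuming cloud-init runs shell-script parts in the order they appear (as the report observes), emitting the ECS_CLUSTER part first means it has already run by the time the extra user-data reboots.

```shell
#!/bin/bash
# Sketch only: assemble multipart/mixed user-data with the mandatory ECS part FIRST.
# build_user_data and the boundary value are hypothetical, not part of ecs-cli.

build_user_data() {
  local cluster="$1" extra_file="$2"
  local boundary="==ECS_BOUNDARY=="   # hypothetical fixed boundary

  printf 'Content-Type: multipart/mixed; boundary="%s"\n' "$boundary"
  printf 'MIME-Version: 1.0\n\n'

  # Part 1: mandatory ECS agent config, emitted before any extra user-data.
  printf -- '--%s\n' "$boundary"
  printf 'Content-Type: text/x-shellscript; charset="utf-8"\n'
  printf 'MIME-Version: 1.0\n\n'
  printf '#!/bin/bash\necho ECS_CLUSTER=%s >> /etc/ecs/ecs.config\n\n' "$cluster"

  # Part 2: the extra user-data, which may reboot the instance.
  printf -- '--%s\n' "$boundary"
  printf 'Content-Type: text/x-shellscript; charset="utf-8"\n'
  printf 'MIME-Version: 1.0\n\n'
  cat "$extra_file"

  # Closing delimiter ends the multipart document.
  printf '\n--%s--\n' "$boundary"
}
```

For example, build_user_data mycluster ./ud.sh prints a document in which the ECS_CLUSTER line comes before reboot now.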
Workarounds
- Give birth to the perfect userdata with no possibility of failures, reboots, etc. ;-)
- If you want to reboot in user-data, use a technique similar to
  setsid bash -c "cloud-init status --wait; shutdown --reboot now" &
- Find another way to postpone reboots until after this last mandatory part completes. You must handle this race condition correctly.
- Using extra user-data, duplicate the ecs-cli code that writes ECS_CLUSTER and ECS_CONTAINER_INSTANCE_TAGS. You will likely have to hardcode both.
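The setsid reboot technique and the duplicate-ECS_CLUSTER workaround above can be combined in the extra user-data itself. This is a sketch under assumptions: the ECS agent reads /etc/ecs/ecs.config, and the instance's cloud-init supports cloud-init status --wait. The helper names ensure_ecs_setting and reboot_after_cloud_init are hypothetical, and the cluster name still has to be hardcoded to match ecs-cli up.

```shell
#!/bin/bash
# Sketch only: helpers for extra user-data. Names are hypothetical.
# Assumes the ECS agent reads /etc/ecs/ecs.config and that cloud-init
# supports `cloud-init status --wait`.

# Append KEY=VALUE to FILE unless a KEY= line already exists, so calling
# this twice does not add a duplicate line.
ensure_ecs_setting() {
  local file="$1" key="$2" value="$3"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  if ! grep -q "^${key}=" "$file"; then
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Reboot only after cloud-init has finished all parts. The reboot runs in a
# detached session in the background so this script can exit and cloud-init
# can continue with the remaining parts.
reboot_after_cloud_init() {
  setsid bash -c "cloud-init status --wait; shutdown --reboot now" >/dev/null 2>&1 &
}

# Usage inside the extra user-data (values hardcoded to match ecs-cli up):
#   ensure_ecs_setting /etc/ecs/ecs.config ECS_CLUSTER mycluster
#   echo "Do things"
#   reboot_after_cloud_init
```

Because the reboot waits for cloud-init, the ecs-cli part may also run and append its own ECS_CLUSTER line; a repeated line with the same value is assumed to be harmless.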
Thanks for reaching out @diablodale, let me poke around with this :)