On-demand group took 10 minutes to scale up after total priceout
On 10/2 at 18:43:00, one of our EC2-based services saw a priceout in the last AZ available to us in the region for that instance type. This took out all 9 of our spot instances, leaving the service with 0 running instances. That state lasted about 7 minutes before the spot market briefly returned to normal. The only scale-up we saw for the on-demand group came 10 minutes after the total priceout.
Looking in the logs of the spotswap lambda function, these are the entries closest to the priceout:
START
2017-10-02T18:23:08.038Z Finding instance ids in spot group: SpotGroup
2017-10-02T18:23:08.342Z Checking for termination tags on 9 instances
2017-10-02T18:23:09.624Z Found 9 instances with SpotTermination tag
2017-10-02T18:23:12.362Z No-op on stack during a CloudFormation update
END
REPORT Duration: 4325.97 ms Billed Duration: 4400 ms Memory Size: 128 MB Max Memory Used: 53 MB
START
2017-10-02T18:25:08.775Z Finding instance ids in spot group: SpotGroup
2017-10-02T18:25:09.199Z No instances listed
2017-10-02T18:25:09.255Z Found 0 instances with SpotTermination tag
2017-10-02T18:25:09.276Z Checking spot group SpotGroup for scaledown
2017-10-02T18:25:09.276Z Checking spot group SpotGroup for scaledown
18:25:09 END
Strangely, the "No-op on stack during a CloudFormation update" message appeared even though no CloudFormation update was in progress. Then, in the second invocation, the function logged "No instances listed" and seems to have no-oped, when it should have taken the lack of instances as an indication that the on-demand group should scale up.
Two questions:
- Why did spotswap misjudge the CloudFormation update?
- Should the lack of spot instances have scaled up the on-demand group?
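To make the second question concrete, here is a rough sketch of the behavior I would have expected. It is hypothetical and not spotswap's actual code: the function name `on_demand_target` and all of its parameters are made up for illustration, and the desired spot capacity of 9 and on-demand maximum of 9 are assumptions based on the 9 instances we lost.

```python
def on_demand_target(spot_running, spot_tagged, spot_desired,
                     on_demand_current, on_demand_max):
    """Return the on-demand desired capacity that would cover the spot shortfall.

    Hypothetical helper for illustration only; not part of spotswap.
    """
    # Instances tagged for termination are about to disappear, so do not count them.
    healthy_spot = max(0, spot_running - spot_tagged)
    # An empty spot group gives healthy_spot == 0, so the shortfall is the full
    # desired capacity instead of a no-op.
    shortfall = max(0, spot_desired - healthy_spot)
    # Scale up only, and never exceed the on-demand group's maximum size.
    return min(on_demand_max, max(on_demand_current, shortfall))


# First invocation in the logs: 9 instances, all tagged for termination.
# Assumed values: spot_desired=9, on_demand_current=0, on_demand_max=9.
print(on_demand_target(9, 9, 9, 0, 9))
# Second invocation: no instances listed at all.
print(on_demand_target(0, 0, 9, 0, 9))
```

Under those assumptions, both invocations would ask for the on-demand group to cover the full shortfall, rather than treating an empty instance list as nothing to do.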
cc/ @arunasank @emilymcafee
Any update on this? I'd like to implement SpotSwap in our environment, but I'm wary of this issue.
@henryngo001 I haven't dug deeper into this issue. I think this was a rare condition, though; spotswap works very well for us on a regular basis.
@jakepruitt Thanks for the quick update.