Merge all services under one server
What
The first attempt is here: https://github.com/pythonitalia/pycon/commit/b289a84571d4867e8c68af2af9d9088e36de2c4f, but we couldn't get the NAT instance to work again (it worked once), and we had to revert it before the morning to avoid breaking ticketing today.
This should save ~$4 of compute, and more on storage costs. We should be able to save more once we migrate Pretix to ARM.
Services
- [ ] PyCon workers (celery and beat)
- [ ] Pretix
- [ ] NAT instance
- [ ] Redis
Some notes:
It is quite difficult to get a NAT instance and Docker working on the same box: Docker applies its own iptables rules, which break the NAT instance.
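For reference, a minimal sketch of what the NAT instance needs restored after Docker rewrites the firewall. The interface name (`eth0`) and VPC CIDR (`10.0.0.0/16`) are assumptions; adjust both for our actual setup:

```shell
#!/bin/sh
# Re-enable IP forwarding (Docker doesn't disable it, but a NAT box needs it on).
sysctl -w net.ipv4.ip_forward=1
# Docker sets the FORWARD chain policy to DROP; re-allow forwarded
# traffic coming from the private subnets and replies back to them.
iptables -I FORWARD -s 10.0.0.0/16 -j ACCEPT
iptables -I FORWARD -d 10.0.0.0/16 -m state --state RELATED,ESTABLISHED -j ACCEPT
# Masquerade outbound VPC traffic through the instance's public interface.
iptables -t nat -A POSTROUTING -s 10.0.0.0/16 -o eth0 -j MASQUERADE
```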
If we can't merge the NAT instance with the rest of the services, it might be harder to justify the cost of switching to t3a.medium.
We could still save some money by using a custom ECS image with a smaller root volume (we don't need 30 GB of storage on all those servers): https://github.com/aws/amazon-ecs-ami
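Note that we can't just launch the stock ECS AMI with a smaller root: an EBS root volume can't be smaller than the AMI's snapshot (30 GB), so we'd have to build our own image. A rough sketch with that repo (the `volume_size` variable name is hypothetical; check the repo's Packer templates for the real one):

```shell
#!/bin/sh
# Build a custom ECS-optimized AMI with an 8 GB root volume.
git clone https://github.com/aws/amazon-ecs-ami
cd amazon-ecs-ami
# 'volume_size' is a placeholder variable name -- see the repo's
# README/templates for the exact build invocation.
packer build -var 'volume_size=8' ...
```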
Pretix is quite memory-heavy. When deployed on a t3.small together with Redis, the backend beat, and the backend worker, it fails to start the web process, which gets killed with signal 9 (OOM). I need to investigate further: memory usage is high, and even adding some swap space doesn't help.
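For reference, this is roughly how to add a swap file on the instance (a 2 GB size is an arbitrary choice; as noted above, it didn't stop the OOM kill in our case, but keeping the steps here):

```shell
#!/bin/sh
# Create and enable a 2 GB swap file, and persist it across reboots.
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```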
Current setup:
- nat instance: t4g.nano, 8 GB EBS
- production-pretix-instance: t3.small, 30 GB + 20 GB EBS
- pythonit-production-worker: t4g.micro, 30 GB EBS
- pythonit-production-redis: t4g.nano, 30 GB + 10 GB EBS
All volumes are gp3.
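As a sanity check on the storage line item, a quick sketch of what those volumes cost and what shrinking the 30 GB roots to 8 GB would save (assuming gp3 at $0.08/GB-month; actual regional pricing may differ):

```shell
#!/bin/sh
# All volumes today: nat 8 + pretix 30+20 + worker 30 + redis 30+10.
CURRENT_GB=$((8 + 30 + 20 + 30 + 30 + 10))
# Same fleet with the three 30 GB root volumes shrunk to 8 GB.
SHRUNK_GB=$((8 + 8 + 20 + 8 + 8 + 10))
awk -v c="$CURRENT_GB" -v s="$SHRUNK_GB" 'BEGIN {
  p = 0.08  # assumed gp3 $/GB-month -- verify for our region
  printf "current: $%.2f/mo  shrunk: $%.2f/mo  saved: $%.2f/mo\n", c*p, s*p, (c-s)*p
}'
```

At $0.08/GB-month the current 128 GB comes out around $10/month, which is roughly in line with the "EC2 other" figure from last month's bill.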
Things to look into to continue this:
- [ ] Understand how much we would save by lowering the storage from 30 GB to 8 GB. Last month's cost was ~$10 of "EC2 other"
- [ ] Can we merge the Redis server with the worker server at least?
- [ ] How much would it cost us to have t3a.medium + t4g.nano?
- [ ] How hard is it to move Pretix to ARM? How much could we save with a single ARM server? Would performance be better?
- [ ] We deploy Pretix in a single container that internally starts the multiple processes (web server, celery, etc). Would we get more control over performance if we split them in multiple containers? Could we move the web server to lambda?
- [ ] Can we move the worker to t4g.nano?
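A back-of-the-envelope sketch of the t3a.medium + t4g.nano question. The hourly prices below are assumptions (roughly us-east-1 on-demand rates); our region's prices differ, so this only shows the shape of the comparison:

```shell
#!/bin/sh
HOURS=730  # ~hours per month
# Assumed on-demand $/hr (roughly us-east-1 -- verify for our region):
# t4g.nano 0.0042, t3.small 0.0208, t4g.micro 0.0084, t3a.medium 0.0376
current=$(awk -v h="$HOURS" 'BEGIN { printf "%.2f", (0.0042+0.0208+0.0084+0.0042)*h }')
merged=$(awk -v h="$HOURS" 'BEGIN { printf "%.2f", (0.0376+0.0042)*h }')
echo "current fleet: \$$current/mo  t3a.medium+t4g.nano: \$$merged/mo"
```

With these placeholder prices the merged setup isn't cheaper unless the NAT instance also moves onto the shared box, which matches the note above about justifying the t3a.medium.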
We also need to keep an eye on credits:
| Charge | Rate | Usage | Cost |
| --- | --- | --- | --- |
| T3A CPU Credits | $0.05 per vCPU-Hour | 0.473 vCPU-Hours | $0.02 |
| T4G CPU Credits | $0.04 per vCPU-Hour | 0.037 vCPU-Hours | $0.00 |
Done in:
- https://github.com/pythonitalia/pycon/pull/4164
- https://github.com/pythonitalia/pycon/pull/4198
- https://github.com/pythonitalia/pycon/pull/4190