online out of memory problems
We had a problem this weekend with a horde of users on online.interlisp.org, I think because we were in a 'top 10' on Hacker News.
>> [1194586.169244] Out of memory: Killed process 165707 (ldex) total-vm:272472kB, anon-rss:263644kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:572kB oom_score_adj:0
We're running on a medium-size instance (see https://github.com/Interlisp/online/blob/master/docs/OIO_Operations_Guide.md for details).
To avoid future problems we should:
- [ ] Shed load: when we're full, take no more users. Right now, with 256+ MB per user and 2 GB of RAM, that's 7 users max at any one time. It could be much better if we:
- [x] Turn on demand paging (which, coming full circle, was invented at BBN for BBN-Lisp on TENEX)
- [x] Run Medley with -m 64, since users who need more space should run on their own machine
- [ ] Finally, if we get more popular, we can use a bigger instance with 8 GB instead of 2 GB of RAM for $0.0832 per hour instead of $0.0416 per hour, an additional ~$30/month.
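
The arithmetic behind the list above can be sketched in a few lines of C. This is only an illustration, not code from the online service: the function names, the 256 MB reserve for the OS, and the 730 hours per month figure are all assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical helper: how many sessions fit in RAM.
 * reserved_mb is an ASSUMED allowance for the OS and other services. */
unsigned max_sessions(unsigned total_ram_mb, unsigned reserved_mb,
                      unsigned per_session_mb)
{
    if (per_session_mb == 0 || reserved_mb >= total_ram_mb)
        return 0;
    return (total_ram_mb - reserved_mb) / per_session_mb;
}

/* Hypothetical load-shedding check: admit a new session only while
 * the number of active sessions is below the computed limit. */
bool should_admit(unsigned active_sessions, unsigned limit)
{
    return active_sessions < limit;
}

/* Extra monthly cost of moving between two hourly prices.
 * hours_per_month is an assumption (roughly 730 on average). */
double extra_monthly_cost(double old_hourly, double new_hourly,
                          double hours_per_month)
{
    return (new_hourly - old_hourly) * hours_per_month;
}
```

With the assumed 256 MB reserve, `max_sessions(2048, 256, 256)` gives 7 and `max_sessions(2048, 256, 64)` gives 28, which is the kind of gain -m 64 should buy if sessions really stay within that size. `extra_monthly_cost(0.0416, 0.0832, 730)` comes out at a little over $30.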
Interlisp and Medley versions 1 and 2 were limited to 64 MB. The system is very efficient in memory use, thanks to compact compiled code and CDR-coding, and it kept a small footprint even after the increase to 256 MB.
I don't know if this EC2 instance has an ephemeral disk associated with it, but if it does we could add swap space at the OS level. These instances have /proc/sys/vm/overcommit_memory set to 0, so it's "heuristic_overcommit", but I don't have a sense of what it actually ends up doing with our images. I just tried setting it to 1, but a new online session still reports VSZ and RSZ both at about 256 MB; see below for why.
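
For checking the setting from a program rather than by hand, here is a minimal C sketch that reads the procfs file named above. The function names are made up for this example, and the short labels are my own shorthand for the three modes the kernel documents (0 heuristic, 1 always overcommit, 2 don't overcommit).

```c
#include <stdio.h>

/* Returns 0, 1 or 2 as read from procfs, or -1 if the file cannot be
 * read or parsed (for example when not running on Linux). */
int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Shorthand label for each mode; "unknown" covers -1 and anything else. */
const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0: return "heuristic";
    case 1: return "always";
    case 2: return "never";
    default: return "unknown";
    }
}
```

Note that the overcommit mode only governs whether an allocation request is granted. Pages that the process actually writes to become resident regardless of the mode, so changing it would not be expected to shrink RSZ on its own.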
Various bits of the code assume that Lisp memory is initialized to 0, and I don't think memory allocated by posix_memalign() makes that promise, so we currently memset() it all to 0, which forces the allocation of real memory. If we use mmap(), we can either use MAP_ANON (it's not in the POSIX spec, but I think every platform we care about supports it) or mmap "/dev/zero" with MAP_PRIVATE, which should be equivalent. A quick test on the Mac with mmap MAP_ANON and no explicit zeroing shows a resident set size of 13 MB, which is consistent with 4% of the Lisp memory being in use, instead of all 256 MB.
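
A minimal sketch of the MAP_ANON variant follows. The function names are hypothetical and this is not the actual Maiko allocation code; it only shows the shape of the call. Anonymous private mappings are zero-filled by the kernel and pages are only backed by real memory when first touched, so no memset() is needed. mmap() returns page-aligned memory, so if the existing posix_memalign() call asks for a larger alignment than a page, that would need separate handling.

```c
/* Ask glibc to expose MAP_ANON even in strict ISO C modes; harmless elsewhere. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical replacement for posix_memalign() + memset():
 * returns zero-filled, lazily backed memory, or NULL on failure. */
void *alloc_lisp_memory(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

/* Release a region obtained from alloc_lisp_memory(); returns 0 on success. */
int free_lisp_memory(void *p, size_t bytes)
{
    return munmap(p, bytes);
}
```

Calling `alloc_lisp_memory((size_t)256 << 20)` reserves 256 MB of address space, but the resident set should grow only as pages are written.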
I have updated online.interlisp.org in two ways:
- Medley now runs with -m 64. This is currently "hardwired". Eventually, I will add an advanced option on the Run Medley page that lets the user set the -m value. The default will remain -m 64.
- I added an 8GB swap volume to the instance. This is an EBS volume since our instance has no ephemeral storage (nor does any reasonable instance type for our needs). This should help manage heavy loads, probably at the expense of some performance.
I will also add to my task list figuring out an easy way to do load shedding.
Out of memory problems have been fixed. Closing issue.