
[catnip] Catnip runs out of memory at higher loads

Open nimishwadekar opened this issue 2 years ago • 1 comments

Description

Catnip panics with "failed to clone mbuf" every time it runs at higher loads with multiple connections. The cause is that the DPDK mempool runs out of free objects to clone mbufs from, which I verified using rte_mempool_avail_count(). I tested with a custom example static-page HTTP server and the Caladan client generator at 150K requests per second over 100 connections (these numbers are specific to my setup, but there is always a threshold above which the mempool is exhausted). The issue did not occur on commit 803c363a765ae14f3eff2a9de021eabf23f8d10c, dated 30 September 2022.

The pattern stays similar every time:

The server works fine for a short while, with the free memory in the mempool staying roughly constant. The free memory then starts decreasing. It always drops in big jumps (presumably large allocations), then rises slowly for a while (presumably due to free()s) before the next big drop, with a net decrease over time. This continues until the mempool is exhausted and the server panics.
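To make the pattern above easier to confirm from logs, here is a minimal sketch that summarizes a series of free-object counts. It assumes the counts were sampled periodically (for example from rte_mempool_avail_count()) and collected into a slice. The names `PoolTrend` and `summarize_pool_samples` are hypothetical and not part of Demikernel or DPDK.

```rust
/// Summary of free-object counts sampled from a memory pool.
/// (Hypothetical helper type, not part of Demikernel or DPDK.)
#[derive(Debug, PartialEq)]
pub struct PoolTrend {
    /// Last sample minus first sample; negative means the pool is draining.
    pub net_change: i64,
    /// Largest decrease between two consecutive samples.
    pub largest_drop: i64,
    /// Largest increase between two consecutive samples.
    pub largest_rise: i64,
}

/// Summarizes consecutive samples of the pool's free-object count.
/// Returns `None` if fewer than two samples are given.
pub fn summarize_pool_samples(samples: &[u32]) -> Option<PoolTrend> {
    if samples.len() < 2 {
        return None;
    }

    let mut largest_drop: i64 = 0;
    let mut largest_rise: i64 = 0;

    // Compare each sample with the one that follows it.
    for pair in samples.windows(2) {
        let delta = i64::from(pair[1]) - i64::from(pair[0]);
        if delta < 0 {
            largest_drop = largest_drop.max(-delta);
        } else {
            largest_rise = largest_rise.max(delta);
        }
    }

    let net_change = i64::from(samples[samples.len() - 1]) - i64::from(samples[0]);

    Some(PoolTrend {
        net_change,
        largest_drop,
        largest_rise,
    })
}
```

A negative `net_change` together with a `largest_drop` much bigger than `largest_rise` would match the "big jumps down, slow recovery" behaviour described here. It would not by itself distinguish a leak from a backlog that never drains.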

Note: This issue does not happen with fewer connections. A possible reason is the O(N) time complexity of wait_any(). As the number of connections grows, so does the number of QTokens passed to wait_any(), which lengthens the interval between calls to poll(). Arrivals then back up at the NIC and are polled all at once, which requires a large allocation of DPDK memory. This still does not explain why the issue did not occur on the aforementioned commit, so something else was probably changed that causes this memory leak.
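As a rough back-of-the-envelope model of the hypothesis above, the sketch below estimates how many packets accumulate at the NIC between two calls to poll(). It assumes the poll interval is simply the number of QTokens times a fixed per-QToken cost, and that one request corresponds to roughly one packet. Both are simplifications, and the function name and the per-QToken cost are hypothetical values chosen for illustration, not measurements.

```rust
/// Estimates the number of packets that arrive between two consecutive polls,
/// assuming wait_any() spends a fixed time per QToken (O(N) in total).
/// (Hypothetical model, not taken from Demikernel.)
pub fn packets_per_poll(
    arrival_rate_pps: f64,     // packets arriving per second
    num_qtokens: u32,          // QTokens passed to wait_any()
    cost_per_qtoken_secs: f64, // assumed time spent per QToken, in seconds
) -> f64 {
    // Under the O(N) assumption, the poll interval grows linearly with N.
    let poll_interval_secs = f64::from(num_qtokens) * cost_per_qtoken_secs;
    // Everything that arrives during that interval is drained in one burst.
    arrival_rate_pps * poll_interval_secs
}
```

For example, with 150K packets per second, 100 QTokens, and an assumed cost of 1 microsecond per QToken, the model gives a burst of about 15 packets per poll, versus about 1.5 with 10 QTokens. The burst size, and so the mbuf demand per poll, scales linearly with the number of connections under these assumptions.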

nimishwadekar avatar Aug 06 '23 08:08 nimishwadekar

Fixing issue #1330 should alleviate the memory problem. We still need to verify whether there are any leaks when adding entries to and removing them from the queue.

anandbonde avatar Oct 02 '24 18:10 anandbonde