Memory usage is too high and not released in Java (embedded) mode.
To reproduce
-
Start the application I built with Spring Boot 3, with JVM parameters set to -Xmx2G -Xms2G.
-
I will create globally
CairoConfiguration,CairoEngine, andCairoEngineduring the project startup. -
There is a table with 20 million records, and I will perform several group aggregation statistics on this table based on different conditions.
-
It will be observed that the JVM garbage collection is normal, but the RES usage shown by the top command is very high and does not get released even after garbage collection.
-
I tried using NMT to check the memory situation, but did not find any issues. native_memory.txt
-
I suspect that there might be some places in QuestDB where memory is not being released. Please help me.
-
Because it is deployed alongside other programs, the memory usage cannot be so high, even if it means sacrificing some performance.
QuestDB version:
8.1.0
OS, in case of Docker specify Docker and the Host OS:
CentOS 8.2.2004
File System, in case of Docker specify Host File System:
ext4
Full Name:
xtadg
Affiliation:
Sugon
Have you followed Linux, MacOs kernel configuration steps to increase Maximum open files and Maximum virtual memory areas limit?
- [X] Yes, I have
Additional context
No response
I use pmap -x pid | sort -n -k3 , and found many files are occupying out-of-heap memory, causing high RES usage. Why don’t these files release after use? Is there a configuration that allows QuestDB to release it in a timely manner?"
What is your schema, and what queries are you trying to run?
What is your schema, and what queries are you trying to run?
@nwoolmer Thank you for your reply,here is my schema
CREATE TABLE t_traffic_log (
id long,
src_zone symbol,
dst_zone symbol,
src_region symbol,
dst_region symbol,
src_ip long256,
dst_ip long256,
src_port int,
dst_port int,
proto symbol,
app_name symbol,
user_name symbol,
sec_policy symbol,
bandwidth_policy symbol,
uplink_bytes long,
uplink_pkts long,
downlink_bytes long,
downlink_pkts long,
total_bytes long,
total_pkts long,
interface_in symbol,
interface_out symbol,
session_closed_reason symbol,
vsys_name symbol index,
timestamp timestamp
) TIMESTAMP(timestamp) PARTITION BY DAY BYPASS WAL
and here is my queries
# The variable x switches back and forth among the fields: src_zone, dst_zone, src_region, dst_region, src_ip, dst_ip, src_port, dst_port, proto, app_name, user_name, sec_policy, and bandwidth_policy, executing several rounds.
SELECT x,
uplinkBytes / (select SUM(uplink_bytes) from t_traffic_log),
downlinkBytes / (select SUM(downlink_bytes) from t_traffic_log),
totalBytes / (select SUM(total_bytes) from t_traffic_log),
sessionNum / (select COUNT() from t_traffic_log)
FROM (SELECT x,
SUM(uplink_bytes) AS uplinkBytes,
SUM(downlink_bytes) AS downlinkBytes,
SUM(total_bytes) AS totalBytes,
COUNT() AS sessionNum
FROM t_traffic_log
GROUP BY src_addr)
@nwoolmer Can you help me? As long as the memory is limited, I can even accept a decrease in query performance.
@xtadg you need to call CairoEngine#releaseInactive() periodically to release inactive table readers and writers. Otherwise, they might keep mmapped and anonymous memory around. It's a good idea to take a look at how we use APIs in ServerMain which is the class responsible for running the database in standalone mode.
Also, if you need to check native memory usage, you can call dump_memory_usage(); query which will print grouped memory usage stats to the log. Alternatively, you can access this data programatically:
for (int i = MemoryTag.MMAP_DEFAULT; i < MemoryTag.SIZE; i++) {
System.out.println("tag: " + MemoryTag.nameOf(i) + ", used bytes:" + Unsafe.getMemUsedByTag(i));
}
@puzpuzpuz Thanks. I've tried adding CairoEngine#releaseInactive() at the end of my query function, but it still doesn't work. I used the code you provided to print out the RSS memory usage,
but the result is quite different from what I saw with the top command.
The following is the core code for the query I executed.
I really don't know where the problem lies.
I've tried adding
CairoEngine#releaseInactive()at the end of my query function, but it still doesn't work.
Table readers are only released after a timeout defined by CairoConfiguration#getInactiveReaderTTL(), so calling releaseInactive() immediately may not close readers (at least with PropServerConfiguration). You can try calling releaseAllReaders() instead of releaseInactive() to see if this changes anything (keep in mind that releaseAllReaders() is a test-only API, so it shouldn't be called in prod code).
As for top's RSS, as as as I recall it includes certain part of mmapped memory in that value (in-use, active pages), so you should also be checking MMAP_* memory stats.
@puzpuzpuz Thank you very much. It's just as I guessed. I think that the readers would not be released immediately, so I added a scheduled task to call releaseInactive. After observation, the RSS occupation can be released successfully. Finally, this problem has been solved. Thanks.