High CPU Usage During Insertions in milvus-sdk-java

Open LeePui opened this issue 1 year ago • 4 comments

Hi everyone,

Thank you for providing such a convenient Java SDK; it has been very useful.

While using version 2.4.3 of milvus-sdk-java, I ran into a performance issue. Here are the metrics and analysis I have gathered.

When performing insertions in a single thread, I noticed unusually high CPU usage. After profiling with async-profiler, I pinpointed the most time-consuming operation at this line: AbstractMilvusGrpcClient.java#L1569.

public R<MutationResult> insert(@NonNull InsertParam requestParam) {
    ......
    // requestParam.toString() is evaluated here, before logDebug checks the log level
    logDebug(requestParam.toString());
    ......
}

protected void logDebug(String msg, Object... params) {
    if (logLevel.ordinal() <= LogLevel.Debug.ordinal()) {
        logger.debug(msg, params);
    }
}

The attached flame graph confirms this. [image: flame graph]

The high CPU usage seems to be caused by premature calls to toString. In practice, when I set the log level to INFO, there is no need for the toString method to be called. I suggest checking the log level before calling toString.
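To make the suggestion concrete, here is a minimal, self-contained sketch of the difference between the eager call and a guarded one. It does not use the SDK: `LazyLogDemo`, `FakeInsertParam`, and the `toStringCalls` counter are hypothetical stand-ins for illustration, and only the `LogLevel` ordinal comparison mirrors the `logDebug` snippet quoted above.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyLogDemo {
    // Ordering mirrors the ordinal comparison in the quoted logDebug (assumed values).
    enum LogLevel { Debug, Info, Warning, Error }

    // Hypothetical stand-in for InsertParam: counts how often toString() runs.
    static class FakeInsertParam {
        final List<Float> vector = new ArrayList<>();
        int toStringCalls = 0;

        @Override
        public String toString() {
            toStringCalls++;
            return "FakeInsertParam(vector=" + vector + ")";
        }
    }

    static LogLevel logLevel = LogLevel.Info;

    static boolean isDebugEnabled() {
        return logLevel.ordinal() <= LogLevel.Debug.ordinal();
    }

    // Eager: the caller has already built the string before the level is checked.
    static void logDebugEager(String msg) {
        if (isDebugEnabled()) {
            System.out.println(msg);
        }
    }

    // Guarded: toString() runs only when debug output is actually enabled.
    static void logDebugGuarded(Object param) {
        if (isDebugEnabled()) {
            System.out.println(param.toString());
        }
    }

    public static void main(String[] args) {
        FakeInsertParam p = new FakeInsertParam();
        for (int i = 0; i < 1000; i++) {
            p.vector.add(i * 0.5f);
        }

        logDebugEager(p.toString());   // pays for toString() even at INFO
        System.out.println("toString calls after eager call: " + p.toStringCalls);

        logDebugGuarded(p);            // skips toString() at INFO
        System.out.println("toString calls after guarded call: " + p.toStringCalls);
    }
}
```

With the level at Info, the counter reaches 1 after the eager call and stays at 1 after the guarded call, which is the saving being proposed.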

Thank you for considering this improvement.

LeePui avatar Sep 09 '24 08:09 LeePui

Thanks for pointing out this problem. I didn't realize it was a problem before. InsertParam.toString() is generated by the Lombok annotation @ToString, which renders all the vectors as one long string like "[1.1234, 2.2234, ....]". It becomes a bottleneck when the inserted batch is large.

For requests that can carry large or complicated data, we should write the toString() method by hand. For InsertParam, we only want to print the target collection name and the number of vectors; there is no need to print all the vectors. I will make this change, and it will take effect in the next minor version.
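A hand-written toString() along those lines might look like the sketch below. This is not the SDK's InsertParam or the code from the eventual fix: `InsertRequestSketch`, its two fields, and the output format are assumptions made up for illustration.

```java
import java.util.List;

// Hypothetical request class; the real InsertParam has different fields.
public class InsertRequestSketch {
    private final String collectionName;
    private final List<List<Float>> vectors;

    public InsertRequestSketch(String collectionName, List<List<Float>> vectors) {
        this.collectionName = collectionName;
        this.vectors = vectors;
    }

    // Summarize the request instead of serializing every vector element,
    // so the cost no longer grows with the total size of the vector data.
    @Override
    public String toString() {
        return "InsertRequestSketch(collectionName=" + collectionName
                + ", vectorCount=" + vectors.size() + ")";
    }
}
```

The string now reports only the collection name and a count, so logging a large batch costs about the same as logging a small one.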

yhmo avatar Sep 10 '24 08:09 yhmo

Good catch!

xiaofan-luan avatar Sep 10 '24 18:09 xiaofan-luan

Fixed by this PR: https://github.com/milvus-io/milvus-sdk-java/pull/1064

yhmo avatar Sep 13 '24 02:09 yhmo

The better pattern is:

if (logger.isDebugEnabled()) {
    String msg = segmentIDs.getDataCount() + " segments of " + collectionName + " has been flushed";
    logDebug(msg);
}
if (logger.isDebugEnabled()) {
    logDebug(requestParam.toString());
}

Call isDebugEnabled before building the debug log message; that is what gives the best performance.
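A related way to get the same effect without repeating the guard at every call site is to pass the message as a `java.util.function.Supplier<String>`, so the string is built only if the level check passes. This is a different technique from the explicit isDebugEnabled guard and not something the SDK is known to do; `SupplierLogDemo`, `debugEnabled`, and `expensiveMessage` are hypothetical names used only for this sketch.

```java
import java.util.function.Supplier;

public class SupplierLogDemo {
    // Hypothetical switch standing in for logger.isDebugEnabled().
    static boolean debugEnabled = false;

    // Counts how many times the message string was actually built.
    static int messagesBuilt = 0;

    // The supplier is only invoked when debug logging is enabled.
    static void logDebug(Supplier<String> msg) {
        if (debugEnabled) {
            System.out.println(msg.get());
        }
    }

    // Stand-in for an expensive string concatenation or toString() call.
    static String expensiveMessage(int segments, String collection) {
        messagesBuilt++;
        return segments + " segments of " + collection + " have been flushed";
    }

    public static void main(String[] args) {
        logDebug(() -> expensiveMessage(3, "demo"));
        System.out.println("messages built while disabled: " + messagesBuilt);

        debugEnabled = true;
        logDebug(() -> expensiveMessage(3, "demo"));
        System.out.println("messages built while enabled: " + messagesBuilt);
    }
}
```

The trade-off is a small lambda allocation per call in exchange for keeping the level check in one place.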

yin-bp avatar Nov 01 '24 15:11 yin-bp