VectorDBBench icon indicating copy to clipboard operation
VectorDBBench copied to clipboard

Timeout issue with Milvus

Open mattbloke99 opened this issue 1 year ago • 7 comments

Getting consistent timeout issues when testing Milvus with standard Cohere 1M vectors, 768 dimensions test On c5a.8xlarge EC2 instance 32vCPU and 64Gb RAM and 128Gb SSD

2024-06-05 10:59:09,978 | WARNING: VectorDB optimize timeout in 900 (task_runner.py:204) (197576) 2024-06-05 10:59:10,041 | WARNING: Failed to run performance case, reason = Performance case optimize timeout (task_runner.py:146) (197576) Traceback (most recent call last): File "/home/ubuntu/repos/vector-dbs/env/lib/python3.12/site-packages/vectordb_bench/backend/task_runner.py", line 202, in _optimize return future.result(timeout=self.ca.optimize_timeout)[1] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/concurrent/futures/_base.py", line 458, in result raise TimeoutError() TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/home/ubuntu/repos/vector-dbs/env/lib/python3.12/site-packages/vectordb_bench/backend/task_runner.py", line 134, in _run_perf_case build_dur = self._optimize() ^^^^^^^^^^^^^^^^ File "/home/ubuntu/repos/vector-dbs/env/lib/python3.12/site-packages/vectordb_bench/backend/task_runner.py", line 207, in _optimize raise PerformanceTimeoutError("Performance case optimize timeout") from e vectordb_bench.models.PerformanceTimeoutError: Performance case optimize timeout

mattbloke99 avatar Jun 05 '24 11:06 mattbloke99

@mattbloke99 PR: #332 increased the optimized time limit, you can update the vdbbench to v0.0.10.

alwayslove2013 avatar Jun 07 '24 09:06 alwayslove2013

I am also getting this timeout issue with v0.0.11 trying to run Search Performance Test (1M Dataset, 768 Dims), and 10M and 100M as well. Trying to run a test using DiskANN with search_list = 100.

Bare Metal Hardware: Server: HPE ProLiant DL360 Gen11 BIOS: 2.20 CPU: Dual Socket Intel(R) Xeon(R) Gold 6426Y DRAM: 128 GB Milvus Version: 2.4.4 using Docker Image Latest VectorDB Bench Version: 0.0.11 via pip install Python Version: 3.11

2024-07-02 14:31:06,433 | INFO: generated uuid for the tasks: 82c1b66f8ec24aad90dffc606128e746 (interface.py:66) (349602)
2024-07-02 14:31:06,565 | INFO | DB             | CaseType     Dataset               Filter | task_label (task_runner.py:338)
2024-07-02 14:31:06,565 | INFO | -----------    | ------------ -------------------- ------- | -------    (task_runner.py:338)
2024-07-02 14:31:06,565 | INFO | Milvus         | Performance  Cohere-MEDIUM-1M        None | 2024070214 (task_runner.py:338)
2024-07-02 14:31:06,565 | INFO: task submitted: id=82c1b66f8ec24aad90dffc606128e746, 2024070214, case number: 1 (interface.py:231) (349602)
2024-07-02 14:31:07,086 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Milvus'}, drop_old=True (interface.py:164) (349724)
2024-07-02 14:31:07,086 | INFO: Starting run (task_runner.py:100) (349724)
2024-07-02 14:31:07,336 | INFO: Milvus client drop_old collection: VectorDBBenchCollection (milvus.py:45) (349724)
2024-07-02 14:31:07,341 | INFO: Milvus create collection: VectorDBBenchCollection (milvus.py:55) (349724)
2024-07-02 14:31:08,347 | INFO: Read the entire file into memory: test.parquet (dataset.py:207) (349724)
2024-07-02 14:31:08,379 | INFO: Read the entire file into memory: neighbors.parquet (dataset.py:207) (349724)
2024-07-02 14:31:08,412 | INFO: Start performance case (task_runner.py:158) (349724)
2024-07-02 14:31:09,199 | INFO: (SpawnProcess-1:1) Start inserting embeddings in batch 5000 (serial_runner.py:35) (349889)
2024-07-02 14:31:09,199 | INFO: Get iterator for shuffle_train.parquet (dataset.py:225) (349889)
2024-07-02 14:31:22,801 | INFO: (SpawnProcess-1:1) Loaded 100000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:31:45,193 | INFO: (SpawnProcess-1:1) Loaded 200000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:32:09,818 | INFO: (SpawnProcess-1:1) Loaded 300000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:32:33,988 | INFO: (SpawnProcess-1:1) Loaded 400000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:32:58,992 | INFO: (SpawnProcess-1:1) Loaded 500000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:33:21,667 | INFO: (SpawnProcess-1:1) Loaded 600000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:33:45,791 | INFO: (SpawnProcess-1:1) Loaded 700000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:07,282 | INFO: (SpawnProcess-1:1) Loaded 800000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:31,609 | INFO: (SpawnProcess-1:1) Loaded 900000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:55,207 | INFO: (SpawnProcess-1:1) Loaded 1000000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:55,211 | INFO: (SpawnProcess-1:1) Finish loading all dataset into VectorDB, dur=226.0110656730103 (serial_runner.py:61) (349889)
2024-07-02 14:34:58,411 | INFO: Milvus post insert before optimize (milvus.py:101) (352315)
2024-07-02 15:04:56,111 | WARNING: VectorDB optimize timeout in 1800 (task_runner.py:249) (349724)
2024-07-02 15:04:56,185 | WARNING: Failed to run performance case, reason = Performance case optimize timeout (task_runner.py:191) (349724)
Traceback (most recent call last):
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 247, in _optimize
    return future.result(timeout=self.ca.optimize_timeout)[1]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 458, in result
    raise TimeoutError()
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 165, in _run_perf_case
    build_dur = self._optimize()
                ^^^^^^^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 252, in _optimize
    raise PerformanceTimeoutError("Performance case optimize timeout") from e
vectordb_bench.models.PerformanceTimeoutError: Performance case optimize timeout
2024-07-02 15:04:56,190 | WARNING: [1/1] case {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Milvus'} failed to run, reason=Performance case optimize timeout (interface.py:179) (349724)
2024-07-02 15:04:56,192 | INFO |Task summary: run_id=82c1b, task_label=2024070214 (models.py:329)
2024-07-02 15:04:56,192 | INFO |DB     | db_label case              label      | load_dur    qps        latency(p99)    recall        max_load_count | label (models.py:329)
2024-07-02 15:04:56,192 | INFO |------ | -------- ----------------- ---------- | ----------- ---------- --------------- ------------- -------------- | ----- (models.py:329)
2024-07-02 15:04:56,192 | INFO |Milvus |          Performance768D1M 2024070214 | 0.0         0.0        0.0             0.0           0              | ?     (models.py:329)
2024-07-02 15:04:56,193 | INFO: write results to disk /home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/results/Milvus/result_20240702_2024070214_milvus.json (models.py:187) (349724)
2024-07-02 15:04:56,197 | INFO: Success to finish task: label=2024070214, run_id=82c1b66f8ec24aad90dffc606128e746 (interface.py:203) (349724)

anrahman4 avatar Jul 02 '24 22:07 anrahman4

I am also getting this timeout issue with v0.0.11 trying to run Search Performance Test (1M Dataset, 768 Dims), and 10M and 100M as well. Trying to run a test using DiskANN with search_list = 100.

Bare Metal Hardware: Server: HPE ProLiant DL360 Gen11 BIOS: 2.20 CPU: Dual Socket Intel(R) Xeon(R) Gold 6426Y DRAM: 128 GB Milvus Version: 2.4.4 using Docker Image Latest VectorDB Bench Version: 0.0.11 via pip install Python Version: 3.11

2024-07-02 14:31:06,433 | INFO: generated uuid for the tasks: 82c1b66f8ec24aad90dffc606128e746 (interface.py:66) (349602)
2024-07-02 14:31:06,565 | INFO | DB             | CaseType     Dataset               Filter | task_label (task_runner.py:338)
2024-07-02 14:31:06,565 | INFO | -----------    | ------------ -------------------- ------- | -------    (task_runner.py:338)
2024-07-02 14:31:06,565 | INFO | Milvus         | Performance  Cohere-MEDIUM-1M        None | 2024070214 (task_runner.py:338)
2024-07-02 14:31:06,565 | INFO: task submitted: id=82c1b66f8ec24aad90dffc606128e746, 2024070214, case number: 1 (interface.py:231) (349602)
2024-07-02 14:31:07,086 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Milvus'}, drop_old=True (interface.py:164) (349724)
2024-07-02 14:31:07,086 | INFO: Starting run (task_runner.py:100) (349724)
2024-07-02 14:31:07,336 | INFO: Milvus client drop_old collection: VectorDBBenchCollection (milvus.py:45) (349724)
2024-07-02 14:31:07,341 | INFO: Milvus create collection: VectorDBBenchCollection (milvus.py:55) (349724)
2024-07-02 14:31:08,347 | INFO: Read the entire file into memory: test.parquet (dataset.py:207) (349724)
2024-07-02 14:31:08,379 | INFO: Read the entire file into memory: neighbors.parquet (dataset.py:207) (349724)
2024-07-02 14:31:08,412 | INFO: Start performance case (task_runner.py:158) (349724)
2024-07-02 14:31:09,199 | INFO: (SpawnProcess-1:1) Start inserting embeddings in batch 5000 (serial_runner.py:35) (349889)
2024-07-02 14:31:09,199 | INFO: Get iterator for shuffle_train.parquet (dataset.py:225) (349889)
2024-07-02 14:31:22,801 | INFO: (SpawnProcess-1:1) Loaded 100000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:31:45,193 | INFO: (SpawnProcess-1:1) Loaded 200000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:32:09,818 | INFO: (SpawnProcess-1:1) Loaded 300000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:32:33,988 | INFO: (SpawnProcess-1:1) Loaded 400000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:32:58,992 | INFO: (SpawnProcess-1:1) Loaded 500000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:33:21,667 | INFO: (SpawnProcess-1:1) Loaded 600000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:33:45,791 | INFO: (SpawnProcess-1:1) Loaded 700000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:07,282 | INFO: (SpawnProcess-1:1) Loaded 800000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:31,609 | INFO: (SpawnProcess-1:1) Loaded 900000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:55,207 | INFO: (SpawnProcess-1:1) Loaded 1000000 embeddings into VectorDB (serial_runner.py:59) (349889)
2024-07-02 14:34:55,211 | INFO: (SpawnProcess-1:1) Finish loading all dataset into VectorDB, dur=226.0110656730103 (serial_runner.py:61) (349889)
2024-07-02 14:34:58,411 | INFO: Milvus post insert before optimize (milvus.py:101) (352315)
2024-07-02 15:04:56,111 | WARNING: VectorDB optimize timeout in 1800 (task_runner.py:249) (349724)
2024-07-02 15:04:56,185 | WARNING: Failed to run performance case, reason = Performance case optimize timeout (task_runner.py:191) (349724)
Traceback (most recent call last):
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 247, in _optimize
    return future.result(timeout=self.ca.optimize_timeout)[1]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 458, in result
    raise TimeoutError()
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 165, in _run_perf_case
    build_dur = self._optimize()
                ^^^^^^^^^^^^^^^^
  File "/home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/backend/task_runner.py", line 252, in _optimize
    raise PerformanceTimeoutError("Performance case optimize timeout") from e
vectordb_bench.models.PerformanceTimeoutError: Performance case optimize timeout
2024-07-02 15:04:56,190 | WARNING: [1/1] case {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Milvus'} failed to run, reason=Performance case optimize timeout (interface.py:179) (349724)
2024-07-02 15:04:56,192 | INFO |Task summary: run_id=82c1b, task_label=2024070214 (models.py:329)
2024-07-02 15:04:56,192 | INFO |DB     | db_label case              label      | load_dur    qps        latency(p99)    recall        max_load_count | label (models.py:329)
2024-07-02 15:04:56,192 | INFO |------ | -------- ----------------- ---------- | ----------- ---------- --------------- ------------- -------------- | ----- (models.py:329)
2024-07-02 15:04:56,192 | INFO |Milvus |          Performance768D1M 2024070214 | 0.0         0.0        0.0             0.0           0              | ?     (models.py:329)
2024-07-02 15:04:56,193 | INFO: write results to disk /home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/results/Milvus/result_20240702_2024070214_milvus.json (models.py:187) (349724)
2024-07-02 15:04:56,197 | INFO: Success to finish task: label=2024070214, run_id=82c1b66f8ec24aad90dffc606128e746 (interface.py:203) (349724)

I also have noticed that I do not exhibit this behavior when switching back to v0.0.7 of vectordb-bench on the exact same hardware / OS:

2024-07-03 09:32:25,935 | INFO: generated uuid for the tasks: 16b30ded9e904e4d97e77d59a56c89a3 (interface.py:69) (725651)
2024-07-03 09:32:26,116 | INFO | DB             | CaseType     Dataset               Filter | task_label (task_runner.py:288)
2024-07-03 09:32:26,116 | INFO | -----------    | ------------ -------------------- ------- | -------    (task_runner.py:288)
2024-07-03 09:32:26,116 | INFO | Milvus         | Performance  Cohere-MEDIUM-1M        None | 2024070309 (task_runner.py:288)
2024-07-03 09:32:26,116 | INFO: task submitted: id=16b30ded9e904e4d97e77d59a56c89a3, 2024070309, case number: 1 (interface.py:235) (725651)
2024-07-03 09:32:26,640 | INFO: [1/1] start case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Milvus'}, drop_old=True (interface.py:167) (725751)
2024-07-03 09:32:26,896 | INFO: Milvus client drop_old collection: VectorDBBenchCollection (milvus.py:45) (725751)
2024-07-03 09:32:26,901 | INFO: Milvus create collection: VectorDBBenchCollection (milvus.py:55) (725751)
2024-07-03 09:32:28,136 | INFO: Read the entire file into memory: test.parquet (dataset.py:207) (725751)
2024-07-03 09:32:28,186 | INFO: Read the entire file into memory: neighbors.parquet (dataset.py:207) (725751)
2024-07-03 09:32:29,015 | INFO: (SpawnProcess-1:1) Start inserting embeddings in batch 5000 (serial_runner.py:35) (725918)
2024-07-03 09:32:29,016 | INFO: Get iterator for shuffle_train.parquet (dataset.py:225) (725918)
2024-07-03 09:32:43,467 | INFO: (SpawnProcess-1:1) Loaded 100000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:33:08,294 | INFO: (SpawnProcess-1:1) Loaded 200000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:33:30,597 | INFO: (SpawnProcess-1:1) Loaded 300000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:33:58,060 | INFO: (SpawnProcess-1:1) Loaded 400000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:34:23,968 | INFO: (SpawnProcess-1:1) Loaded 500000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:34:49,068 | INFO: (SpawnProcess-1:1) Loaded 600000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:35:12,599 | INFO: (SpawnProcess-1:1) Loaded 700000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:35:36,397 | INFO: (SpawnProcess-1:1) Loaded 800000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:35:58,726 | INFO: (SpawnProcess-1:1) Loaded 900000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 09:36:21,494 | INFO: Milvus post insert before optimize (milvus.py:100) (725918)

2024-07-03 10:12:29,141 | INFO: (SpawnProcess-1:1) Loaded 1000000 embeddings into VectorDB (serial_runner.py:61) (725918)
2024-07-03 10:12:29,142 | INFO: (SpawnProcess-1:1) Finish loading all dataset into VectorDB, dur=2400.126074722997 (serial_runner.py:63) (725918)
2024-07-03 10:12:30,164 | INFO: Milvus optimizing before search (milvus.py:92) (732543)
2024-07-03 10:12:34,820 | INFO: Finish loading the entire dataset into VectorDB, insert_duration=2401.1505225989968, optimize_duration=4.540166485996451 load_duration(insert + optimize) = 2405.6907 (task_runner.py:136) (725751)
2024-07-03 10:12:35,638 | INFO: SpawnProcess-1:3 start search the entire test_data to get recall and latency (serial_runner.py:173) (732670)
2024-07-03 10:12:46,387 | INFO: SpawnProcess-1:3 search entire test_data: cost=4.3488s, queries=1000, avg_recall=0.977, avg_latency=0.0043, p99=0.0054 (serial_runner.py:208) (732670)
2024-07-03 10:12:47,106 | INFO: Start search 300s in concurrency 60, filters: None (mp_runner.py:93) (725751)
2024-07-03 10:12:57,267 | INFO: Syncing all process and start concurrency search, concurrency=60 (mp_runner.py:102) (725751)
2024-07-03 10:17:57,305 | INFO: SpawnProcess-1:36 search 300s: actual_dur=300.0066s, count=28500, qps in this process: 94.9979 (mp_runner.py:73) (732938)
2024-07-03 10:17:57,308 | INFO: SpawnProcess-1:57 search 300s: actual_dur=300.0074s, count=28074, qps in this process: 93.5777 (mp_runner.py:73) (732959)
2024-07-03 10:17:57,309 | INFO: SpawnProcess-1:10 search 300s: actual_dur=300.0155s, count=28016, qps in this process: 93.3818 (mp_runner.py:73) (732912)
2024-07-03 10:17:57,310 | INFO: SpawnProcess-1:38 search 300s: actual_dur=300.008s, count=28444, qps in this process: 94.8108 (mp_runner.py:73) (732940)
2024-07-03 10:17:57,314 | INFO: SpawnProcess-1:62 search 300s: actual_dur=300.0124s, count=28597, qps in this process: 95.3194 (mp_runner.py:73) (732964)
2024-07-03 10:17:57,314 | INFO: SpawnProcess-1:15 search 300s: actual_dur=300.0097s, count=28575, qps in this process: 95.2469 (mp_runner.py:73) (732917)
2024-07-03 10:17:57,316 | INFO: SpawnProcess-1:56 search 300s: actual_dur=300.0132s, count=27957, qps in this process: 93.1859 (mp_runner.py:73) (732958)
2024-07-03 10:17:57,317 | INFO: SpawnProcess-1:45 search 300s: actual_dur=300.0049s, count=28597, qps in this process: 95.3218 (mp_runner.py:73) (732947)
2024-07-03 10:17:57,319 | INFO: SpawnProcess-1:11 search 300s: actual_dur=300.01s, count=28065, qps in this process: 93.5469 (mp_runner.py:73) (732913)
2024-07-03 10:17:57,321 | INFO: SpawnProcess-1:9 search 300s: actual_dur=300.0064s, count=28598, qps in this process: 95.3246 (mp_runner.py:73) (732911)
2024-07-03 10:17:57,324 | INFO: SpawnProcess-1:34 search 300s: actual_dur=300.0053s, count=28407, qps in this process: 94.6883 (mp_runner.py:73) (732936)
2024-07-03 10:17:57,325 | INFO: SpawnProcess-1:31 search 300s: actual_dur=300.0053s, count=28573, qps in this process: 95.2417 (mp_runner.py:73) (732933)
2024-07-03 10:17:57,324 | INFO: SpawnProcess-1:63 search 300s: actual_dur=300.007s, count=28012, qps in this process: 93.3712 (mp_runner.py:73) (732965)
2024-07-03 10:17:57,326 | INFO: SpawnProcess-1:20 search 300s: actual_dur=300.0051s, count=28461, qps in this process: 94.8684 (mp_runner.py:73) (732922)
2024-07-03 10:17:57,327 | INFO: SpawnProcess-1:52 search 300s: actual_dur=300.0047s, count=28378, qps in this process: 94.5919 (mp_runner.py:73) (732954)
2024-07-03 10:17:57,328 | INFO: SpawnProcess-1:5 search 300s: actual_dur=300.0069s, count=28629, qps in this process: 95.4278 (mp_runner.py:73) (732905)
2024-07-03 10:17:57,328 | INFO: SpawnProcess-1:19 search 300s: actual_dur=300.01s, count=27979, qps in this process: 93.2602 (mp_runner.py:73) (732921)
2024-07-03 10:17:57,329 | INFO: SpawnProcess-1:64 search 300s: actual_dur=300.0107s, count=28614, qps in this process: 95.3766 (mp_runner.py:73) (732966)
2024-07-03 10:17:57,329 | INFO: SpawnProcess-1:44 search 300s: actual_dur=300.0088s, count=28569, qps in this process: 95.2272 (mp_runner.py:73) (732946)
2024-07-03 10:17:57,329 | INFO: SpawnProcess-1:42 search 300s: actual_dur=300.0106s, count=28414, qps in this process: 94.71 (mp_runner.py:73) (732944)
2024-07-03 10:17:57,330 | INFO: SpawnProcess-1:49 search 300s: actual_dur=300.007s, count=28649, qps in this process: 95.4944 (mp_runner.py:73) (732951)
2024-07-03 10:17:57,329 | INFO: SpawnProcess-1:6 search 300s: actual_dur=300.0131s, count=28053, qps in this process: 93.5059 (mp_runner.py:73) (732908)
2024-07-03 10:17:57,329 | INFO: SpawnProcess-1:16 search 300s: actual_dur=300.0117s, count=28277, qps in this process: 94.253 (mp_runner.py:73) (732918)
2024-07-03 10:17:57,330 | INFO: SpawnProcess-1:33 search 300s: actual_dur=300.0108s, count=28267, qps in this process: 94.2199 (mp_runner.py:73) (732935)
2024-07-03 10:17:57,330 | INFO: SpawnProcess-1:48 search 300s: actual_dur=300.0069s, count=28427, qps in this process: 94.7545 (mp_runner.py:73) (732950)
2024-07-03 10:17:57,330 | INFO: SpawnProcess-1:46 search 300s: actual_dur=300.0079s, count=27985, qps in this process: 93.2809 (mp_runner.py:73) (732948)
2024-07-03 10:17:57,331 | INFO: SpawnProcess-1:30 search 300s: actual_dur=300.009s, count=28427, qps in this process: 94.7538 (mp_runner.py:73) (732932)
2024-07-03 10:17:57,331 | INFO: SpawnProcess-1:29 search 300s: actual_dur=300.005s, count=28379, qps in this process: 94.5951 (mp_runner.py:73) (732931)
2024-07-03 10:17:57,331 | INFO: SpawnProcess-1:26 search 300s: actual_dur=300.013s, count=27931, qps in this process: 93.0993 (mp_runner.py:73) (732928)
2024-07-03 10:17:57,331 | INFO: SpawnProcess-1:22 search 300s: actual_dur=300.0065s, count=28648, qps in this process: 95.4913 (mp_runner.py:73) (732924)
2024-07-03 10:17:57,332 | INFO: SpawnProcess-1:25 search 300s: actual_dur=300.0046s, count=28342, qps in this process: 94.4719 (mp_runner.py:73) (732927)
2024-07-03 10:17:57,332 | INFO: SpawnProcess-1:18 search 300s: actual_dur=300.0112s, count=28221, qps in this process: 94.0665 (mp_runner.py:73) (732920)
2024-07-03 10:17:57,332 | INFO: SpawnProcess-1:53 search 300s: actual_dur=300.0027s, count=28641, qps in this process: 95.4691 (mp_runner.py:73) (732955)
2024-07-03 10:17:57,333 | INFO: SpawnProcess-1:43 search 300s: actual_dur=300.0052s, count=28211, qps in this process: 94.035 (mp_runner.py:73) (732945)
2024-07-03 10:17:57,333 | INFO: SpawnProcess-1:7 search 300s: actual_dur=300.0079s, count=28281, qps in this process: 94.2675 (mp_runner.py:73) (732909)
2024-07-03 10:17:57,333 | INFO: SpawnProcess-1:47 search 300s: actual_dur=300.0053s, count=28149, qps in this process: 93.8283 (mp_runner.py:73) (732949)
2024-07-03 10:17:57,335 | INFO: SpawnProcess-1:23 search 300s: actual_dur=300.0107s, count=28526, qps in this process: 95.0833 (mp_runner.py:73) (732925)
2024-07-03 10:17:57,335 | INFO: SpawnProcess-1:39 search 300s: actual_dur=300.0064s, count=28483, qps in this process: 94.9413 (mp_runner.py:73) (732941)
2024-07-03 10:17:57,336 | INFO: SpawnProcess-1:60 search 300s: actual_dur=300.0048s, count=28462, qps in this process: 94.8718 (mp_runner.py:73) (732962)
2024-07-03 10:17:57,337 | INFO: SpawnProcess-1:28 search 300s: actual_dur=300.0071s, count=28591, qps in this process: 95.3011 (mp_runner.py:73) (732930)
2024-07-03 10:17:57,336 | INFO: SpawnProcess-1:14 search 300s: actual_dur=300.0039s, count=27943, qps in this process: 93.1421 (mp_runner.py:73) (732916)
2024-07-03 10:17:57,337 | INFO: SpawnProcess-1:51 search 300s: actual_dur=300.0053s, count=28490, qps in this process: 94.965 (mp_runner.py:73) (732953)
2024-07-03 10:17:57,337 | INFO: SpawnProcess-1:61 search 300s: actual_dur=300.0059s, count=28229, qps in this process: 94.0948 (mp_runner.py:73) (732963)
2024-07-03 10:17:57,338 | INFO: SpawnProcess-1:8 search 300s: actual_dur=300.0024s, count=28498, qps in this process: 94.9926 (mp_runner.py:73) (732910)
2024-07-03 10:17:57,337 | INFO: SpawnProcess-1:50 search 300s: actual_dur=300.0103s, count=28284, qps in this process: 94.2768 (mp_runner.py:73) (732952)
2024-07-03 10:17:57,337 | INFO: SpawnProcess-1:13 search 300s: actual_dur=300.0058s, count=28375, qps in this process: 94.5815 (mp_runner.py:73) (732915)
2024-07-03 10:17:57,338 | INFO: SpawnProcess-1:27 search 300s: actual_dur=300.0016s, count=28323, qps in this process: 94.4095 (mp_runner.py:73) (732929)
2024-07-03 10:17:57,339 | INFO: SpawnProcess-1:59 search 300s: actual_dur=300.0082s, count=28012, qps in this process: 93.3708 (mp_runner.py:73) (732961)
2024-07-03 10:17:57,339 | INFO: SpawnProcess-1:40 search 300s: actual_dur=300.006s, count=28612, qps in this process: 95.3714 (mp_runner.py:73) (732942)
2024-07-03 10:17:57,340 | INFO: SpawnProcess-1:21 search 300s: actual_dur=300.0056s, count=28086, qps in this process: 93.6183 (mp_runner.py:73) (732923)
2024-07-03 10:17:57,340 | INFO: SpawnProcess-1:35 search 300s: actual_dur=300.0054s, count=28016, qps in this process: 93.385 (mp_runner.py:73) (732937)
2024-07-03 10:17:57,341 | INFO: SpawnProcess-1:17 search 300s: actual_dur=300.0014s, count=28462, qps in this process: 94.8729 (mp_runner.py:73) (732919)
2024-07-03 10:17:57,341 | INFO: SpawnProcess-1:55 search 300s: actual_dur=300.0039s, count=28632, qps in this process: 95.4388 (mp_runner.py:73) (732957)
2024-07-03 10:17:57,341 | INFO: SpawnProcess-1:24 search 300s: actual_dur=300.003s, count=27957, qps in this process: 93.1891 (mp_runner.py:73) (732926)
2024-07-03 10:17:57,342 | INFO: SpawnProcess-1:12 search 300s: actual_dur=300.0051s, count=28514, qps in this process: 95.0451 (mp_runner.py:73) (732914)
2024-07-03 10:17:57,342 | INFO: SpawnProcess-1:37 search 300s: actual_dur=300.0012s, count=27994, qps in this process: 93.313 (mp_runner.py:73) (732939)
2024-07-03 10:17:57,342 | INFO: SpawnProcess-1:41 search 300s: actual_dur=300.0026s, count=28504, qps in this process: 95.0125 (mp_runner.py:73) (732943)
2024-07-03 10:17:57,343 | INFO: SpawnProcess-1:58 search 300s: actual_dur=300.0057s, count=28552, qps in this process: 95.1715 (mp_runner.py:73) (732960)
2024-07-03 10:17:57,344 | INFO: SpawnProcess-1:32 search 300s: actual_dur=300.0031s, count=28428, qps in this process: 94.759 (mp_runner.py:73) (732934)
2024-07-03 10:17:57,346 | INFO: SpawnProcess-1:54 search 300s: actual_dur=300.0052s, count=28526, qps in this process: 95.085 (mp_runner.py:73) (732956)
2024-07-03 10:17:57,348 | INFO: End search in concurrency 60: dur=300.07863932699547s, total_count=1700846, qps=5668.0009 (mp_runner.py:109) (725751)
2024-07-03 10:17:58,041 | INFO: Update largest qps with concurrency 60: current max_qps=5668.0009 (mp_runner.py:113) (725751)
2024-07-03 10:17:58,042 | INFO: Performance case got result: Metric(max_load_count=0, load_duration=2405.6907, qps=5668.0009, serial_latency_p99=0.0054, recall=0.977) (task_runner.py:150) (725751)
2024-07-03 10:17:58,042 | INFO: [1/1] finish case: {'label': <CaseLabel.Performance: 2>, 'dataset': {'data': {'name': 'Cohere', 'size': 1000000, 'dim': 768, 'metric_type': <MetricType.COSINE: 'COSINE'>}}, 'db': 'Milvus'}, result=Metric(max_load_count=0, load_duration=2405.6907, qps=5668.0009, serial_latency_p99=0.0054, recall=0.977), label=ResultLabel.NORMAL (interface.py:169) (725751)
2024-07-03 10:17:58,042 | INFO |Task summary: run_id=16b30, task_label=2024070309 (models.py:286)
2024-07-03 10:17:58,042 | INFO |DB     | db_label case              label      | load_dur     qps          latency(p99)    recall        max_load_count | label (models.py:286)
2024-07-03 10:17:58,042 | INFO |------ | -------- ----------------- ---------- | ------------ ------------ --------------- ------------- -------------- | ----- (models.py:286)
2024-07-03 10:17:58,042 | INFO |Milvus |          Performance768D1M 2024070309 | 2405.6907    5668.0009    0.0054          0.977         0              | :)    (models.py:286)
2024-07-03 10:17:58,042 | INFO: write results to disk /home/labuser/.local/lib/python3.11/site-packages/vectordb_bench/results/Milvus/result_20240703_2024070309_milvus.json (models.py:144) (725751)
2024-07-03 10:17:58,042 | INFO: Succes to finish task: label=2024070309, run_id=16b30ded9e904e4d97e77d59a56c89a3 (interface.py:207) (725751)

anrahman4 avatar Jul 03 '24 17:07 anrahman4

@mattbloke99 @anrahman4 Your machine needs 2400.126074722997 seconds to load + index the dataset in 0.0.7.

We changed the process in the latest vdbbench, the index part is moved into optimize. Since it only costs you dur=226.0110656730103s to load data into milvus, you'll probablly need 3000s timeout config for optimize.

Just change this OPTIMIZE_TIMEOUT_768D_1M = 30 * 60 # 30min into 3000 in vectordb_bench/__init__.py and you'll be fine.

We'll check if the older timeout config still works for the latest milvus cases and ajust it accordingly. AFAIK, in our 16c64G tests it costs around 1500s.

XuanYang-cn avatar Jul 04 '24 08:07 XuanYang-cn

Can confirm this worked with v0.0.11, thank you very much!

anrahman4 avatar Jul 16 '24 20:07 anrahman4

image

Can not confirm this worked with v0.0.12, why create this question,the timeout was set 8 hours

1191510903 avatar Aug 08 '24 01:08 1191510903

@1191510903 8hours is a really long time. Could you please upload logs of Milvus so that I can know why this is taking so much time?

Also I'd like to ask some quick questions:

  1. What's your Milvus version?
  2. Distributed or standalone?
  3. What's the cpu and memory resources for Milvus? Especially IndexNode?

XuanYang-cn avatar Aug 08 '24 02:08 XuanYang-cn