I have been working with Milvus v2.2.9 in standalone mode using Docker Compose, running without any CPU limits. My current setup involves creating an index with 1 million embeddings using the IVF_SQ8 indexing method. I've wrapped the Milvus search call, `collection.search(..., _async=True)`, inside an asynchronous function.
When I send hundreds of asynchronous calls to this function, CPU usage peaks at around 250% (roughly 2.5 of the machine's 16 virtual CPUs) and the requests show noticeable latency. Increasing the number of asynchronous calls does not raise CPU usage, which stays at approximately 250%, but it does increase latency.
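A minimal sketch of the calling pattern described above, not my exact code. `fake_search` is a hypothetical stand-in that returns a future with a `.result()` method, which I am assuming has the same shape as the object returned by `collection.search(..., _async=True)`. The names `search_one`, `run_batch`, and `measure` are made up for illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the Milvus server (assumption): a thread pool whose submit()
# returns a future exposing .result(). Swap fake_search for the real
# collection.search(..., _async=True) call when running against Milvus.
_pool = ThreadPoolExecutor(max_workers=8)


def fake_search(query_id):
    """Hypothetical placeholder that simulates about 10 ms of search work."""
    def work():
        time.sleep(0.01)
        return query_id
    return _pool.submit(work)


async def search_one(query_id):
    """Issue one search and await its result without blocking the event loop."""
    start = time.perf_counter()
    future = fake_search(query_id)
    loop = asyncio.get_running_loop()
    # future.result() blocks, so run it in the default thread executor.
    result = await loop.run_in_executor(None, future.result)
    return result, time.perf_counter() - start


async def run_batch(n_requests):
    """Fire n_requests searches concurrently; return results and mean latency."""
    pairs = await asyncio.gather(*(search_one(i) for i in range(n_requests)))
    results = [r for r, _ in pairs]
    mean_latency = sum(lat for _, lat in pairs) / len(pairs)
    return results, mean_latency


def measure(n_requests):
    """Synchronous entry point for a single load test run."""
    return asyncio.run(run_batch(n_requests))
```

Calling `measure(n)` with increasing `n` is how the latency growth can be observed: once the backend (here the 8-worker pool) is saturated, extra concurrent requests only queue up and the mean latency rises.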
To address this, I set `maxReadConcurrentRatio` to 100.0 in the `milvus.yaml` configuration file. Unfortunately, this change did not improve CPU usage.
Is there a method to increase CPU utilization without negatively affecting latency? Am I overlooking any configuration options that could optimize performance?