2024-09-23

  • Update based on kenn's post.

kenn While cleaning up, we also greatly revised how our vector search DB (@qdrant_engine) stores data. Disk and memory usage are now 1/10 of what they used to be!

kenn First, we graduated from ada-002 to 3-large.

The number of dimensions has doubled from 1536 to 3072, but the new model lets you freely reduce the dimensionality via the Matryoshka method, and even truncated to 256 dimensions it still outperforms ada-002.

You can almost physically sense the embedding space (the manifold) being used more efficiently, and this alone shrinks the vectors to 1/6 of their former size.
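For context, this truncation is supported directly by the embeddings API: text-embedding-3-large accepts a dimensions parameter, so the client never has to handle full 3072-dim vectors. A minimal sketch (the example input text is made up):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the API for a Matryoshka-truncated embedding: the model natively
# produces 3072 dims, but dimensions=256 returns a shortened, re-normalized vector.
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input=["an example sentence to embed"],
    dimensions=256,
)

vector = resp.data[0].embedding
print(len(vector))  # 256
```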

kenn Second, each Qdrant collection carries a large per-collection overhead (about 10MB?), so storing only a few vectors per collection is inefficient. Vectors with the same dimensionality and payload schema can all be grouped into a single collection, with tenants separated by a payload index, which improves space efficiency in one stroke. (See: Qdrant 1.11 - The Vector Stronghold: Optimizing Data Structures for Scale and Efficiency - Qdrant)
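A minimal sketch of that single-collection, payload-separated layout with qdrant-client; the collection and field names (shared_embeddings, tenant_id) are placeholders I chose, and is_tenant=True is the tenant-aware keyword index that Qdrant 1.11 introduced:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# One collection holds every vector with the same dimensionality and payload
# schema; tenants are distinguished by a payload field, not by separate collections.
client.create_payload_index(
    collection_name="shared_embeddings",   # placeholder name
    field_name="tenant_id",                # placeholder field
    field_schema=models.KeywordIndexParams(
        type=models.KeywordIndexType.KEYWORD,
        is_tenant=True,  # Qdrant 1.11+: groups each tenant's data for locality
    ),
)

# At query time, restrict the search to one tenant with a filter.
hits = client.query_points(
    collection_name="shared_embeddings",
    query=[0.0] * 256,  # a real 256-dim query vector would go here
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="tenant_id",
                match=models.MatchValue(value="tenant_a"),
            )
        ]
    ),
    limit=5,
)
```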

kenn Furthermore, as the number of dimensions drops from 1536 to 256, the size of the JSON sent over the wire (roughly 12KB for a 1000-dimensional vector, assuming about 12 characters per float) also shrinks dramatically. High compression is the most powerful optimization.
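A back-of-the-envelope check of that claim, using the post's assumption of roughly 12 characters per serialized float:

```python
# Rough JSON size per vector at ~12 characters per float (the post's figure);
# actual size depends on how the client formats the floats.
for dim in (3072, 1536, 256):
    print(f"{dim} dims -> ~{dim * 12 / 1024:.1f} KB of JSON")
# 3072 dims -> ~36.0 KB
# 1536 dims -> ~18.0 KB
#  256 dims -> ~3.0 KB
```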

The final configuration is shared here. We think it is the best we can do right now, in terms of both elegance and performance!

kenn Here's a config template for @qdrant_engine for the multi-tenant setup:

  • Create a single collection to store everything with the same vector dimensions, and filter by tenant ID at query time
  • mmap() as many things as possible on disk to optimize memory usage
  • defrag storage (a sketch of such a config follows this list)
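Since the original config was shared as an image, here is a hedged sketch of what such a collection config might look like with qdrant-client; the exact numbers and the m=0 / payload_m trick (skip the global HNSW graph and build per-tenant graphs via the payload index) come from Qdrant's multitenancy and storage docs, not from kenn's actual settings:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="shared_embeddings",   # placeholder name
    vectors_config=models.VectorParams(
        size=256,                          # 3-large truncated to 256 dims
        distance=models.Distance.COSINE,
        on_disk=True,                      # mmap original vectors instead of holding them in RAM
    ),
    on_disk_payload=True,                  # payloads also live on disk
    hnsw_config=models.HnswConfigDiff(
        m=0,           # no global HNSW graph across all tenants
        payload_m=16,  # per-tenant graphs, built using the tenant payload index
    ),
    optimizers_config=models.OptimizersConfigDiff(
        default_segment_number=2,          # fewer, larger segments (less fragmentation)
    ),
)
```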

Since the vector size is now 1/6, you could make five derivative texts from one sentence and store them all without the cost changing (really?). It would be interesting to generate a variety of derivatives using an approach like Extract questions at various levels of abstraction.

  • Right now, it's implemented so that text can only be stored in a single chunked format, so I'd like to start with a structure that makes trial and error easier.

pVectorSearch


This page is auto-translated from /nishio/2024-09ベクトル検索の改善 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.