SPTAG How could SPANN load DEEP-1B (358GB) into memory with only 128GB RAM (in SPANN paper)?

Hi, I'm reading the SPANN source code and paper. In /AnnService/src/IndexBuilder/main.cpp, we can see that the data is loaded in the function DefaultVectorReader::GetVectorSet(). In that function, the vectors are loaded with a single ReadBinary() call.

But if the original file is larger than RAM, how can that work? The SPANN paper uses only 128GB of RAM for DEEP-1B, whose base-points file is 358GB according to big-ann-benchmark.
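To make the size gap concrete, here is a rough back-of-the-envelope check. It assumes DEEP-1B holds 1 billion vectors of 96 float32 values each; those figures come from the public dataset description, not from this thread, so treat them as assumptions.

```python
# Assumed dataset shape (not stated in this thread):
# 1e9 vectors, 96 dimensions, 4-byte float32 values.
NUM_VECTORS = 1_000_000_000
DIMENSIONS = 96
BYTES_PER_VALUE = 4
RAM_GIB = 128  # RAM figure quoted from the SPANN paper


def raw_size_bytes(n, dim, bytes_per_value):
    """Size of the raw vector payload, ignoring any file header."""
    return n * dim * bytes_per_value


size = raw_size_bytes(NUM_VECTORS, DIMENSIONS, BYTES_PER_VALUE)
size_gib = size / 2**30

print(f"raw vectors: {size_gib:.1f} GiB")              # raw vectors: 357.6 GiB
print(f"ratio to RAM: {size_gib / RAM_GIB:.2f}x")      # ratio to RAM: 2.79x
```

Under these assumptions the raw vectors alone are almost three times the quoted RAM, which is consistent with the 358GB figure above.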

Mar 07 '22 08:03 matchyc

Hi, have you solved this problem? I also encountered the same.

Aug 12 '22 08:08 LLLjun

> Hi, have you solved this problem? I also encountered the same.

No, I didn't. I suppose the team used more memory than the paper describes? That doesn't mean they were wrong: the memory usage described in the paper is all about the search procedure, so maybe SPANN needs more memory during the build step. I guess...
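For comparison with the single ReadBinary() call discussed above, here is a minimal sketch of reading a vector file in fixed-size chunks, so that peak memory is bounded by the chunk size rather than the file size. It is not SPTAG's code and says nothing about how SPANN actually builds its index. The file layout (two little-endian uint32 header fields, vector count then dimension, followed by float32 data) and the name `iter_vector_chunks` are assumptions for illustration.

```python
import struct
from array import array


def iter_vector_chunks(path, chunk_vectors):
    """Yield (start_index, dim, flat_values) for successive chunks of vectors.

    Assumed layout (hypothetical for this sketch): an 8-byte header holding
    the vector count and dimension as little-endian uint32, then the float32
    values stored back to back. Assumes a little-endian host, since
    array("f") reads values in native byte order.
    """
    with open(path, "rb") as f:
        n, dim = struct.unpack("<II", f.read(8))
        start = 0
        while start < n:
            count = min(chunk_vectors, n - start)
            values = array("f")
            # Reads exactly count * dim floats; raises EOFError if the file is short.
            values.fromfile(f, count * dim)
            yield start, dim, values
            start += count
```

With this pattern, only `chunk_vectors * dim * 4` bytes of vector data are resident at a time, for example about 3.6 GiB for chunks of 10 million 96-dimensional vectors. Whether a build step can work on chunks like this depends on the algorithm, which this sketch does not address.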

Aug 12 '22 08:08 matchyc

Got it, thanks.

Aug 12 '22 08:08 LLLjun

By the way, how much memory does it take to build the SPANN index using the DEEP1B dataset?

Aug 12 '22 09:08 LLLjun

> By the way, how much memory does it take to build the SPANN index using the DEEP1B dataset?

No idea. If you figure it out, I'd like to hear it! Thank you in advance.

Aug 12 '22 14:08 matchyc