succinct icon indicating copy to clipboard operation
succinct copied to clipboard

Problem with large BP vector

Open ShaneDouglas opened this issue 11 years ago • 2 comments

Firstly let me congratulate you on your excellent work. I especially like the ability to memory map the structures. I had a small problem though when trying to create a large BP vector of size > 2^32. I am not sure if it is something I have done incorrectly but in this section of the code bp_vector::build_min_tree() :-

    for (size_t superblock = 0; superblock < n_superblocks; ++superblock) {
        excess_t cur_super_min = static_cast<excess_t>(size());
        excess_t superblock_excess = get_block_excess(superblock * superblock_size);

        for (size_t block = superblock * superblock_size;
             block < std::min((superblock + 1) * superblock_size, n_blocks);
             ++block) {
            cur_super_min = std::min(cur_super_min, superblock_excess + block_excess_min[block]);
        }
        assert(cur_super_min >= 0 && cur_super_min < excess_t(size()));

        superblock_excess_min[m_internal_nodes + superblock] = cur_super_min;
    }

you appear to set cur_super_min to the size of the vector which in my case > 2^32 but excess_t is an int32_t which obviously causes a problem. Not sure if it is a config issue or what but changed it to int64_t and it seemed to work. Since I am not really aquainted with your code I am worried about the side effects of such a action and it would be nice to get your opinion on what is going on. Perhaps I am not allowed to create such a large BP vector ?

Shane.

ShaneDouglas avatar Feb 18 '14 09:02 ShaneDouglas

Hi, I haven't touched that code for a while, but there should be no limitation on the size of the vectors, just on the maximum excess inside the vector, which is numeric_limits<excess_t>::max(). I should put a large comment there, but I thought that the assumption of having at most 2^31 excess would be reasonable (that means a tree of height 2^32).

I haven't tested it, but changing the definition of excess_t to int64_t should remove the problem (the typedef is put there for this purpose). Of course the data structure size becomes slightly larger and you lose binary compatibility, but the succinct file format is not designed to be portable, for performance and memory-mappability reasons.

BTW, thanks for your kind words!

ot avatar Feb 23 '14 17:02 ot

thank you

iamjw0911 avatar Mar 04 '15 11:03 iamjw0911