BloomFilter.NetCore icon indicating copy to clipboard operation
BloomFilter.NetCore copied to clipboard

Unexpectedly high memory usage

Open YarekTyshchenko opened this issue 2 years ago • 5 comments

Perhaps I'm missing some configuration options, but I would have expected Bloom filter to have static memory usage:

[MemoryDiagnoser]
public class Benchmark
{
    private static readonly int items = 30_000_000;
    [Benchmark]
    public void BloomFilter()
    {
        var bf = FilterBuilder.Build(1000, 0.01);
        for (var i = 0; i < items; i++)
        {
            bf.Add($"property_{i}_name");
        }
        for (var i = 0; i < items; i++)
        {
            if (!bf.Contains($"property_{i}_name"))
            {
                Console.WriteLine($"False negative {i}");
            }
        }
    }

    [Benchmark]
    public void Dictionary()
    {
        var bf = new Dictionary<string, bool>();
        for (var i = 0; i < items; i++)
        {
            bf.Add($"property_{i}_name", true);
        }
        for (var i = 0; i < items; i++)
        {
            if (!bf.ContainsKey($"property_{i}_name"))
            {
                Console.WriteLine($"False negative {i}");
            }
        }
    }
}
BenchmarkDotNet v0.13.7, macOS Big Sur 11.6.5 (20G527) [Darwin 20.6.0]
Intel Core i5-1038NG7 CPU 2.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 6.0.202
  [Host]     : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT AVX2


|      Method |     Mean |    Error |   StdDev |         Gen0 |        Gen1 |       Gen2 | Allocated |
|------------ |---------:|---------:|---------:|-------------:|------------:|-----------:|----------:|
| BloomFilter |  9.538 s | 0.0471 s | 0.0440 s | 3774000.0000 |           - |          - |  11.03 GB |
|  Dictionary | 17.162 s | 0.2386 s | 0.1993 s | 1008000.0000 | 131000.0000 | 11000.0000 |   6.37 GB |

// * Hints *
Outliers
  Benchmark.Dictionary: Default -> 2 outliers were removed (17.85 s, 18.53 s)

Tested with various different options, and Dictionary uses consistently less memory

YarekTyshchenko avatar Aug 17 '23 11:08 YarekTyshchenko

Allocated memory does not mean actual usage,bloom use BitArray as storage, it actually takes up less memory than Dict.

[MemoryDiagnoser]
public class Issues9
{
    public int DataSize = 3_000_000;

    private IList<byte[]> filterData;

    private IList<string> dictData;

    [GlobalSetup]
    public void Setup()
    {
        filterData = new List<byte[]>(DataSize);
        dictData = new List<string>(DataSize);

        for (var i = 0; i < DataSize; i++)
        {
            filterData.Add(Encoding.UTF8.GetBytes($"property_{i}_name"));
            dictData.Add($"property_{i}_name");
        }
    }

    [Benchmark]
    public void BloomFilter()
    {
        var filter = FilterBuilder.Build(1000, 0.01);

        for (var i = 0; i < DataSize; i++)
        {
            filter.Add(filterData[i]);
        }

        for (var i = 0; i < DataSize; i++)
        {
            if (!filter.Contains(filterData[i]))
            {
            }
        }
    }

    [Benchmark]
    public void Dictionary()
    {
        var bf = new Dictionary<string, bool>();

        for (var i = 0; i < DataSize; i++)
        {
            bf.Add(dictData[i], true);
        }

        for (var i = 0; i < DataSize; i++)
        {
            if (!bf.ContainsKey(dictData[i]))
            {
            }
        }
    }
}

vla avatar Sep 07 '23 12:09 vla

Hmm, I still can't make sense of the results:

|      Method |     Mean |    Error |  StdDev |        Gen0 |      Gen1 |      Gen2 | Allocated |
|------------ |---------:|---------:|--------:|------------:|----------:|----------:|----------:|
| BloomFilter | 471.4 ms |  6.60 ms | 5.85 ms | 153000.0000 |         - |         - | 457.77 MB |
|  Dictionary | 814.9 ms | 10.60 ms | 9.91 ms |   1000.0000 | 1000.0000 | 1000.0000 | 309.41 MB |

YarekTyshchenko avatar Sep 21 '23 13:09 YarekTyshchenko

Since bloom filter is a probabilistic structure, it should occupy a fraction of memory of the dict, unless the bf configuration has storage that's larger than all 3 million byte arrays. /shrug, I'm sure I'm misunderstanding something here, I'm looking to use it as a way to reduce cache misses, so I'm happy with high error rate, I'm also looking for a nicely bounded memory usage scaling to basically infinity of elements

Is there any other way of measuring total size of the dict and bf? (short of making a test application and seeing its actual memory usage). I appreciate that Allocations are actually measuring the wrong thing.

YarekTyshchenko avatar Sep 21 '23 13:09 YarekTyshchenko

Write a console program, and do not close the program after the test is successful. Then, check how much memory the process currently occupies.

vla avatar Sep 28 '23 16:09 vla

Is there any other way of measuring total size of the dict and bf?

The size has been allocated during initialization

vla avatar Sep 28 '23 16:09 vla