Unexpectedly high memory usage
Perhaps I'm missing some configuration options, but I would have expected Bloom filter to have static memory usage:
[MemoryDiagnoser]
public class Benchmark
{
private static readonly int items = 30_000_000;
[Benchmark]
public void BloomFilter()
{
var bf = FilterBuilder.Build(1000, 0.01);
for (var i = 0; i < items; i++)
{
bf.Add($"property_{i}_name");
}
for (var i = 0; i < items; i++)
{
if (!bf.Contains($"property_{i}_name"))
{
Console.WriteLine($"False negative {i}");
}
}
}
[Benchmark]
public void Dictionary()
{
var bf = new Dictionary<string, bool>();
for (var i = 0; i < items; i++)
{
bf.Add($"property_{i}_name", true);
}
for (var i = 0; i < items; i++)
{
if (!bf.ContainsKey($"property_{i}_name"))
{
Console.WriteLine($"False negative {i}");
}
}
}
}
BenchmarkDotNet v0.13.7, macOS Big Sur 11.6.5 (20G527) [Darwin 20.6.0]
Intel Core i5-1038NG7 CPU 2.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 6.0.202
[Host] : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT AVX2
DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT AVX2
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|------------ |---------:|---------:|---------:|-------------:|------------:|-----------:|----------:|
| BloomFilter | 9.538 s | 0.0471 s | 0.0440 s | 3774000.0000 | - | - | 11.03 GB |
| Dictionary | 17.162 s | 0.2386 s | 0.1993 s | 1008000.0000 | 131000.0000 | 11000.0000 | 6.37 GB |
// * Hints *
Outliers
Benchmark.Dictionary: Default -> 2 outliers were removed (17.85 s, 18.53 s)
Tested with various different options, and Dictionary uses consistently less memory
Allocated memory does not mean actual usage,bloom use BitArray as storage, it actually takes up less memory than Dict.
[MemoryDiagnoser]
public class Issues9
{
public int DataSize = 3_000_000;
private IList<byte[]> filterData;
private IList<string> dictData;
[GlobalSetup]
public void Setup()
{
filterData = new List<byte[]>(DataSize);
dictData = new List<string>(DataSize);
for (var i = 0; i < DataSize; i++)
{
filterData.Add(Encoding.UTF8.GetBytes($"property_{i}_name"));
dictData.Add($"property_{i}_name");
}
}
[Benchmark]
public void BloomFilter()
{
var filter = FilterBuilder.Build(1000, 0.01);
for (var i = 0; i < DataSize; i++)
{
filter.Add(filterData[i]);
}
for (var i = 0; i < DataSize; i++)
{
if (!filter.Contains(filterData[i]))
{
}
}
}
[Benchmark]
public void Dictionary()
{
var bf = new Dictionary<string, bool>();
for (var i = 0; i < DataSize; i++)
{
bf.Add(dictData[i], true);
}
for (var i = 0; i < DataSize; i++)
{
if (!bf.ContainsKey(dictData[i]))
{
}
}
}
}
Hmm, I still can't make sense of the results:
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|------------ |---------:|---------:|--------:|------------:|----------:|----------:|----------:|
| BloomFilter | 471.4 ms | 6.60 ms | 5.85 ms | 153000.0000 | - | - | 457.77 MB |
| Dictionary | 814.9 ms | 10.60 ms | 9.91 ms | 1000.0000 | 1000.0000 | 1000.0000 | 309.41 MB |
Since bloom filter is a probabilistic structure, it should occupy a fraction of memory of the dict, unless the bf configuration has storage that's larger than all 3 million byte arrays. /shrug, I'm sure I'm misunderstanding something here, I'm looking to use it as a way to reduce cache misses, so I'm happy with high error rate, I'm also looking for a nicely bounded memory usage scaling to basically infinity of elements
Is there any other way of measuring total size of the dict and bf? (short of making a test application and seeing its actual memory usage). I appreciate that Allocations are actually measuring the wrong thing.
Write a console program, and do not close the program after the test is successful. Then, check how much memory the process currently occupies.
Is there any other way of measuring total size of the dict and bf?
The size has been allocated during initialization