usearch icon indicating copy to clipboard operation
usearch copied to clipboard

Bug: Non of methods are working in Multi-threaded envirionment

Open etempm opened this issue 1 year ago • 11 comments

Describe the bug

I tried each of Add, Remove, Save and Search methods independently from each others with multiple threads. All of them are failed. Memory corruption error raised.

Steps to reproduce

You can run any of this methods in multiple threads to see error.

Expected behavior

At least, thread safe Search method is required. I am using locking to overcome this, but it is very slow.

USearch version

2.12.0

Operating System

Windows 11

Hardware architecture

x86

Which interface are you using?

Other bindings

Contact Details

[email protected]

Are you open to being tagged as a contributor?

  • [ ] I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • [X] I have searched the existing issues

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

etempm avatar Jun 04 '24 12:06 etempm

You are probably doing something wrong, but I can't guess without code 🤗

ashvardanian avatar Jun 04 '24 14:06 ashvardanian

Please run this code to see error. Without lock, it throw Access violation exception.

` private async void Test() { USearchIndex UIndex = new USearchIndex( metricKind: MetricKind.Cos, // Choose cosine metric quantization: ScalarKind.Float32, // Only quantization to Float32, Float64 is currently supported dimensions: 512, // Define the number of dimensions in input vectors connectivity: 16, // How frequent should the connections in the graph be, optional expansionAdd: 128, // Control the recall of indexing, optional expansionSearch: 64 // Control the quality of search, optional ); //UIndex.IncreaseCapacity(102400); //-> THIS METHOD IS PRIVATE, SO I CAN NOT ACCESS IT.

var TaskAdd = Task.Run(() =>
{
    Parallel.For(0L, 102400L, i =>
    {
        float[] tmpVector = new float[512];
        for (int j = 0; j < 512; j++)
        {
            tmpVector[j] = Random.Shared.NextSingle();
        }
        //Monitor.Enter(UIndex);  //--> if I enter lock here, no problems.
        UIndex.Add((ulong)i, tmpVector); //--> Throws access vialotion error with out lock.
        //Monitor.Exit(UIndex);
    });
});


var TaskSearch = Task.Run(() =>
{
    float[] SearchVector = Enumerable.Range(0, 512).Select(n => 0.001f * n).ToArray();
    long Result = 0;
    ulong[] Keys;
    float[] Distances;
    float[] tmpVector = new float[512];
    ulong Start = UIndex.Capacity();
    Parallel.For(0L, 102400L, i =>
    {
        //Monitor.Enter(UIndex);   //--> if I enter lock here, no problems.
        Result = UIndex.Search(SearchVector, 5, out Keys, out Distances); //--> Throws access vialotion error with out lock.
        //Monitor.Exit(UIndex);                                                           //cacheLock.ExitWriteLock();
    });
});
await Task.WhenAny(TaskAdd, TaskSearch);

} `

etempm avatar Jun 04 '24 15:06 etempm

@etempm, can you limit the number of threads to a value smaller or equal to the number of physical cores on your machine? I suspect that's the issue.

ashvardanian avatar Jun 04 '24 16:06 ashvardanian

Yes, you are right!. I limited thread count to Environment.ProcessorCount and it worked like a charm. Thank you very much!!!

etempm avatar Jun 05 '24 07:06 etempm

Just for clarification. I will use usearch for real time face recognition over 100 ip cameras. Each camera working on own thread. With the 16 core cpu, 100 threads, real time. I can not limit thread count in thar situation. Will this be same problem? What is you opinion?

etempm avatar Jun 05 '24 19:06 etempm

For that we need to extend the "reserve" interface in C#, to allow optional second argument for the number of thread-specific "contexts" it can use. Then everything would work fine. Can you try patching the library to do that?

ashvardanian avatar Jun 05 '24 19:06 ashvardanian

If i can understand what you mean, i wil be glad to do that. I see extend method in native class which is increase capacity in c#. But i did not understand "thread specific context". C# code just calling your dll. Is your reserve method has that "context" parameter already? Can you tell more details .

etempm avatar Jun 05 '24 20:06 etempm

Ok, i am looking c code now. As i can see, a lot of functionality is missing on c# side. I can add that functions to c# wrapper. If you can add thread specific context parameter also, i will patch it too. Thanks for great support again.

etempm avatar Jun 05 '24 21:06 etempm

So the C++ layer has a "reserve" function, which returns a structure - not just an integer. In that structure you can inform the number of threads that will use the index. For every one of those I pre-allocate queues and other buffers to avoid doing that during concurrent calls. That functionality is what we want to expose to C# users 🤗

Thanks to you too!

ashvardanian avatar Jun 05 '24 21:06 ashvardanian

I think you mean

struct index_limits_t { std::size_t members = 0; std::size_t threads_add = std::thread::hardware_concurrency(); std::size_t threads_search = std::thread::hardware_concurrency();

inline index_limits_t(std::size_t n, std::size_t t) noexcept : members(n), threads_add(t), threads_search(t) {}
inline index_limits_t(std::size_t n = 0) noexcept : index_limits_t(n, std::thread::hardware_concurrency()) {}
inline std::size_t threads() const noexcept { return (std::max)(threads_add, threads_search); }
inline std::size_t concurrency() const noexcept { return (std::min)(threads_add, threads_search); }

};

etempm avatar Jun 06 '24 11:06 etempm

Hi, sory, I am not cpp guy. Trying to call function. I defined struct as:

[StructLayout(LayoutKind.Sequential)] public struct IndexLimitsT { public ulong Members; public ulong ThreadsAdd; public ulong ThreadsSearch; }

I added this function to NativeMethods

[DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)] public static extern void usearch_reserve(usearch_index_t index, IndexLimitsT limits, out usearch_error_t error);

and called the method like this:

IndexLimitsT limits2 = new IndexLimitsT { Members = 10, ThreadsAdd = 10, ThreadsSearch = 10, }; NativeMethods.usearch_reserve(this._index, limits2, out IntPtr error); HandleError(error);

bu no luck. Any idea?

etempm avatar Jun 06 '24 15:06 etempm

@etempm hi! Is this still bothering you? Can you please format the code in your comments?

ashvardanian avatar Aug 15 '24 02:08 ashvardanian

Hi, did this get resolved please? I am running into the same issue where I try and add values to the index in parallel (using the same number of cores as my CPU has) but I run into access violation issues without a lock. Thanks

jamesryanbell avatar Nov 12 '24 12:11 jamesryanbell

@jamesryanbell, which programming language SDK are you using?

ashvardanian avatar Nov 12 '24 14:11 ashvardanian

@ashvardanian I'm using the c# SDK

jamesryanbell avatar Nov 12 '24 14:11 jamesryanbell

I am mostly focused on a full rewrite of the engine these days and can't honestly speak about the state of the C# binding, as I've never used it and don't write C#. But if you have a good minimal example, @jamesryanbell, I may be able to reproduce that. Thanks!

ashvardanian avatar Nov 12 '24 14:11 ashvardanian

Thank you, @ashvardanian

using var index = new USearchIndex(
    metricKind: MetricKind.Cos, // Choose cosine metric
    quantization: ScalarKind.Float32, // Only quantization to Float32, Float64 is currently supported
    dimensions: 384,  // Define the number of dimensions in input vectors
    connectivity: 16, // How frequent should the connections in the graph be, optional
    expansionAdd: 128, // Control the recall of indexing, optional
    expansionSearch: 10 // Control the quality of search, optional
);
Parallel.ForEach(dataToProcess, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount - 1 }, kvp =>
{
    index.Add(kvp.Key, kvp.Value.Embedding);
});

return index;

This causes one of the two errors below to be triggered, and is also the case if I use a ConcurrentDictionary for dataToProcess.

System.AccessViolationException: 'Attempted to read or write protected memory. This is often an indication that other memory is corrupt.'

Cloud.Unum.USearch.USearchException: 'USearch operation failed: Reserve capacity ahead of insertions!'

jamesryanbell avatar Nov 19 '24 11:11 jamesryanbell

Yes, @jamesryanbell, you must reserve a number of threads proportional to MaxDegreeOfParallelism first, but seems like the C# interface doesn't have that second argument of reserve exposed. Can you try calling IncreaseCapacity before the parallel block?

ashvardanian avatar Nov 21 '24 15:11 ashvardanian

@ashvardanian do you mean this method? If so, it's Private so I can't call it directly. Within Index there is an internal call to CheckIncreaseCapacity which increases the capacity either by 1 or by the length of the array passed.

/// <summary>
/// Reserves additional capacity in the index.
/// </summary>
/// <param name="size">The number of new vectors to reserve capacity for.</param>
private void IncreaseCapacity(ulong size)
{
    usearch_reserve(this._index, (UIntPtr)(this.Size() + size), out IntPtr error);
    HandleError(error);
}

jamesryanbell avatar Nov 21 '24 15:11 jamesryanbell

Oh, that's not perfect. You'd want to avoid any multi-threaded calls to usearch_reserve. Any chance you could refactor it to match the other bindings, @jamesryanbell? I rarely work on the C# side.

ashvardanian avatar Nov 21 '24 15:11 ashvardanian

Sorry @ashvardanian, I'm not sure what you mean by Any chance you could refactor it to match the other bindings. I'm adding 1.5 million embeddings to my index and it's taking half and hour, I assume due to the fact it's single threaded. I was hoping to add them in batches, in parallel, to speed things up.

jamesryanbell avatar Nov 21 '24 15:11 jamesryanbell

Hi @ashvardanian would you be able to look into this please?

jamesryanbell avatar Jan 20 '25 13:01 jamesryanbell

Hi @jamesryanbell! It's hard for me to prioritize the ongoing patches over the next major release and the other libraries that I maintain. Maybe someone else wants to look into this?

ashvardanian avatar Jan 20 '25 13:01 ashvardanian