kwiver icon indicating copy to clipboard operation
kwiver copied to clipboard

vital/types/geo_point is not thread safe

Open judajake opened this issue 8 years ago • 6 comments

When two threads are accessing the m_loc and at least one is via geo_raw_point_t location( int crs ) const, this can cause a race condition and segfault.

judajake avatar Dec 01 '17 21:12 judajake

I think this would benefit from a design discussion. I'm not sure I want to penalize every user of that method because someone is using the object in a non-thread-safe manner. (That method is const from an API perspective, but actually it is a mutating method. To some extent, my knee-jerk reaction is "don't do that".)

geo_polygon has the same issue...

mwoehlke-kitware avatar Dec 04 '17 17:12 mwoehlke-kitware

I think if we are having a const function, the user should not have to know that it is mutable underneath. Furthermore, we do not have direct control over how a pipeline is created. Thus, either we should just get rid of the cache map, add a mutex protection to the cache map, or we enforce for all process asking locations in different units to copy the geopoint first. I can get around this in my pipelines, but the type of pipelines written for other projects would end up running into this when they switch to the new geopoint. I mainly created this issue to document the issue.

judajake avatar Dec 04 '17 20:12 judajake

To further Juda's point, I think if the API specifies a const function users should be able to pretend it's non-mutating and be absolved from looking into a cxx file. Otherwise we're breaking the contract. It seems to me that it's the onus of the function writer to ensure that the const function behaves as const. In this case it seems that the function should be made thread-safe. It would be good to have an offline discussion with stakeholders though just to solicit more opinion.

dstoup avatar Dec 05 '17 13:12 dstoup

Well said @dstoup. I agree. It should be made thread safe. This probably means either putting a mutex around the map access or just ditching the map and recomputing the values instead of caching. I assume the mutex is the lesser performance hit, but I could be wrong.

mleotta avatar Dec 05 '17 14:12 mleotta

It would be good to have an offline discussion with stakeholders though just to solicit more opinion.

Yes, that was exactly my main point.

I can think of a couple options:


Drop the cache

Pros:

  • No threading issues. Single-thread, single-conversion case is not penalized.
  • Class gets smaller.

Cons:

  • Repeat requests of the same conversion are penalized.

Use a mutex

We could use either a fast (even non-recursive should be safe) mutex, or an R/W mutex. The fast mutex would maximize performance in the case of zero contention, while the R/W mutex would maximize performance in the case of many requests for the same conversion.

Pros:

  • Thread safe.
  • Method retains "effective" const-ness.

Cons:

  • Class gets bigger.
  • Single thread users are penalized.

This is supposed to be a low-level, lightweight class. Most of the drawbacks of this approach are that it is necessarily counter to this objective.


Make the method mutable

Pros:

  • Introduces no new overhead.

Cons:

  • Users are responsible for synchronization where required, and must have a mutable instance in order to request conversions.
    • An alternative, but less easy-to-use, API is available that would allow circumventing this requirement.

Hybrid approach

Make the caching method mutable, but also include a non-caching const overload.

Pros:

  • Does not penalize users with a mutable instance.

Cons:

  • Users with a mutable instance must take care to avoid synchronization issues.
  • Users with a non-mutable instance can't benefit from caching.

Which of these is the correct trade-off? I'm not sure...

mwoehlke-kitware avatar Dec 05 '17 15:12 mwoehlke-kitware

This has been aging for a while. Has any consensus been reached. If the race condition is still there, it seems to be rare in the wild, so the single use case looks to be the most frequent. Is there a typical (or worst case) application where we can measure the cache hit rate? If the hit rate is low, then maybe deleting the cache is the way to go.

linus-sherrill avatar May 18 '20 16:05 linus-sherrill