codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Java: Queries for thread-safe classes

Open yoff opened this issue 8 months ago • 4 comments

This PR introduces three queries for thread-safe classes, corresponding to three properties that such classes must possess, known as

  • P1: No escaping
  • P2: Safe publication
  • P3: Correct synchronization Each query is introduced in its own commit, P3 first since it comes with some definitions that make P2 and P1 simple to write. And then there is a fourth commit fixing up the query for P3, since its detection in the first version only works when there are no loops in the call graph. I chose to show both versions, since the first is quite simple to understand and the second is basically the same idea but operating on SCCs of the call graph.

Some considerations:

  • should (parts of) the module Monitors be moved into concurrency.qll?
  • should the severity be error?
  • the queries are being included in the code quality suite

yoff avatar May 20 '25 13:05 yoff

QHelp previews:

java/ql/src/Likely Bugs/Concurrency/Escaping.qhelp

Escaping

In a thread-safe class, non-final fields should generally be private (or possibly volatile) to ensure that they cannot be accessed by other threads in an unsafe manner.

Recommendation

If the field does not change, mark it as final. If the field is mutable, mark it as private and provide properly synchronized accessors.

References

java/ql/src/Likely Bugs/Concurrency/SafePublication.qhelp

Safe publication

In a thread-safe class, values must be published safely to avoid inconsistent or unexpected behavior caused by visibility issues between threads. If a value is not safely published, one thread may see a stale or partially constructed value written by another thread, leading to subtle concurrency bugs.

In particular, values of primitive types should not be initialised to anything but their default values (which for Object is null) unless this happens in a static context.

Techniques for safe publication include:

  • Using synchronized blocks or methods to ensure that a value is fully constructed before it is published.
  • Using volatile fields to ensure visibility of changes across threads.
  • Using thread-safe collections or classes that provide built-in synchronization, such as are found in java.util.concurrent.
  • Using the final keyword to ensure that a reference to an object is safely published when the object is constructed.

Recommendation

Choose a safe publication technique that fits your use case. If the value only needs to be written once, say for a singleton, consider using the final keyword. If the value is mutable and needs to be shared across threads, consider using synchronized blocks or methods, or using a thread-safe collection from java.util.concurrent.

Example

In the following example, the values of value and server_id are not safely published. The constructor creates a new object and assigns it to the field value. However, the field is not declared as volatile or final, and there are no synchronization mechanisms in place to ensure that the value is fully constructed before it is published. A different thread may see the default value null. Similarly, the field server_id may be observed to be 0.

public class UnsafePublication {
    private Object value;
    private int server_id;

    public UnsafePublication() {
        value = new Object(); // Not safely published, other threads may see the default value null
        server_id = 1; // Not safely published, other threads may see the default value 0
    }

    public Object getValue() {
        return value;
    }

    public int getServerId() {
        return server_id;
    }
}

To fix this example, we declare the field value as volatile. This will ensure that all changes to the field are visible to all threads. The field server_id is only meant to be written once, so we only need the write inside the constructor to be visible to other threads; declaring it final guarantees this:

public class SafePublication {
    private volatile Object value;
    private final int server_id;

    public SafePublication() {
        value = new Object(); // Safely published as volatile
        server_id = 1; // Safely published as final
    }

    public synchronized Object getValue() {
        return value;
    }

    public int getServerId() {
        return server_id;
    }
}

References

java/ql/src/Likely Bugs/Concurrency/ThreadSafe.qhelp

Not thread-safe

In a thread-safe class, all field accesses that can be caused by calls to public methods must be properly synchronized.

Recommendation

Protect the field access with a lock. Alternatively mark the field as volatile if the write operation is atomic. You can also choose to use a data type that guarantees atomic access. If the field is immutable, mark it as final.

References

github-actions[bot] avatar May 21 '25 10:05 github-actions[bot]

Commented offline.

aschackmull avatar May 28 '25 13:05 aschackmull

Notes from offline review:

  • [x] Everything private
  • [x] Better lock detection, look at unreleased lock query
  • [x] Monitor abstract toString
  • [x] Remove char preds and fields from wrapped IPA types
  • [x] Revert to NodeToBeDominated
  • [x] locallySynchronizedOn not field? No, see example on page 550 of https://docs.oracle.com/javase/specs/jls/se25/jls25.pdf
  • [x] Prefix should be calculated via matches or as output + _
  • [x] getErasure -> getSourceDeclaration (needs to be on the field instead of the type)
  • [x] getTypeDescriptor -> getPackage.getName (needs to be on the field instead of the type)
  • [x] java.util , Collections, sync_name + sync_name.matches(“synchronized%”)
  • [x] Define ExposedField ~- [ ] Hoist lookups out of negations, using exists~
  • [x] Pragma[inline] -> pragma[inline_late] + bindingset
  • [x] AnnotatedAsThreadSafe, inline char pred
  • [ ] Create extension point ~- [ ] If a then b else any is often better than a implies b~
  • [x] Safe publication example is wrong.
  • [x] Reformulate unsynchronized to find escaping paths from an exposed field access out to public methods. Is this not what providesAccess and "the call graph induced by a" do? Sure, but the reformulation is more about avoiding binary predicates and recursion through forall.
  • [x] Split into self_conflicting and inconsistently locked
    • A publicly accessible write that is not locked is self-conflicting and can be reported right away.
    • A publicly accessible read that is not locked is conflicting with any publicly accessible write. This can be reported right away, if the definition of exposed field includes that there is a publicly accessible write.
    • All other cases involve two different locks.

yoff avatar Jun 09 '25 06:06 yoff

Code change looks good. If you haven't already, then it's probably good to triage a few results from a MRVA run to check that the latest version still looks good. Also, some basic performance checking is in order - let's do a dca run, but remember that it can be limited to these 3 queries to make it smaller/faster.

aschackmull avatar Oct 21 '25 11:10 aschackmull

Not sure if the failing test is simply unrelated?

yoff avatar Oct 21 '25 13:10 yoff

Not sure if the failing test is simply unrelated?

I think it's unrelated. I just tried to rerun, so let's see if that helps.

aschackmull avatar Oct 22 '25 07:10 aschackmull

Triage of new results:

java/unreleased-lock, 21 new results These are all due to lockInterruptibly now being correctly recognized as a locking call. This should in itself not introduce problems as the predicate failedLock handles InterruptedException.

  • several FPs in apache/geode: The code is very complex, but ultimately a function wraps a locking call and therefor does not release the lock. The query correctly detects that, but it is probably intended by the developer.
  • an FP in geode due to a ReadWriteLock (link). The reader is taken under certain conditions and the write under others. They are later released under the same conditions. A split control flow graph might have prevented this FP.
  • some FPs in hadoop where locking is wrapped in methods.
  • two FPs in google/guava due to locking being wrapped in methods.

java/safe-publication, 9 results The FP rate for this has been scrutinized already, but with only 9 results I went throuhg them anyway.

  • 3 TPs in apache/flink, repeated in the buildless version
  • 2 TPs in gradle
  • 1 FP in gradle due to the unsafely published value being unreadable (it is an internal factory, I think it could probably be static).

java/not-threadsafe, 75 results The FP rate for this has already been established to be good, but I went through a flock for each repo as a sanity check

  • several TPs in apache/flink SystemResourcesCounter.java (no locking at all in that file), repeated in buildless
  • several TPs in gradle, little locking used

yoff avatar Oct 27 '25 13:10 yoff

I will do an extra DCA run on nightly for timing, since I changed the code, but otherwise I am ready to merge this :-)

yoff avatar Oct 27 '25 13:10 yoff

Timing has actually improved with the code change: palatable/lambda is now slightly slower, but apache/commons now sees no slow-down. So the project with the very many prefixes is the only one where running the new queries is even visible.

yoff avatar Oct 28 '25 09:10 yoff