Feature Request: CFG with Larger Basic Blocks Ignoring Implicit Exceptions
Feature Request: CFG with Larger Basic Blocks Ignoring Implicit Exceptions
Description
OPAL's cfg module constructs CFGs where basic blocks are split at instructions that may throw implicit exceptions (e.g., NullPointerException for invokevirtual, ArithmeticException for idiv). This results in smaller basic blocks due to additional edges to exception handlers. I propose an optional CFG construction mode that abstracts away implicit exceptions, creating larger basic blocks based solely on explicit control flow (e.g., jumps, returns, explicit throw statements). This mode would only consider exception edges for explicit try-catch blocks defined in the code.
Motivation
Smaller basic blocks increase CFG complexity, which complicates custom static analyses implementations. An optional mode that ignores implicit exceptions would:
- Reduce the number of basic blocks and edges, simplifying the CFG.
- Enable more efficient analyses for use cases where implicit exceptions are irrelevant.
- Provide flexibility for users to choose between sound (exception-aware) and simplified (exception-agnostic) CFG representations.
Example
Consider the following Java code:
public int compute(int x, int y) {
int result = x / y; // Potential ArithmeticException
return result;
}
Dear @Lapeno01, thank you for your suggestion. We are aware of the issues that the frequency of (implicit) exception in Java raise for CFGs (and the derived (post-)dominator trees), so I am open to considering this. However, I'd like to see more justification on the use cases this would have. When can (implicit) exceptions safely be ignored, when can't they?
Further questions:
- Should this only ignore implicit exceptions? What sets them apart from explicit exceptions in this use case? Note that this is an important question, since ignoring just implicit exceptions would hardly improve most CFGs, given that method calls can always end in an explicit exception thrown by the callee - and method calls are very frequent.
- How can this be implemented sensibly? In particular, does it require separate hierarchy of exception-agnostic CFGs? Should it use the existing classes, just be computed differently (with the risk of using the wrong CFG since the types don't distinguish them)? Or could it even be possible to make the exception-agnostic CFG just a view of the (already computed) exception-aware CFG?
- Relatedly, how would we provide this for users to flexibly choose between? Is this a global decision (i.e., either all CFGs are exception-aware or all are exception-agnostic)? This would ensure consistency but different parts of an analysis / set of analyses might have different requirements regarding exception-awareness? Or would this be a local decision by each client of a CFG? That could risk inconsistent views and extra computation/storage for the additional CFGs if different clients require both exception-aware and exception-agnostic CFGs (less of an issue if it could be implemented as a view, though it could still impact derived data-structures like dominator trees). How would clients choose the appropriate CFG? Currently, they always have a pre-computed exception-aware CFG available if they use TAC.
Dear @errt,
Below, I address your questions:
1. When Can (Implicit) Exceptions Safely Be Ignored, and When Can't They?
My Approach
To simplify the CFG for custom static analyses, I propose ignoring both implicit and explicit exceptions that are not
explicitly handled within try-catch blocks in the current method. This prioritizes CFG simplicity over
completeness in some scenarios.
When Exceptions Can Be Safely Ignored
If the analysis assumes inputs are always valid (e.g., no null pointers, out-of-bounds indices),
then many implicit exceptions(such as NullPointerException or ArrayIndexOutOfBoundsException)
can be safely ignored without affecting the soundness of the analysis.
When Exceptions Should Not Be Ignored
- User-defined or explicitly declared exceptions: These are often intentional parts of the program logic and should not be ignored.
- Runtime exceptions thrown by user code: These may be harder to predict and often require special handling to preserve correctness.
- User-defined exceptions tbh require over complex handling.
2. Implementation Approach
I propose implementing the exception-agnostic CFG as a view over the existing CFG, rather than duplicating or modifying the core structure.
Mechanism Overview
- Configuration flag (e.g.,
ignoreImplicitExceptions) to control the exception-handling strategy during CFG construction. - During the block-splitting phase, for each instruction with possible exceptions, distinguish between:
- Implicit exceptions: JVM-defined exceptions that are not explicitly declared or caught.
-
Explicit exceptions: User-defined or declared exceptions that are handled via
try-catch.
- If
ignoreImplicitExceptionsis enabled, exclude edges corresponding to implicit exceptions from the CFG view, unless they are explicitly handled in atry-catchblock.
✅ The main functionality I currently rely on in this view(custom one) is the computation of dominator and post-dominator trees.
3. How to Provide Flexibility for Users?
Building on the implementation approach, flexibility can be achieved by letting each CFG client make its own decision locally. It would also be beneficial to provide a (post)dominator tree implementation that respects the chosen CFG view.
Thank you for your input. This is very valuable since we were pondering exception-agnostic post-dominator trees just recently as well. However, I can't quite fully follow your answers there:
-
It is still unclear to me which kinds of analyses could ignore exceptions
-
I don't see how you could distinguish exceptions as you describe. You can't see whether an exception might be explicitly caught in calling methods, for example.
-
Also, it is still unclear to me how to deal with exceptions originating from method calls. Those could both be JVM exceptions as well as explicitly thrown exceptions from the called method, with no way to distinguish between them. Given that on average about every second statement (IIRC) is a method call, if these exceptions can't be ignored, you can not expect any significant reduction in CFG complexity.
-
I wonder whether what you propose is a general concept that fits a multitude of analyses or just very specialized for a single use case. Ignoring all exceptions, e.g., would seem general to me, but ignoring just explicitly thrown or caught exceptions might not be suitable to many analyses?
-
The approach you describe does not sound like a view. A view over the existing CFG would be something that does not affect the CFG construction process, but use the existing exception-aware CFG data with special accessor methods to provide data to the client as if it was an exception-agnostic CFG. This would allow not computing/storing both variants of a CFG if different clients have different needs.
-
It is also unclear how a flag could be implemented in CFG constructions for clients to decide locally when most clients in OPAL should not invoke any CFG construction themselves but use the CFG provided with the TAC. This is important to avoid computing the same CFG multiple times, but unclear how it would work if clients can request different kinds of CFGs.
Another question:
- Is an exception-agnostic CFG required or would an exception-agnostic post-dominator tree suffice? This might be simpler since post-dominator trees are computed lazily in OPAL, triggered by clients. Would an exception-agnostic dominator tree also be required? Usually, effects of exceptions on dominator trees are small, only post-dominator trees are significantly affected.
Regarding how to distinguish exceptions, I’m also unsure how to differentiate them. Apologies for my earlier unclear expression.
On Exception-Agnostic Analyses
-
For my use case, I believe an exception-agnostic post-dominator tree would suffice, rather than a exception-agnostic CFG.
-
Part of my custom static analysis focuses on determining the relative positions of method calls with identical or similar parameter origins, identified through abstract interpretation and def-use chains. To support this, I’ve implemented custom code that:
- Merges basic blocks while ignoring all exceptions.
- Constructs a dominator tree over this simplified structure.
However, ignoring all exceptions disrupts some basic block relationships, resulting in an oversimplified (and often irrelevant) control flow model.
I see the exception-agnostic post-dominator tree as broadly useful, especially for analyses prioritizing control flow relationships over precise exception handling.
On the CFG View Concept
Apologies again for my unclear phrasing, what I intended was something like:
interpretation.domain.bbCFG.dominatorTree(ignoreImplicitExceptions)
interpretation.domain.bbCFG.postDominatorTree(ignoreImplicitExceptions)