opal Roundtrip Bytecode Modification fails when using `ba.LabeledCode`

During the current iteration of our project course on static program analysis, one of our students discovered what i believe is a bug in abstract interpretation. The students were working on a task where they had to change bytecode of certain methods, and then write the modified bytecode to files. I know there is multiple ways to do this, but one way that seems particularly suited is the org.opalj.ba.LabeledCode builder. However, students discovered that the simple round-trip of reading code and writing it again without modifications using LabeledCode does not work for all methods of real-world projects. For example, it fails on 4% of all methods from Apaches PDFBox 3.0.3 (link to jar). Here is an example that exercises a specific method for which the roundtrip fails:

import org.opalj.ba.LabeledCode
import org.opalj.br.ObjectType
import org.opalj.br.analyses.Project

object PdfBoxDemo extends App {

  val pathToPdfBox = "pdfbox-app-3.0.3.jar"
  val project = Project(new java.io.File(pathToPdfBox))

  project.classFile(ObjectType("org/bouncycastle/x509/X509CertificatePair")) match {
    case Some(cf) =>
      val theMethod = cf.methods.find(_.name == "equals").get
      val builder = LabeledCode(theMethod.body.get)
      val newCode = builder.result(cf.version, theMethod)._1
      val newMethod = theMethod.copy(body = Some(newCode))
      println(s"Successfully re-serialized bytecode: ${newMethod.body.get}")
    case _ =>
  }
}

The exception that is thrown is very long and very detailed (pdfbox-error.txt), but boils down to the following:

OPAL fails to compute a new StackMapTable for the code (see here)
The reason for that is that OPAL believes there is dead code in the method that is to be written (see here)
I think this is not actually true. Debugging revealed that this is the code OPAL believes to be dead in the above example (cf. PC 48 in the exception log). I don't think that is dead code, as the field that is being checked here (named forward) can very well be null.
The CodeAttributeBuilder uses the org.opalj.ai.domain.l0.TypeCheckingDomain to detect dead code via Abstract Interpretation. For some reason, this domain believes that fields of a class are always initialized and can never be null (at least that's what i got from some detailed debugging through the above-mentioned example). Specifically, this method is called to generate a domain value for a field access, which results in a call to this method returning an InitializedObjectValue.

@errt do you have any insights into this? Is the LabeledCode object intended to be used this way, did we do something not supported or is this actually a bug? If so, i'd look into it in the coming days.

Credit to Finn for reporting this issue 👍

Dec 02 '24 12:12 johannesduesing

Sadly I don't have much insight into the abstract interpretation code. Can you give me some hints where that ReferenceValue method is called from? If that method really leads to this, that seems strange. In that case, this would also produce an InitializedObjectValue, right? And that would contradict the method's name.

Difficult to say where the underlying problem is. Could be a mixin issue with the two domains, but could also be wrong implementation in either. Especially org.opalj.ai.domain.l0.TypeCheckingDomain looks strange: The ArrayValue method returns a (potentially-null) DefaultArrayValue, but the related ObjectValue methods returns a (non-null) InitializedObjectValue. The other ObjectValue method returns a (potentially-null) DefaultMObjectValue only if the upperTypeBound has multiple types, but a (non-null) InitializedObjectValue (via the ObjectValue method), which seems strange since I don't think a singleton upperTypeBound would exclude null either.

Dec 02 '24 13:12 errt

Sure i can try to give you some context on this:

The main interpreter loop AI.scala handles field accesse by calling theDomain.getfield to obtain a domain value
This invokes ai.domain.l0.TypeLevelFieldAccessIntructions.getfield([...]) which in turn looks up a domain value for the field by calling TypedValue(pc, fieldType)
This invokes TypedValuesFactory.TypedValue and selects the case for reference values
This now finally leads to the invocation of the ReferenceValue method we were talking about.

I again confirmed via debugging that this in fact calls TypeCheckingDomain.ObjectValue and leads to an initialized object value, thus no field can ever be nullable (in this mix-in configuration for TypeCheckingDomain). So to me this seems like an implementation issue in TypeCheckingDomain, right? Unfortunately i don't have any idea how fields are usually handled in AI, but this seems factually wrong. This is an interesting quote from the code-comments in TypedValueFactory.TypedValue:

/**
     * Factory method to create domain values with a specific type. I.e., values for
     * which we have some type information but no precise value or source information.
     * However, the value is guaranteed to be `null` or properly initialized.
     *
     * For example, if `valueType` is a reference type it may be possible
     * that the actual value is `null`, but such knowledge is not available.
     *
     * The framework uses this method when a method is to be analyzed, but no parameter
     * values are given and initial values need to be generated. This method is not
     * used elsewhere by the framework.
     */

This comment seems to specifically acknowledge that value can be null here, but the actual implementation in TypecheckingDomain contradicts this.

Dec 02 '24 13:12 johannesduesing

Note that the code we're discussing about is from this commit and unchanged since then. Interestingly, here, code in TypeLevelReferenceValues that used to produce an InitializedObjectValue was changed to ObjectValue which then, via TypeCheckingDomain ends up being an InitializedObjectValue again. But, at the time, that did not have isNull = No (at least not explicitly). That only got introduced in this commit, which, interestingly, is about fields. Might still be wrong or point to wrong mixing in of some field related domain to handle this first/differently for fields?

Dec 02 '24 14:12 errt

@johannesduesing Could you try if #254 fixes the issue?

Feb 03 '25 21:02 errt