rascal Invocations of inherited methods

When loading the following Java code from an Eclipse project, model.methodInvocation contains an invocation from D->p() to C->m(), as expected.

public class C
{
    public void m() {}
}

public class D extends C
{
    public void p() {
        this.m();
    }
}

rel[loc from,loc to]: {
  <|java+method:///org/D/p()|,|java+method:///org/C/m()|>
}

When compiled to a jar file, the target of the invocation is suddenly resolved to D->m():

rel[loc from,loc to]: {
  <|java+method:///org/D/p()|,|java+method:///org/D/m()|>
}

This kind of behaviour is also described in a Stack Overflow post (https://stackoverflow.com/questions/15172269/java-bytecode-operation-invokevirtual-does-not-keep-consistency-for-the-method), so it may not be a bug. However, in both model.declarations and model.modifiers the method is still referred to as C->m().

model.declarations:

rel[loc name,loc src]: {
  <|java+method:///org/C/m()|,|file:///C:/Data/File.jar/org/C.class|>,
  <|java+method:///org/D/p()|,|file:///C:/Data/File.jar/org/D.class|>,
  ...
}

model.modifiers:

rel[loc definition,Modifier modifier]: {
  <|java+method:///org/C/m()|,public()>,
  <|java+method:///org/D/p()|,public()>,
  ...
}

Same goes for model.names and model.containment.

Oct 14 '17 13:10 verhoofstad

This issue is related to static and dynamic binding in Java, which is reflected in the derived bytecode. I want to make some remarks:

1. The facts: If you use the super keyword instead of the this keyword shown in your snippet:

public class C {
    public void m() {}
}

public class D extends C {
    public void p() {
        super.m();
    }
}

You will notice that the model.methodInvocation relation changes and you get your expected output (i.e., <|java+method:///org/D/p()|,|java+method:///org/C/m()|>).

2. Possible reason: The JVM uses different instructions for method invocations; we will highlight two: invokevirtual and invokespecial. The JVM uses the invokevirtual instruction for an invocation of an instance method whose implementing class can only be known at runtime (cf. dynamic binding). This is the instruction you get for the this.m() code (notice that m() can actually be overridden by a subclass).

Conversely, the invokespecial instruction is used in three different scenarios (Venners, 1997):

invocation of instance initialization (<init>) methods

invocation of private methods

invocation of methods using the super keyword

In all these cases the runtime class of the instance is not considered; instead, the target is determined from the corresponding reference type at compile time (cf. static binding). That is why this information is available in the derived bytecode.
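The difference between the two instructions can be observed from plain Java. The following is a minimal sketch, not taken from the thread: the subclass E, the methods viaThis() and viaSuper(), and the String return values are hypothetical additions that make the selected target visible.

```java
// Hypothetical variant of the thread's classes; m() returns a label so the
// selected implementation can be observed.
class C {
    String m() { return "C.m"; }
}

class D extends C {
    // javac emits invokevirtual here: the target depends on the runtime class of `this`.
    String viaThis() { return this.m(); }

    // javac emits invokespecial here: the target is the superclass implementation.
    String viaSuper() { return super.m(); }
}

// Hypothetical subclass that overrides m().
class E extends D {
    @Override
    String m() { return "E.m"; }
}

class DispatchDemo {
    public static void main(String[] args) {
        D plain = new D();
        D overriding = new E();
        System.out.println(plain.viaThis());       // prints C.m
        System.out.println(plain.viaSuper());      // prints C.m
        System.out.println(overriding.viaThis());  // prints E.m
        System.out.println(overriding.viaSuper()); // prints C.m
    }
}
```

Only the call through this changes its target when a subclass overrides m(), which is why the bytecode for this.m() cannot name a single implementing class.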

3. A workaround: Take the set of methods defined in the model.declarations relation and intersect it with the targets you obtain from the model.methodInvocation relation, matching on the method name and the parent class. You can then identify the possible invokers of the method (it may not be just the superclass). You must also check the model.modifiers relation.

Do you agree?

References

Oct 30 '17 12:10 lmove

Well, not really. For the invocation this.m() of the example, the Java compiler does indeed emit the JVM instruction invokevirtual D m(), but I doubt that it should be translated in the M3 model as an invocation of D->m(). The problem is that this renders the method invocation relation pretty much useless (IMHO), because the invoked method D->m() cannot be related to any other relation (modifiers, containment, methodOverrides, etc.) in the M3 model. It rightfully doesn't exist in those relations. In my opinion, any invoked method in M3.methodInvocation should also be present in M3.declarations to ensure a consistent model. The only exception is method invocations to external code (e.g. when you load an application without its libraries).
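The consistency requirement can be phrased as a set difference. The sketch below is an illustration only: it models the two relations as plain sets of strings rather than using the Rascal M3 API, and the class and method names (DanglingTargets, danglingTargets) are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

class DanglingTargets {
    // Returns the invocation targets that have no entry in the declarations.
    static Set<String> danglingTargets(Set<String> invocationTargets, Set<String> declared) {
        Set<String> dangling = new TreeSet<>(invocationTargets);
        dangling.removeAll(declared);
        return dangling;
    }

    public static void main(String[] args) {
        // Values copied from the jar-based model shown earlier in the thread.
        Set<String> declared = Set.of(
                "java+method:///org/C/m()",
                "java+method:///org/D/p()");
        Set<String> targetsFromJar = Set.of("java+method:///org/D/m()");

        // prints [java+method:///org/D/m()]
        System.out.println(danglingTargets(targetsFromJar, declared));
    }
}
```

For a model that only contains application code, a non-empty result would indicate either calls into external code or targets such as D->m() that were not resolved to a declaration.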

You can read the JVM instruction invokevirtual D m() as follows: "invoke a method m() and start searching for that method at class D". This is actually similar to the meaning of the source code statement this.m(). However, in the latter case the target of the method invocation is correctly listed as C->m() and not D->m(). For JAR files, invokevirtual could be handled in the same way as the keyword this is handled in the source code scenario.
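The "start searching at class D" reading can be mimicked with Java reflection. This is a simplified sketch, not the JVM's full resolution procedure: it only walks the superclass chain, ignores superinterfaces and access checks, and handles parameterless methods only. The helper findDeclaringClass is a hypothetical name.

```java
class C {
    public void m() {}
}

class D extends C {
    public void p() {
        this.m();
    }
}

class DeclaringClassLookup {
    // Walks from `start` up the superclass chain and returns the first class
    // that declares a parameterless method called `name`, or null if none does.
    static Class<?> findDeclaringClass(Class<?> start, String name) {
        for (Class<?> k = start; k != null; k = k.getSuperclass()) {
            try {
                k.getDeclaredMethod(name);
                return k;
            } catch (NoSuchMethodException e) {
                // Not declared in this class; continue with the superclass.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The symbolic reference names D, but the declaration lives in C.
        System.out.println(findDeclaringClass(D.class, "m").getSimpleName()); // prints C
        System.out.println(findDeclaringClass(D.class, "p").getSimpleName()); // prints D
    }
}
```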

The workaround seems to apply to virtual method call resolution but that's not the issue here. I understand that that is not a part of M3.

References https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.invokevirtual

Nov 12 '17 16:11 verhoofstad

@jurgenvinju we also need an insightful opinion on this issue. What is the right way to model this aspect?

Nov 16 '17 10:11 lmove

Ok; I think everybody has a point here. Very clear discussion!

I look at it like this:

* first we extract pure facts from code (be it source code or byte code)
* then we infer implications and "what it means" by interpretation of these facts.

Key points:

Separating fact from interpretation improves reusability of the M3 models, because interpretations often have different quality attributes (accuracy, correctness, efficiency) while fact extraction is simple and universal.
An M3 model can store the results of both fact extraction and interpretation, no need to fundamentally change the shape of M3 to represent more information.

So for the invokevirtual case I agree that we have a number of different steps:

in the bytecode language, an invokevirtual occurrence is a simple fact.
there are other related simple facts in the bytecode of related classes, such as the definition of method signatures and inheritance
together these facts could form a complete picture but it requires interpretation ("search") to get there. The way we "search" or infer this invocation relation depends on the interpretation of JVM and Java semantics (it could be different for Scala and Jython!)

Let's have a JVM bytecode pure fact extractor, and then let's have an additional analysis which enriches this extracted JVM M3 model up to (or as close as possible to) a Java-level M3 model. This analysis would work under the assumption that the bytecode was originally generated from Java source code. This analysis is akin to a decompiler, but instead of working on the bytecode back towards source code, it works from the JVM extracted facts back to the Java extracted facts. Together all four arrows should produce some kind of "commuting diagram" up to information loss due to the Java compiler.

To fix the current issue, I suspect we first have to complete the declarations part of the JVM extracted M3 model, such that we are able to write the JVM_M3->JAVA_M3 enrichment analysis with the available information. It's also important to keep interpretation to a minimum in the initial JVM M3 model, so distinguishing between different types of calls seems quite in order IMHO.
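One possible shape of such an enrichment step is sketched below. It is an assumption-laden illustration, not the Rascal implementation: the extracted facts are modelled as a set of declared methods and a map from class to superclass, the search only follows single inheritance, and all names (InvocationEnrichment, resolve, superclassOf) are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class InvocationEnrichment {
    // Interprets the raw fact "invokevirtual owner.method" by searching the
    // declarations from `owner` upwards along the superclass relation.
    // Returns empty when no declaration is found (e.g. external code).
    static Optional<String> resolve(String owner, String method,
                                    Set<String> declarations,
                                    Map<String, String> superclassOf) {
        for (String k = owner; k != null; k = superclassOf.get(k)) {
            String candidate = k + "/" + method;
            if (declarations.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Pure facts, as they could be extracted from the example jar.
        Set<String> declarations = Set.of("org/C/m()", "org/D/p()");
        Map<String, String> superclassOf = Map.of("org/D", "org/C");

        // Raw fact from the bytecode of D.p(): invokevirtual org/D m()
        System.out.println(resolve("org/D", "m()", declarations, superclassOf));
        // prints Optional[org/C/m()]
    }
}
```

Keeping the raw <D->p(), D->m()> fact in the JVM-level model and deriving <D->p(), C->m()> in a separate step matches the split between extraction and interpretation described above.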

Nov 22 '17 16:11 jurgenvinju

@lmove Did we follow through on this and forgot to close the issue, or do we have to make a decision still? It's a longgg time ago!

Mar 26 '23 12:03 jurgenvinju

@jurgenvinju, we didn't close the discussion. Shall we have a look at this one together with #1145 in the coming days?

Mar 27 '23 06:03 lmove

yes, but first I need some rest :-) Let me know when you have time, or if. There is no urgency, but it would be cool if we can improve the situation here for the future users of the platform.

Mar 30 '23 09:03 jurgenvinju