java-callgraph Static call analysis: overwriting and implementing calls

In order to see, what may be called from what, two further types of method calls seem to me important: (1.) a method is overwritten by another one. (this is not just a true call, but however, I think java-callgraph could do so, since the overwriting method is executed instead of the overwritten one.) (2.) an interface method is implemented by a concrete method. (java-callgraph is already listing calls of interface method. So for completeness (respective to the given jars) I want to see also the implementing methods.)

Oct 13 '17 10:10 pdolland

These are different kinds of relationships, though. They are not call relationships, and thus should not be mixed with them.

Oct 20 '18 18:10 matthieu-vergne

Surely, I am not contesting, that "overwrites" and "implements" are different kinds of relationships. But for a complete overview of the static call structure there are needed as well as the direct calls.

Von: Matthieu Vergne [mailto:[email protected]] Gesendet: Samstag, 20. Oktober 2018 20:11 An: gousiosg/java-callgraph Cc: Dolland Peter MTPCE; Author Betreff: Re: [gousiosg/java-callgraph] Static call analysis: overwriting and implementing calls (#13)

These are different kinds of relationships, though. They are not call relationships, and thus should not be mixed with them.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/gousiosg/java-callgraph/issues/13#issuecomment-431605595, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ADShEMakrOYLAUxgO4e9djIn-N8_JE-mks5um2dJgaJpZM4P4So7.

Oct 23 '18 06:10 pdolland

I think I see what you mean, and I could understand the need. But it seems to me that such addition would have a significant impact on the tool.

I just proposed a pull request for a better identification of lambdas in #22 (and I am glad it was accepted), but what I did was fix to better identify a one-to-one relationships between the lambda and the method instantiating it.

In your case it is not a one-to-one relationships. If you take the first point, with overriding. Here is an example:

class Root {
  void doSomething() {
    System.out.println("I am root!");
  }
}
class Child1 extends Root {
  @Override
  void doSomething() {
    System.out.println("I am child1!");
  }
}
class Child2 extends Root {
  @Override
  void doSomething() {
    System.out.println("I am child2!");
  }
}

Now you have a method accepting a Root:

void execute(Root root) {
  root.doSomething();
}

The point is that the call graph here would depend on the agument's value:

execute(new Root());
execute(new Child1());
execute(new Child2());

All three are valid calls, but then each of them calls the doSomething() of a different class. You would need to look for all the calls to execute(...) in the JAR, look which type of instance is passed in argument, and consequently add the call in the list. Basically, it would mean that the analysis should say that the method execute() calls all 3.

The second point, with interface implementation, is exactly the same, excepted that Root has no implementation. So you won't have the code details of Root.doSomething() since it does not exist, but the result in terms of execute() calls would be the same.

The problem here is that you cannot do it in a single pass without consuming huge amounts of space (RAM or hard disk, depending on the strategy used). Indeed, execute may be called anywhere in the JAR, and thus you won't know all the calls before you have looked everywhere.

If you do it in 2 passes, then during the first pass you need to store the types of the various instances passed to execute(). You need only one per type, so if you have few override/implement then it is only few, but you need to store them recursively, since the value given to execute() might come from its own caller, which may have received it from its own caller, and so on recursively. You need to store all the types for each level during the first pass (space complexity O(NxM)), and then exploit this information during the second pass to add all the calls relationships. If you it happens to be done to an object that pass more or less everywhere (e.g. a context state), then you may fill your RAM with a huge amount of data.

If you do it in 1 pass, you need to remember instead which methods might be impacted by such behaviours. But since you cannot guess in advance all the override/implement relationships, then you must assume it may happen for any (non-final) type. Which means you need to remember more or less every method you have seen, or said another way store a complete graph of the calls in the JAR. Which means that you need to store at least the same amount of information that you generate in output, which is really costly for huge JARs. More other, you need then to browse this graph when you discover an override/implement relationships, in order to know where to add it. This is one again really costly, but this time in terms of computation time. You would need to use even more space to optimize it.

Briefly, with 1 pass you obtain a tool which consumes more than it generates at the end, and with 2 passes you still need to consume way more than what you currently do. So it might be useful, but the impact would be huge.

Maybe it would make sense in the dynamic analysis (which I am not familiar with and I did not use), but in static analysis it seems to me like a tool killer.

Just too finish on a good note : if it is about adding the override/implement relationships between methods, then it might be OK, because you only need to list them once for all. Then all the calls relationships can be inferred from what is generated in output. So the information would be there. But retrieving the calls themselves... I would not recommend it for this tool. Just make a script/additional tool which builds on the result of the JAR analysis for that.

Oct 23 '18 22:10 matthieu-vergne

java-callgraph was supposed to be a quick-and-dirty call graph generator. Resolving virtual dispatch calls (which is what we are discussing here) is not a trivial problem and other, more advanced tools (such as Soot and Wala) feature multiple implementations of algorithms proposed in scientific literature. I am not sure that we should implement those in java-callgraph.

Oct 24 '18 07:10 gousiosg

Let me show what we see currently with the example of Mathieu in the call graph (only methods):

M:Root:doSomething() (M)java.io.PrintStream:println(java.lang.String)

M:Child1:doSomething()(M)java.io.PrintStream:println(java.lang.String)

M:Child2:doSomething()(M)java.io.PrintStream:println(java.lang.String)

Any definition of a method execute calling root.doSomething() will lead to an entry in the call graph as

M:AnyClass:execute(Root) (M)Root:doSomething()

independently, what the concrete (dynamic) type of root is. But however, any hint is missing, that possibly Child1.doSomething or Child2.doSomething is called instead. We could solve this issue by adding calls

M:AnyClass:execute(Root) (M)Child1:doSomething()

M:AnyClass:execute(Root) (M)Child2:doSomething()

to the graph. I think, it would be much clearer for the static analysis (and supposedly easier to implement), if we add the indication, that Root:doSomething() is overwritten:

M:Root:doSomething() (V)Child1:doSomething()

M:Root:doSomething() (V)Child2:doSomething()

We avoid here to multiply the calls of Root:doSomething(), and no (static) information is lost.

For interfaces the same is valid. However, I think it would be clearer, if the call type indicates that we are dealing with an interface method and its implementation, e.g. with (F).

Please note, that we are discussing here only the static analysis of the call structure. Only the existence of an overwriting or implementing method implies a potential call, if a super class or interface method is called. It is essential for the static analysis of a program, that we can recognize a potential call sequence from a call of method A to a call of method B, and possibly even more important, if we can prove, that there is no such call sequence from A to B. But this is only possible, if we have a complete analysis of the call structure and our tool is correct.

Von: Georgios Gousios [mailto:[email protected]] Gesendet: Mittwoch, 24. Oktober 2018 09:39 An: gousiosg/java-callgraph Cc: Dolland Peter MTPCE; Author Betreff: Re: [gousiosg/java-callgraph] Static call analysis: overwriting and implementing calls (#13)

java-callgraph was supposed to be a quick-and-dirty call graph generator. Resolving virtual dispatch calls (which is what we are discussing here) is not a trivial problem and other, more advanced tools (such as Soot and Wala) feature multiple implementations of algorithms proposed in scientific literature. I am not sure that we should implement those in java-callgraph.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/gousiosg/java-callgraph/issues/13#issuecomment-432547162, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ADShEGaSiKDMYfsbrifSywEYVoZMA9MRks5uoBkhgaJpZM4P4So7.

Oct 24 '18 11:10 pdolland

Thanks for the comment. I don't disagree with the algorithm you propose, what I am saying is that tools such as Wala and Soot already offer this functionality (actually, more advanced versions of it). It would take me a lot of time (which I don't have currently) to implement this and get it right, so I will probably pass. However, I am open to PRs!

Oct 24 '18 11:10 gousiosg

If it is only about adding the overriding/implementing relationships then it should be doable in one pass with few impacts. This is a similar analysis than I did for the lambdas, so you may look and inspire from the class DynamicCallManager that I wrote for them (and the Cucumber feature tests to ensure it works). Actually, you may add your methods in the same class, since its purpose is to deal with such dynamic analysis.

I would gladly review your pull request if you make one.

Oct 24 '18 18:10 matthieu-vergne

Hi Georgios and Matthieu,

I made locally some changes to the java-callgraph static lib. The most important one:

Additionally to the classes (C:) and the methods (M:) section I am dumping out an overridings (V:) section with lines of the format

           V:<specifying class or interface>:<method signature>  <overriding class>

So I can get the full static call information. If a method is calling an interface method, I will also see, what the potential implementations are and which methods they are calling.

Furthermore I list in the classes section, if a class is extending another one, which is marked with <S>, and if a class is implementing an interface, or an interface is extending another one, which is marked with <I> in both cases. Interfaces are listed with I: instead of C:. Abstract and native methods are listed with A: and N: in the classes section too, since they will not appear as a calling method necessarily.

My question is: Are you interested in my changes? I would push them in a separate branch, but I would need permissions to do so. So you will be able, to merge them according to your needs.

Best regards,

Peter Dolland, MTPCE

Von: Matthieu Vergne [mailto:[email protected]] Gesendet: Mittwoch, 24. Oktober 2018 20:30 An: gousiosg/java-callgraph Cc: Dolland Peter MTPCE; Author Betreff: Re: [gousiosg/java-callgraph] Static call analysis: overwriting and implementing calls (#13)

If it is only about adding the overriding/implementing relationships then it should be doable in one pass with few impacts. This is a similar analysis than I did for the lambdas, so you may look and inspire from the class DynamicCallManager that I wrote for them to make a pull request. Actually, you may add your methods in the same class, since its purpose is to deal with such dynamic analysis.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/gousiosg/java-callgraph/issues/13#issuecomment-432775463, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ADShECiGUZvFfuY1ABdN1P5OKh5el7x8ks5uoLGhgaJpZM4P4So7.

Jan 04 '19 11:01 pdolland

Hi Peter, yes I am interested to see a PR that implements the functionality you are discussing here. Please also make sure you update the README file with the new output formats.

Jan 07 '19 08:01 gousiosg