Java: Call Graph
Hello,
I am trying to use CodeQL to get the control flow of a program. More specifically I want to get the control flow into methods that I have marked as sensitive in a classes named SensitiveMethodCall. The goal is to parse the results to get the path. However, right now I am not generating a path. It is hard to tell if a path isn't being generated because of the query itself (I am used to using dataflow, so this is new to me) or if it has to do with this error
Error was: Expected result pattern(s) are not present for path-problem query: Expected at least two result patterns. These should include at least an 'edges' result set (see https://codeql.github.com/docs/writing-codeql-queries/creating-path-queries/). [INVALID_RESULT_PATTERNS]
All of the examples I see on this link have to deal with dataflow and not control flow, so I am having trouble figuring out what I am doing wrong to get the correct output. Here is my query.
import java
import semmle.code.java.ControlFlowGraph
import SensitiveInfo.SensitiveInfo
/**
* @name Control flow path from normal method call to sensitive method call
* @description This query identifies the full control flow path from a normal method call to a sensitive method call.
* It traces every control flow node between the normal and sensitive calls.
* @kind path-problem
* @id java/custom/control-flow-path
* @problem.severity warning
* @tags control-flow, data-flow, security, sensitive-data
* @precision medium
* @security-severity 5.0
* @cwe CWE-200, CWE-201, CWE-209
* @sub-severity high
*
* Get the control flow path from a normal method call to a sensitive method call.
* Returns the full sequence of control flow nodes.
*/
predicate controlFlowPath(MethodCall start, MethodCall end, ControlFlowNode node) {
exists(ControlFlowNode startNode, ControlFlowNode endNode |
startNode = start.getControlFlowNode() and
endNode = end.getControlFlowNode() and
startNode.getANormalSuccessor*() = node and
node.getANormalSuccessor*() = endNode
)
}
/**
* Get the full control flow path between normal and sensitive method calls.
*/
from MethodCall normalCall, SensitiveMethodCall sensitiveCall, ControlFlowNode node
where controlFlowPath(normalCall, sensitiveCall, node)
select normalCall.getControlFlowNode(), normalCall, sensitiveCall, "Full control flow path from normal method call to sensitive method call."
Any help would be greatly appreciated. Thank you.
Hi again @KylerKatz 👋🏻
Queries of type path-problem are intended to capture the path that data takes through a program in a data flow query. I do not believe that you can write control flow queries of type path-problem. So you should use a different query type here.
To then understand where your query goes wrong, the first step would be to identify where the problem is by starting with the smallest components involved and then working your way up. For example, if you query all instances of SensitiveMethodCall, do you get the results you would expect? Similarly, if you use the quick evaluation feature to evaluate all possible results for controlFlowPath, do you get the results you expect?
Also, query metadata should normally be at the very top of the file, above the import statements.
Hello @mbg,
Thank you for all your help.
I have broken the problem down and done some quick evaluations. Using this example,
public class BAD_APIEndpointDebugging {
private static final boolean API_DEBUG = Boolean.parseBoolean(System.getenv("API_DEBUG"));
public static void main(String[] args) {
String apiEndpoint = "https://api.example.com/data";
String apiKey = "api_key_123456789";
if (API_DEBUG) {
testAPI(apiEndpoint, apiKey);
}
}
private static void testAPI(String url, String key) {
System.out.println("DEBUG: Testing API endpoint: " + url + " with API Key: " + key);
// Assume some testing logic here
}
}
SensitiveMethodCall
I get these results
Which lines up with the data that I have in my YML file
extensions:
- addsTo:
pack: custom-codeql-queries
extensible: sinks
data:
- ["fileName", "sinkName", "type"]
- ["BAD_APIEndpointDebugging.java", "println", "Log Sink"]
- ["BAD_APIEndpointDebugging.java", "parseBoolean", "Log Sink"]
- ["BAD_APIEndpointDebugging.java", "testAPI", "Log Sink"]
- ["BAD_APIEndpointDebugging.java", "getenv", "Log Sink"]
So SensitiveMethodCall seems to be returning the desired results
ControlFlowPath
I did the same for this
Looking at this, it seems to almost be alright Results 1 - 3 correspond to this line
private static final boolean API_DEBUG = Boolean.parseBoolean(System.getenv("API_DEBUG"));
4 & 5 correspond to their own self use, however, there should be one starting at testAPI and ending at println I am not sure why this control flow is being missed?
Also, I am still looking for a way to get the actual path. Something like this.
Is this possible for control flow?
Also, here is my updated query for reference
/**
* @name Control flow path from normal method call to sensitive method call
* @description This query identifies the control flow path from a normal method call to a sensitive method call.
* It traces every control flow node between the normal and sensitive calls.
* @kind problem
* @id java/custom/control-flow-path
* @problem.severity warning
* @tags control-flow, security, sensitive-data
* @precision medium
* @security-severity 5.0
* @cwe CWE-200, CWE-201, CWE-209
* @sub-severity high
*/
import java
import semmle.code.java.ControlFlowGraph
import SensitiveInfo.SensitiveInfo
/**
* Predicate to define a control flow path between a start method call and an end method call.
*/
predicate controlFlowPath(MethodCall start, MethodCall end) {
exists(ControlFlowNode startNode, ControlFlowNode endNode |
startNode = start.getControlFlowNode() and
endNode = end.getControlFlowNode() and
startNode.getANormalSuccessor*() = endNode // Track the normal successors between start and end
)
}
from MethodCall normalCall, SensitiveMethodCall sensitiveCall
where controlFlowPath(normalCall, sensitiveCall)
select normalCall, sensitiveCall.getControlFlowNode().toString(), sensitiveCall.getMethod().getReturnType(),
"Control flow path from normal method call to sensitive method call."
Thank you once again for your help.
After some more research, I believe I was referring to the wrong thing, instead of wanting the control-flow paths. I believe it's actually the call graph that I am looking for. Sorry for the confusion. I found this discussion That talks about doing it using Methods.
However, if I use this on my previous example,
public class BAD_APIEndpointDebugging {
private static final boolean API_DEBUG = Boolean.parseBoolean(System.getenv("API_DEBUG"));
public static void main(String[] args) {
String apiEndpoint = "https://api.example.com/data";
String apiKey = "api_key_123456789";
if (API_DEBUG) {
testAPI(apiEndpoint, apiKey);
}
}
private static void testAPI(String url, String key) {
System.out.println("DEBUG: Testing API endpoint: " + url + " with API Key: " + key);
// Assume some testing logic here
}
}
I only get
This is going in the right direction because I am finally getting the path that I have been trying to get. However, it is missing the paths from
parseBoolean() -> getenv()
And
testAPI() -> println()
I believe the reason for this is because they are MethodCalls.
Does anyone know of a way of making something like this work with MethodCalls instead? The issue is that MethodCall doesn't have a call predicate, which allows for the path to be generated.
/**
* @kind path-problem
*/
import java
class StartMethod extends Method {
StartMethod() { getName() = "validateExpression" }
}
class TargetMethod extends Method {
TargetMethod() { getName() = "findValue" }
}
query predicate edges(Method a, Method b) { a.calls(b) }
from TargetMethod end, StartMethod entryPoint
where edges+(entryPoint, end)
select end, entryPoint, end, "Found a path from start to target."
Thank you.
These are getting filtered out because the first column, here end, gives the associated location for the alert, and CodeQL will only show in VS Code / export as SARIF those results with an associated location in user code (i.e., a .java file located within the extracted repository). By contrast, println and parseBoolean are defined in the standard library, so we don't get an alert.
Change the final line to
select entryPoint, entryPoint, end, "Found a path from start to target."
Then you'll get alerts whenever the start point is in user code, and so you will indeed see paths like main -> println.
Note that you will not see a path getenv -> parseBoolean because neither of those methods calls the other. parseBoolean consumes the output of getenv, but this isn't a call-graph relation, it's a data-flow relation, for which the standard CodeQL dataflow library is probably the right tool.
Hello @smowton,
Thank you for your response,
I updated the query with your suggestion, and it is now working. However, I am noticing a few limitations of this query. Just to make sure that we're on the same page here is my current version.
/**
* @kind path-problem
* @id java/sinks/call-graph
*/
import java
import SensitiveInfo.SensitiveInfo
query predicate edges(Method a, Method b) { a.calls(b) }
from DetectedMethod end, Method entryPoint
where edges+(entryPoint, end)
select entryPoint, entryPoint, end, "Found a path from start to target."
There are two main limitations, the first being the one that you pointed out at the end of your response concerning seeing a path getenv -> parseBoolean because neither of those methods calls the other.
The second is that if we have a standard method such as println the location for this will be a class file where it is implemented not where it is called. I have a workaround for this when parsing, however, I still feel like there is a better way of doing so.
To test this, I went back to trying dataflow, I have a helper class DetectedMethodCall which is defined as so,
class DetectedMethodCall extends MethodCall {
DetectedMethodCall() {
// Check for matches against the sinks
exists(File f |
sinks(f.getBaseName(), this.getMethod().getName(), _)
)
}
}
Running a quick eval on this results in these
1 | parseBoolean(...)
2 | getenv(...)
3 | testAPI(...)
4 | println(...)
5 | parseBoolean(...)
6 | getProperty(...)
7 | processData(...)
8 | println(...)
9 | getDataSnippet(...)
10 | substring(...)
11 | parseBoolean(...)
12 | getProperty(...)
13 | println(...)
This checks out and corresponds to the sinks that I have in my YML file.
So, I decided to integrate this class into my dataflow query as the sink.
predicate isSink(DataFlow::Node sink) {
exists(DetectedMethodCall dmc |
sink.asExpr() = dmc
)
}
However, running a quick eval on this predicate, I only get these paths
| | Path
| | 1 | "API_DEBUG" : String | | BAD_APIEndpointDebugging.java:2:81
| | 2 | getenv(...) | | BAD_APIEndpointDebugging.java:2:67
| | String | BAD_APIEndpointDebugging.java:1:14
| | Path
| | 1 | "API_DEBUG" : String | | BAD_APIEndpointDebugging.java:2:81
| | 2 | getenv(...) : String | | BAD_APIEndpointDebugging.java:2:67
| | 3 | parseBoolean(...) | | BAD_APIEndpointDebugging.java:2:46
This excludes testAPI and println is there something you know of that I could add to include all method class such as these two?
I know if I do something like this,
predicate isSink(DataFlow::Node sink) {
exists(DetectedMethodCall dmc |
sink.asExpr() = dmc.getAnArgument()
)
}
The I can get a result like this which is used by println however, I want the method call name, not the arguments as the ending point.
| | 1 | "api_key_123456789" : String | | BAD_APIEndpointDebugging.java:6:25
| | 2 | apiKey : String | | BAD_APIEndpointDebugging.java:9:34
| | 3 | key : String | | BAD_APIEndpointDebugging.java:13:45
| | 4 | ... + ... | | BAD_APIEndpointDebugging.java:14:28
Lastly, here is the full query for reference, I am mostly considering almost everything as a source and just want to see what flows into my methods.
/**
* @name Backward slicing (optimized with extended flow tracking)
* @description Identifies the backward slice of a sink node with performance improvements and additional tracking for transformations
* @kind path-problem
* @problem.severity warning
* @id java/backward-slice-extended
*/
import java
private import semmle.code.java.dataflow.ExternalFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources
import CommonSinks.CommonSinks
import SensitiveInfo.SensitiveInfo
module Flow = TaintTracking::Global<SensitiveInfoInErrorMsgConfig>;
import Flow::PathGraph
/** A configuration for tracking sensitive information flow into error messages. */
module SensitiveInfoInErrorMsgConfig implements DataFlow::ConfigSig {
// Optimized source predicate to reduce performance overhead and track additional types
predicate isSource(DataFlow::Node source) {
// Include all string literals (including numeric-like strings)
source.asExpr() instanceof StringLiteral
or
// Track method calls if they are assigned to variables (right-hand side of assignment)
exists(AssignExpr assign |
assign.getRhs() = source.asExpr() and
source.asExpr() instanceof MethodCall
)
or
// Track static method calls
exists(AssignExpr assign |
assign.getRhs() = source.asExpr() and
source.asExpr() instanceof StaticMethodCall
)
or
// Track field accesses if they are assigned to variables (right-hand side of assignment)
exists(AssignExpr assign |
assign.getRhs() = source.asExpr() and
source.asExpr() instanceof FieldAccess
)
or
// Include constructor arguments as sources to capture flow into objects
exists(NewClassExpr newExpr |
newExpr.getAnArgument() = source.asExpr()
)
or
// Include cast expressions where sensitive data might flow between different types
source.asExpr() instanceof CastExpr
}
predicate isSink(DataFlow::Node sink) {
exists(DetectedMethodCall dmc |
sink.asExpr() = dmc
)
}
predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
// Flow from arguments to method calls
exists(MethodCall call |
pred.asExpr() = call.getArgument(_) and
succ.asExpr() = call
)
or
// Flow through binary expressions (e.g., string concatenations)
exists(BinaryExpr binExpr |
pred.asExpr() = binExpr.getAnOperand() and
succ.asExpr() = binExpr
)
// or
// Existing steps (if any)
// ...
}
}
from Flow::PathNode source, Flow::PathNode sink
where
// Capture flows between optimized sources and sinks
Flow::flowPath(source, sink)
select
sink.getNode().getEnclosingCallable(),
source,
sink,
source.getNode().getType().toString(),
sink.getNode().getType().toString(),
"Dataflow from " + source.getNode().toString() + " to " + sink.getNode().toString() + ""
Thank you for your help.
A few points about this:
- If you're looking to both query control-flow relations (what method calls what) and data-flow (what results can flow into arguments of other methods), I would strongly advise querying the two things in different queries, rather than try to cram them into a single query seeking to conflate the two kinds of relationship.
- A sink like
exists(DetectedMethodCall dmc | sink.asExpr() = dmc)will never work for void-typed methods likeprintln, because there is no data-flow method for something ofvoidtype. Setting the argument as the sink instead is the correct approach for something like this. - A good way to report the method called at the final step could be the use of
$@interpolation in the alert message, which allows you to report additional locatable things alongside the result:
from Method finalCallee
...
exists(MethodCall mc | sink.getNode().asExpr() = mc.getAnArgument() | finalCallee = mc.getCallee())
...
select
sink.getNode().getEnclosingCallable(),
source,
sink,
"Dataflow from $@ to $@",
source,
source.toString(),
finalCallee,
finalCallee.toString()
Note that each of the $@ tokens obliges you to append one locatable (source and finalCallee here) followed by one string (used to construct the alert message). In VSCode these will be the target and the text of a hotlink respectively.
Hi @KylerKatz,
Were you able to use @smowton suggestions to improve your query. I didn't follow the entire thread, but based on your initial question, that I interpreted as your intention to generate a path graph of some control flow of interest, the following blog post might be interesting CodeQL Path Graphs.