Attach to Java process missing Byte Code Instrumentation
Currently, when scoping a Java process, we call initJavaAgent only when using the LD_PRELOAD mechanism.
initJavaAgent starts the interaction with the JVM for Byte Code Instrumentation (see the Agent_OnLoad method). With Byte Code Instrumentation we are able to scope HTTPS events from Java.
Byte Code Instrumentation should also be performed when using scope attach.
The starting point:
- investigation of the Agent_OnAttach interface
- add a mechanism to connect to an existing JVM and trigger the agent during the live phase, see https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html
A draft of the idea is here: https://github.com/michalbiesek/appscope/draft-Agent-Runner, where attaching to the live JVM is provided by a separate JAR file. Use the Java 9 Docker image:
cd test/testContainers
make java9-shell
/opt/tomcat/bin/catalina.sh run &
cd /opt/javassl
java -cp .:$JAVA_HOME/lib/tools.jar AgentRunner `pidof java`
curl -k https://localhost:8443
cat /opt/test-runner/logs/events.log
The main idea is to make the existing JVM, the one we attach to, aware of the Agent_OnAttach interface.
[DRAFT] Java process Byte code instrumentation on attach
- the main goal is to support Byte Code Instrumentation for an existing/running Java process
- Agent_OnAttach, when called by the JVM, can use the same logic as Agent_OnLoad
- what we need to do is to trigger events/inform the JVM that libscope.so is a Java agent, as is done for LD_PRELOAD:
"set JAVA_TOOL_OPTIONS so that JVM can load libscope.so as a java agent"
- it can be done via the Java layer, as in AgentRunner.java; we need to figure out how to do this using JNI, i.e. calling "loadAgentPath" with libscope.so
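For reference, the LD_PRELOAD path quoted above boils down to putting a standard -agentpath:<path> option into JAVA_TOOL_OPTIONS before the JVM is created, so that the JVM loads the library and calls its Agent_OnLoad. A minimal sketch, assuming we append to whatever the user already set; the function names and the library path are hypothetical and are not taken from the appscope sources:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build a JAVA_TOOL_OPTIONS value that makes the JVM
 * load `agent_path` as a native JVMTI agent (-agentpath:<path>), keeping any
 * options already present. Returns a malloc'd string, or NULL on failure.
 */
char *build_java_tool_options(const char *existing, const char *agent_path)
{
    const char *prefix = "-agentpath:";
    int has_existing = existing && *existing;
    size_t len = strlen(prefix) + strlen(agent_path) + 1;  /* + NUL */
    if (has_existing) {
        len += strlen(existing) + 1;                       /* + separating space */
    }

    char *value = malloc(len);
    if (!value) return NULL;

    if (has_existing) {
        snprintf(value, len, "%s %s%s", existing, prefix, agent_path);
    } else {
        snprintf(value, len, "%s%s", prefix, agent_path);
    }
    return value;
}

/* Hypothetical: export the option; must run before the JVM is created. */
int set_java_tool_options(const char *agent_path)
{
    char *value = build_java_tool_options(getenv("JAVA_TOOL_OPTIONS"), agent_path);
    if (!value) return -1;
    int rc = setenv("JAVA_TOOL_OPTIONS", value, 1);
    free(value);  /* setenv keeps its own copy */
    return rc;
}
```

Appending rather than overwriting keeps user-supplied JVM options intact. This only works at JVM startup, which is exactly why the attach case needs a different trigger.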
See current state of work on: https://github.com/criblio/appscope/tree/draft-Agent-Runner
Current status:
- To force the Java Virtual Machine to call the Agent_OnAttach interface, a native implementation of loadAgentPath was provided in scope; see #617 for details
- We need to investigate the root cause of the segfaults that occur when attaching to the JVM, which depend on the JVM version
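For context on what a native loadAgentPath has to reproduce: on Linux, HotSpot's Attach API (which AgentRunner presumably reaches through tools.jar) works roughly like this. The client creates an .attach_pid<pid> trigger file in the target's working directory or /tmp, sends SIGQUIT, waits for the JVM to create the UNIX socket /tmp/.java_pid<pid>, writes a NUL-separated request, and reads back a status. The sketch below only builds the request bytes and the socket path. The protocol details are our reading of OpenJDK's Linux attach listener, not something stated in #617, the function names are hypothetical, and containerized targets may need the paths resolved through /proc/<pid>/root.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Assumed HotSpot request format: protocol version "1", the command, then
 * three arguments, every string NUL-terminated. For "load" the arguments are
 * the agent path, "true" (the path is absolute) and the agent options.
 * Returns the number of bytes written to buf, or -1 if it does not fit.
 */
ssize_t build_load_request(char *buf, size_t size,
                           const char *agent_path, const char *options)
{
    const char *parts[] = { "1", "load", agent_path, "true",
                            options ? options : "" };
    size_t off = 0;

    for (size_t i = 0; i < sizeof(parts) / sizeof(parts[0]); i++) {
        size_t n = strlen(parts[i]) + 1;   /* include the NUL terminator */
        if (off + n > size) return -1;
        memcpy(buf + off, parts[i], n);
        off += n;
    }
    return (ssize_t)off;
}

/* Assumed location of the socket created by the JVM's attach listener. */
int attach_socket_path(char *buf, size_t size, pid_t pid)
{
    int n = snprintf(buf, size, "/tmp/.java_pid%d", (int)pid);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Once the JVM accepts a "load" request like this during the live phase, it calls the library's Agent_OnAttach rather than Agent_OnLoad.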
Current status:
While working on this issue and checking different versions of Java, I discovered a crash with SIGINT when attaching to Java in the Dockerfile.glibc environment; this will be addressed in #619.
At first glance, the segfault observed with the current state of pull requests #619 and #617 is related to the doGotcha functionality and to incorrect restoration of write permissions.
Current status: #619 addresses a bug in a corner case of handling a shared object library and its GOT entry, with the following memory mappings:
7f4bc9a37000-7f4bc9a3a000 ---p 00000000 00:00 0
7f4bc9a3a000-7f4bc9b38000 rw-p 00000000 00:00 0
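In the current implementation of osGetPageProt, we identify the address 7f4bc9a3a000 as belonging to the 7f4bc9a37000-7f4bc9a3a000 memory range. However, the end address of a range in /proc/<pid>/maps is exclusive, so this address actually belongs to the next range, 7f4bc9a3a000-7f4bc9b38000.
From the wrong range we read its permissions (---p: no read, no write, no execute).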
Here we detect that the permissions lack write access:
https://github.com/criblio/appscope/blob/26a82a53dffeab42ee282135ea6d8fb467b20e06/src/scopeelf.c#L199-L204
So we add write permissions.
At the end, we restore the permissions we read at the beginning:
https://github.com/criblio/appscope/blob/26a82a53dffeab42ee282135ea6d8fb467b20e06/src/scopeelf.c#L229-L234
This removes the rw permissions from the page starting at 7f4bc9a3a000.
Later, when the program runs and tries to access the GOT entry whose permissions we revoked, it segfaults.
The last commit in #619 addresses the previously described problem.
Regarding https://github.com/criblio/appscope/pull/617
- the native implementation of loadAgentPath seems to work fine
- the discovered limitation is that we need to modify the callback logic: with Agent_OnLoad we will not miss any Java class load operation, but in the case of attach, the Java class may already be loaded in the JVM. In other words, the following:
/opt/tomcat/bin/catalina.sh run &
/opt/appscope/bin/linux/scope attach java
curl -k https://localhost:8443
works fine, while the following:
/opt/tomcat/bin/catalina.sh run &
curl -k https://localhost:8443
/opt/appscope/bin/linux/scope attach java
curl -k https://localhost:8443
doesn't work correctly, since the class is already loaded. The next step is to investigate the JVM Tool Interface (JVMTI) to overcome this limitation.
With Agent_OnAttach we can use GetLoadedClasses to iterate over the expected classes and use RetransformClasses to force the JVM to call ClassFileLoadHook. Unfortunately, RetransformClasses has the following limitation: "The retransformation must not add, remove or rename fields or methods", so we cannot use the current mechanism based on javaCopyMethod.
When I tried to use DefineClass to copy the existing class under a new name, I received JVMTI_ERROR_NAMES_DONT_MATCH (see https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#RetransformClasses).
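Selecting which already-loaded classes to hand to RetransformClasses requires matching the signatures that JVMTI GetClassSignature returns, which are type signatures such as "Lsun/nio/ch/SocketChannelImpl;" rather than internal names. A small sketch of that filter; the helper names are hypothetical, and the JVMTI calls themselves are left out so the code stays self-contained:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if the JVMTI type signature `sig`
 * (e.g. "Lsun/nio/ch/SocketChannelImpl;") denotes the class whose internal
 * name is `internal_name` (e.g. "sun/nio/ch/SocketChannelImpl").
 * Array signatures ("[L...;") and nested classes ("...$1;") do not match.
 */
bool signature_is_class(const char *sig, const char *internal_name)
{
    size_t n = strlen(internal_name);
    return sig[0] == 'L' &&
           strncmp(sig + 1, internal_name, n) == 0 &&
           sig[n + 1] == ';' &&
           sig[n + 2] == '\0';
}

/* Hypothetical: is `sig` one of the `count` classes we want to instrument? */
bool is_target_class(const char *sig, const char *const targets[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (signature_is_class(sig, targets[i])) return true;
    }
    return false;
}
```

This only narrows down which classes get retransformed. The restriction on adding methods still applies to every class selected this way.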
Current status: Agent_OnAttach works fine for classes that are loaded after attaching:
/opt/tomcat/bin/catalina.sh run &
/opt/appscope/bin/linux/scope attach java
curl -k https://localhost:8443 <<-- this will load "sun/nio/ch/SocketChannelImpl" class
What doesn't work is the following scenario:
/opt/tomcat/bin/catalina.sh run &
curl -k https://localhost:8443 <<-- this will load "sun/nio/ch/SocketChannelImpl" class
/opt/appscope/bin/linux/scope attach java <<-- this will copy the "sun/nio/ch/SocketChannelImpl" class
curl -k https://localhost:8443
Calling the methods from the copy of the class fails.
Summary:
- Agent_OnAttach correctly registers the ClassFileLoadHook
- After attaching, ClassFileLoadHook behaves correctly for freshly loaded classes
- After attaching, ClassFileLoadHook doesn't behave correctly for already loaded classes. Copying a class using (*jni)->DefineClass finishes with a success status, but using the methods from the copied class doesn't seem to work; this needs further investigation
Status:
I worked on verifying whether copying a class is possible.
I focused on a simpler case: I created a Test class which contains only 2 methods (see the last commit in #617 for details) and added a mechanism to Scope to copy the class and turn the print method into a native one.
The behavior when we want to call the original print method (not the native one):
- Using the print method loaded from the copied class (which shouldn't be native) results in a stack overflow: when we call the original print method, we end up calling the modified, native variant
- Using the second_print method loaded from the copied class results in correct behavior: we call the second_print implementation
Other:
After copying the class, when we additionally call javaCopyMethod(classInfo, classInfo->methods[methodIndex], "__print"); and try to use the print method loaded from the copied class (which shouldn't be native), we end up calling the second_print implementation.
Next step: verify the method indexes and the Java class structure to see whether additional action is required in the class-copying mechanism. Possibly we still refer to the old class code, in which case the class-copying mechanism must be adjusted.
Status:
I worked on verifying whether the following mechanism works in the case of Agent_OnLoad (before any Java libraries are loaded) and Agent_OnAttach (after some/all Java libraries are loaded):
- start a copy of the original class_name_foofoo class - defineCopyClass
- [Copied class] add a copy of bar_method (__barmethod) - javaCopyMethod
- [Copied class] change the name of the class from class_name_foofoo to class_name_foo__ - javaModifyUtf8String
- [Copied class] define/create the class_name_foo__ class with (*jni)->DefineClass
- [Original class] change barmethod to native - javaConvertMethodToNative
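Renaming the copied class means rewriting the constant pool entry that holds its name. In the class file format that entry is a CONSTANT_Utf8_info: a u1 tag of 1, a big-endian u2 byte length, then the bytes. Because of this layout, a new name with a different length shifts everything after the entry, so the class bytes have to be rebuilt rather than patched in place. A minimal serializer for such an entry, assuming plain ASCII names (which are valid modified UTF-8); this is an illustration, not javaModifyUtf8String itself:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONSTANT_Utf8 1   /* constant pool tag from the JVM specification */

/*
 * Hypothetical helper: serialize `str` as a CONSTANT_Utf8_info entry
 * (u1 tag, u2 big-endian length, bytes) into `out`.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
size_t write_utf8_constant(uint8_t *out, size_t size, const char *str)
{
    size_t len = strlen(str);
    if (len > 0xFFFF || size < 3 + len) return 0;

    out[0] = CONSTANT_Utf8;
    out[1] = (uint8_t)(len >> 8);     /* u2 length, high byte first */
    out[2] = (uint8_t)(len & 0xFF);
    memcpy(out + 3, str, len);        /* no NUL terminator in class files */
    return 3 + len;
}
```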
With this instrumentation, we will be able to intercept calls to the native barmethod, whose implementation looks like this:
JNIEXPORT void JNICALL
Java_class_name_foofoo_barmethod(JNIEnv *jni, jobject obj, jstring str)
{
// perform scope logic
// locate the class_name_foo__
// locate the __barmethod in class_name_foo__
// call the original method backed up in __barmethod ## (1)
}
Results:
With Agent_OnLoad, the code succeeds only when we also add a copy of bar_method (__barmethod) to the original class class_name_foofoo using javaCopyMethod.
With Agent_OnLoad, without adding the copy of bar_method (__barmethod) to the original class class_name_foofoo, we hit a SEGV in (1).
With Agent_OnAttach, we cannot add new methods to the original class class_name_foofoo; in (1) we call the native variant of the method again and again until we hit a stack overflow.
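The recursion in (1) can be illustrated with a toy model that uses plain C function pointers instead of JVM method structures. This is an analogy only: the "slot" stands for barmethod in the original class and "backup" for __barmethod. The wrapper works only while the backup still resolves to the original body. Whether the copied class's method still resolves to the converted native method is exactly what remains to be investigated for Agent_OnAttach.

```c
/*
 * Toy model (NOT JVM internals): a method "slot" is redirected to a native
 * wrapper that must call the original body through a backup taken first.
 */
typedef int (*method_fn)(int);

static int bar_original(int x) { return x + 1; }  /* original barmethod body */

static method_fn bar_slot = bar_original;  /* what callers of barmethod invoke */
static method_fn backup;                   /* plays the role of __barmethod    */
static int scope_calls;                    /* counts "scope logic" executions  */

/* Native wrapper: perform scope logic, then call the backed-up original (1). */
static int bar_native(int x)
{
    scope_calls++;
    return backup(x);
}

/* Back up first, then redirect. If `backup` ended up pointing at bar_native
 * (the "copy" still resolving to the converted method), bar_native would call
 * itself until the stack overflows. */
static void install_hook(void)
{
    backup = bar_slot;
    bar_slot = bar_native;
}
```

After install_hook(), calling bar_slot(41) runs the wrapper once and returns the original result, 42.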
Status:
The current implementation supports:
- attaching to a Java Virtual Machine which is already running, so we are able to trigger Agent_OnAttach (see details)
- for classes that are not yet loaded, we can use the current logic of adding new fields and new methods, so we are able to use the same logic as for Agent_OnLoad
Next steps and limitations:
- we need to modify already loaded classes, but we cannot add new methods/fields during Retransform or Redefine operations
- copying a class to store the original method doesn't work correctly, see https://github.com/criblio/appscope/issues/576#issuecomment-959008698
- we need to affect the already existing instances of loaded classes. The next step would be to look at IterateOverInstancesOfClass and FollowReferences
- as an alternative, we can look into MethodEntry or MethodExit. Please note the possible performance limitations when using MethodEntry and MethodExit