Attach to Java process missing Byte Code Instrumentation

Open michalbiesek opened this issue 4 years ago • 10 comments

Currently, when scoping a Java process, we call initJavaAgent only when the LD_PRELOAD mechanism is used. initJavaAgent starts the interaction with the JVM for Byte Code Instrumentation (see the Agent_OnLoad method). With Byte Code Instrumentation we are able to scope HTTPS events from Java.
Byte Code Instrumentation should also be performed when using scope attach.

The starting point:

  • investigate the Agent_OnAttach interface
  • add a mechanism to connect to an existing JVM and trigger the agent during the live phase, see https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html

A draft of the idea is here: https://github.com/michalbiesek/appscope/draft-Agent-Runner, where attaching to the live JVM is provided by a separate JAR file.

Use the Java 9 Docker image:
cd test/testContainers
make java9-shell
/opt/tomcat/bin/catalina.sh run &
cd /opt/javassl
java -cp .:$JAVA_HOME/lib/tools.jar AgentRunner `pidof java`
curl -k https://localhost:8443
cat /opt/test-runner/logs/events.log

Sep 30 '21 15:09 michalbiesek

The main idea is to make the existing JVM, the one we attach to, aware of the Agent_OnAttach interface.

[DRAFT] Java process Byte code instrumentation on attach

- the main goal is to support Byte Code Instrumentation for an existing/running
  Java process
- Agent_OnAttach, called by the JVM, can use the same logic as Agent_OnLoad
- what we need to do is trigger events/inform the JVM that libscope.so is a
  Java agent, as is done for LD_PRELOAD:
  "set JAVA_TOOL_OPTIONS so that JVM can load libscope.so as a java agent"
- it can be done via the Java layer, as in AgentRunner.java; we still need to
  figure out how to do this using JNI, i.e. how to call "loadAgentPath" for libscope.so
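
For reference, the LD_PRELOAD path quoted above boils down to putting an `-agentpath:` option into JAVA_TOOL_OPTIONS before the JVM starts. The following is only a minimal sketch of building that value while preserving options that are already set; the helper name `build_java_tool_options` and the library path used in the note below are hypothetical, and this is not the code used in libscope.so.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: build a JAVA_TOOL_OPTIONS value that keeps any
 * existing options and appends -agentpath:<lib_path>.
 * Returns 0 on success, -1 if the result does not fit into buf. */
static int build_java_tool_options(const char *existing, const char *lib_path,
                                   char *buf, size_t buf_len)
{
    int n;

    if (existing != NULL && existing[0] != '\0') {
        /* Keep what the user already configured and append the agent. */
        n = snprintf(buf, buf_len, "%s -agentpath:%s", existing, lib_path);
    } else {
        n = snprintf(buf, buf_len, "-agentpath:%s", lib_path);
    }

    /* snprintf reports the length it needed; reject truncated results. */
    if (n < 0 || (size_t)n >= buf_len) {
        return -1;
    }
    return 0;
}
```

A caller would pass `getenv("JAVA_TOOL_OPTIONS")` as `existing` and export the result with `setenv` before the JVM is launched. This only helps at JVM start-up, which is why the attach case needs a different trigger.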

See current state of work on: https://github.com/criblio/appscope/tree/draft-Agent-Runner

Oct 08 '21 15:10 michalbiesek

Current status:

  • To force the Java Virtual Machine to call the Agent_OnAttach interface, a native implementation of loadAgentPath was provided in scope, see #617 for details
  • Need to investigate the root cause of the segfaults that occur when attaching to the JVM, depending on the JVM version

Oct 20 '21 08:10 michalbiesek

Current status: While working on this issue and checking different versions of Java, I discovered a crash with SIGINT when attaching to Java on Dockerfile.glibc; this will be addressed in #619. At first glance, the segfault observed in the current state of Pull Requests #619 and #617 is related to the doGotcha functionality and to incorrect restoring of write permissions.

Oct 21 '21 16:10 michalbiesek

Current status: #619 addresses a bug in a corner case of handling a shared object library and a GOT entry:

7f4bc9a37000-7f4bc9a3a000 ---p 00000000 00:00 0
7f4bc9a3a000-7f4bc9b38000 rw-p 00000000 00:00 0

In the current implementation of osGetPageProt we identify the address 7f4bc9a3a000 as belonging to the 7f4bc9a37000-7f4bc9a3a000 memory range. From this range we read the permissions (no read, no write, no execute). Here we detect that the permissions do not include write access: https://github.com/criblio/appscope/blob/26a82a53dffeab42ee282135ea6d8fb467b20e06/src/scopeelf.c#L199-L204 So we add write permission. At the end we restore the permissions that we read at the beginning: https://github.com/criblio/appscope/blob/26a82a53dffeab42ee282135ea6d8fb467b20e06/src/scopeelf.c#L229-L234 This removes the rw permissions from the memory starting at 7f4bc9a3a000. Later, when the program runs and tries to access the GOT entry whose permissions we revoked, it segfaults.
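
The boundary problem described above can be illustrated with a small lookup over /proc/<pid>/maps lines. This is only a sketch, assuming the root cause is that the end address of a mapping is treated as inclusive; it is not the fix from #619, and the function names `maps_line_prot` and `lookup_prot` are hypothetical. The key point is that a maps range is half-open, so an address equal to the end of one range belongs to the next one.

```c
#include <stdio.h>
#include <stddef.h>
#include <sys/mman.h>

/* Parse one /proc/<pid>/maps line. If addr lies in [start, end), store the
 * mapping's protection flags in *prot and return 1; otherwise return 0. */
static int maps_line_prot(const char *line, unsigned long long addr, int *prot)
{
    unsigned long long start, end;
    char perms[5] = {0};

    if (sscanf(line, "%llx-%llx %4s", &start, &end, perms) != 3) {
        return 0;
    }
    /* The end address is exclusive: addr == end belongs to the next mapping. */
    if (addr < start || addr >= end) {
        return 0;
    }

    *prot = PROT_NONE;
    if (perms[0] == 'r') *prot |= PROT_READ;
    if (perms[1] == 'w') *prot |= PROT_WRITE;
    if (perms[2] == 'x') *prot |= PROT_EXEC;
    return 1;
}

/* Scan the given maps lines and return 1 with *prot set for the first
 * mapping that contains addr, or 0 if no mapping contains it. */
static int lookup_prot(const char *const *lines, size_t count,
                       unsigned long long addr, int *prot)
{
    for (size_t i = 0; i < count; i++) {
        if (maps_line_prot(lines[i], addr, prot)) {
            return 1;
        }
    }
    return 0;
}
```

With the two maps lines shown earlier, looking up 0x7f4bc9a3a000 this way skips the `---p` range and reports read and write permissions, so a later restore would put back rw rather than revoke it.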

The last commit in #619 addresses the previously described problem.

Oct 22 '21 13:10 michalbiesek

Regarding https://github.com/criblio/appscope/pull/617

  • the native implementation of loadAgentPath seems to work fine
  • the discovered limitation is that we need to modify the callback logic: with Agent_OnLoad we do not miss any Java class load operation, but in the case of attach a Java class may already be loaded in the JVM. In other words:
/opt/tomcat/bin/catalina.sh run &
/opt/appscope/bin/linux/scope attach java
curl -k https://localhost:8443

works fine

/opt/tomcat/bin/catalina.sh run &
curl -k https://localhost:8443
/opt/appscope/bin/linux/scope attach java
curl -k https://localhost:8443

does not work correctly, since the class is already loaded. The next step is to investigate the JVM Tool Interface (JVMTI) to overcome this limitation.

Oct 22 '21 13:10 michalbiesek

With Agent_OnAttach we can use GetLoadedClasses to iterate over the expected classes and force the JVM to call ClassFileLoadHook. Unfortunately, RetransformClasses has the following limitation: "The retransformation must not add, remove or rename fields or methods", so we cannot use the current mechanism based on javaCopyMethod. When I tried to use DefineClass to copy the existing class under a new name, I received a JVMTI_ERROR_NAMES_DONT_MATCH error, see https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#RetransformClasses

Oct 25 '21 12:10 michalbiesek

Current status: The Agent_OnAttach method works fine for classes that are loaded after attaching to the process:

/opt/tomcat/bin/catalina.sh run &
/opt/appscope/bin/linux/scope attach java
curl -k https://localhost:8443  <<-- this will load "sun/nio/ch/SocketChannelImpl" class

What doesn't work is the following scenario:

/opt/tomcat/bin/catalina.sh run &
curl -k https://localhost:8443  <<-- this will load "sun/nio/ch/SocketChannelImpl" class
/opt/appscope/bin/linux/scope attach java <<-- this will do copy of "sun/nio/ch/SocketChannelImpl" class
curl -k https://localhost:8443 

Calling the methods from the copy of the class fails.

Summary:

  • Agent_OnAttach correctly registers the ClassFileLoadHook
  • After attaching, ClassFileLoadHook behaves correctly for freshly loaded classes
  • After attaching, ClassFileLoadHook does not behave correctly for already loaded classes. Copying a class using (*jni)->DefineClass returns a success status, but using the methods from the copied class does not seem to work; this needs further investigation

Oct 28 '21 12:10 michalbiesek

Status:

I worked on verifying whether copying a class is possible. I focused on a simpler case: I created a Test class which contains only 2 methods (see the last commit in #617 for details) and added a mechanism to Scope to copy the class and change the print method into a native one.

The behavior when we want to call the original print method (not the native one):

  • Using the print method loaded from the copied class (which shouldn't be native) results in a stack overflow: when we call the original print method we end up calling the modified one, i.e. the native variant
  • Using the second_print method loaded from the copied class results in correct behavior: we call the second_print implementation

Other:

After copying the class, when we additionally call javaCopyMethod(classInfo, classInfo->methods[methodIndex], "__print"); and try to use the print method loaded from the copied class (which shouldn't be native), we end up calling the second_print implementation.

Next step: Verify the method indexes and the Java class structure to see if an additional action is required when copying the class. Possibly we refer to the old class code, in which case the class copy mechanism must be adjusted.

Oct 29 '21 15:10 michalbiesek

Status:

I worked on verifying whether the following mechanism works in the case of Agent_OnLoad (before any Java libraries are loaded) and Agent_OnAttach (after some/all Java libraries are loaded):

  • start a copy of the original class_name_foofoo class - defineCopyClass
  • [Copied class] add copy of bar_method (__barmethod) - javaCopyMethod
  • [Copied class] change name of the class_name_foofoo to class_name_foo__ - javaModifyUtf8String
  • [Copied class] define/create the class_name_foo__ class - (*jni)->DefineClass
  • [Original class] change barmethod to native - javaConvertMethodToNative

With this instrumentation we are able to intercept the native barmethod, in which we do the following:

JNIEXPORT void JNICALL
Java_class_name_foofoo_barmethod(JNIEnv *jni, jobject obj, jstring str)
{
  // perform scope logic
  // locate the class_name_foo__ class
  // locate the __barmethod method in class_name_foo__
  // call the original method backed up in __barmethod   ## (1)
}

Results:

  • With Agent_OnLoad the code succeeds only when we also add a copy of bar_method (__barmethod) with javaCopyMethod in the original class class_name_foofoo
  • With Agent_OnLoad, without adding a copy of bar_method (__barmethod) with javaCopyMethod in the original class class_name_foofoo, we hit a SEGV in (1)
  • With Agent_OnAttach we cannot add new methods to the original class class_name_foofoo, so in (1) we call the native variant of the method again and again until we hit a stack overflow

Nov 03 '21 12:11 michalbiesek

Status:

The current implementation supports:

  • attaching to a Java virtual machine which is already running, so we are able to trigger Agent_OnAttach, see: details
  • for classes that are not yet loaded we can use the current logic of adding new fields and new methods, so we are able to use the same logic as for Agent_OnLoad

Next steps and limitations

  • we need to modify already loaded classes, but we cannot add new methods/fields during Retransform or Redefine operations
  • copying a class to store the original method does not work correctly, see https://github.com/criblio/appscope/issues/576#issuecomment-959008698
  • We need to have an impact on the already existing instances of loaded classes. The next step would be to look at IterateOverInstancesOfClass and FollowReferences
  • as an alternative, we can look into MethodEntry or MethodExit; please note the possible performance limitations when using MethodEntry and MethodExit

Nov 04 '21 08:11 michalbiesek