java icon indicating copy to clipboard operation
java copied to clipboard

Code changes to add PluggableDevice support for Inference

Open rjauhari2 opened this issue 10 months ago • 7 comments

Enable TF_LoadPluggableDeviceLibrary API for PluggableDevices

rjauhari2 avatar Mar 07 '25 12:03 rjauhari2

Adding this line should have generated more code which needs to be checked in, and we've had issues with the experimental C API in the past. Does this build correctly?

Craigacp avatar Mar 07 '25 17:03 Craigacp

Hi @Craigacp, I have added the code generated form c_api_experimental.h file. Yes, I was able to build it successfully. I have also tested loading our plugin solution, and tried running few models.

rjauhari2 avatar Mar 11 '25 06:03 rjauhari2

I also recall trying to enable this experimental API in the past and we were facing compilation issues, maybe it has been cleaned up? But if it now works, that's amazing! I'll run a full build on all supported platforms before merging

karllessard avatar Mar 14 '25 11:03 karllessard

~The standard (quick) build is failing. It looks like the symbols that are coming from the experimental layer are not present in the TensorFlow binaries we compile with (tensorflow_cc).~

No wait I was wrong, what is missing is the generated classes under this package. @rjauhari2 , can you please commit them as well? It complains about three missing classes: TF_AttrBuilder, TF_ShapeAndTypeList and TF_CheckpointReader

Also, what command did you run to compile it successfully and what's your platform?

karllessard avatar Mar 14 '25 12:03 karllessard

Hi @karllessard, I used "mvn clean install -Pgenerating" command to build TF-Java. I have tested this on "linux-x86_64" machine. Now, I have added the generated class in "c_api" folder. With these changes I am able to build with "mvn clean install". Thanks for your review and guidance.

rjauhari2 avatar Mar 17 '25 05:03 rjauhari2

Thanks a lot @rjauhari2. Unfortunately, the build failed on Windows, similar to what we've been observing in the past. We'll have to debug this on our side (unless you have access to a Windows machine), so we might want to push changes to your branch. To be continued...

karllessard avatar Mar 19 '25 13:03 karllessard

Thanks for your quick response. I am able to reproduce the error in your CI (GitHub actions). We shall try from our side.

rjauhari2 avatar Mar 24 '25 11:03 rjauhari2

Hi @Craigacp, @karllessard i have added commit for windows build fix. Can you trigger your CI runs again to validate this PR?

rjauhari2 avatar May 27 '25 04:05 rjauhari2

Please move that stub out into a separate file, we don't want to mix up the different stubs.

Craigacp avatar May 27 '25 13:05 Craigacp

Hi @Craigacp, i have moved Windows stub into a separate file "tfe_serverdef_stub.cc". Kindly review.

rjauhari2 avatar May 28 '25 09:05 rjauhari2

And create a separate header file rather than putting the function definition in an existing unrelated file.

Craigacp avatar May 28 '25 15:05 Craigacp

Hi @Craigacp, I have added the header file and included it in the "tensorflow.java" file.

rjauhari2 avatar May 29 '25 06:05 rjauhari2

Hi @Craigacp and @karllessard, Thank you very much for all your comments, and I have updated the PR based on your comments. I request you to help me on the further steps to get this PR merged. Our solution with TensorFlow is based on pluggabledevice. This PR will enable our plugin to seamlessly work with TensorFlow-Java. Thank you!

rjauhari2 avatar Jun 06 '25 04:06 rjauhari2