deeplearning4j icon indicating copy to clipboard operation
deeplearning4j copied to clipboard

Create OSGi bundles for ND4J

Open timothyjward opened this issue 6 years ago • 8 comments

What changes were proposed in this pull request?

As described in #3022 it is necessary for Deeplearning4J to be provided as OSGi bundles in order for it to be used embedded in Eclipse. OSGi has a very different class loader structure which impacts how DL4J loads its backend implementation and its various plugins.

This PR makes the necessary additions/fixes to provide ND4J as OSGi bundles (the first step on the road to running the full DL4J on OSGi. Ideally the existing ND4J jars would have been enhanced with OSGi metadata, but this is not possible because:

  • The ND4J API has split packages between its various modules, these cannot work in separate OSGi bundles
  • The ND4J API is tightly coupled to JavaCPP, which further requires a flat class space to load native libraries

The PR therefore creates an nd4j-osgi bundle which wraps up the "frontend api" modules into a single OSGi bundle JAR. This has a requirement for a "backend". There is also an nd4j-native-osgi bundle providing the backend - this is an OSGi fragment which attaches to the frontend API to provide a working backend.

The overall approach is minimally invasive to the existing ND4J code, there are no API changes and all of the existing jars are still produced as before.

How was this patch tested?

There is an automated OSGi sniff test which has been put in the nd4j-tests-osgi module

Quick checklist

The following checklist helps ensure your PR is complete:

  • [x] Eclipse Contributor Agreement signed, and signed commits - see IP Requirements page for details
  • [x] Reviewed the Contributing Guidelines and followed the steps within.
  • [x] Created tests for any significant new code additions.
  • [x] Relevant tests for your changes are passing.

timothyjward avatar Aug 12 '19 10:08 timothyjward

I don't see these changes as "minimally invasive". What prevents us from creating an nd4j-osgi module like the nd4j-uberjar module that doesn't require any changes to the other modules?

saudet avatar Aug 23 '19 01:08 saudet

I don't see these changes as "minimally invasive". What prevents us from creating an nd4j-osgi module like the nd4j-uberjar module that doesn't require any changes to the other modules?

So in this PR:

  • The ND4J repackaging of Jackson is enhanced with OSGi metadata in the manifest, but there are no other changes
  • There are some internal ND4J changes to allow ServiceLoader to find implementations of things when running in OSGi, vanilla Java usage is unaffected
  • There is a new ND4J "frontend" Maven Module which packages up the ND4J core, common, context and buffer
  • There is a new ND4J "native backend" Maven Module which packages up the native-api and native libraries for the CPU backend
  • There is a new ND4J "OSGi test" project which runs tests in the OSGi environment

Of these the only ones which affect existing modules are the internal ServiceLoader changes and Jackson packaging changes, both of which are necessary if ND4J is to work in OSGi without forcing a choice of CPU/GPU at build time.

timothyjward avatar Aug 23 '19 09:08 timothyjward

Both backends can be built in the same uber JAR, that's not an issue.

If we need to add a dependency on OSGi like that, we'll need to make sure it doesn't break anything anywhere, on Android and Spark among things... Is this something you would have the time to test? :)

saudet avatar Aug 23 '19 11:08 saudet

If we need to add a dependency on OSGi like that, we'll need to make sure it doesn't break anything anywhere, on Android and Spark among things... Is this something you would have the time to test? :)

I'm happy to help, but I'm not sure I have the appropriate infrastructure available. Unless of course you have solved this problem already?

timothyjward avatar Aug 23 '19 12:08 timothyjward

@timothyjward No, not really unfortunately. It's pretty manual running examples from dl4j-examples for most such integration tests.

saudet avatar Aug 25 '19 01:08 saudet

@timothyjward @saudet with java 17 LTS forcing module-info I added a semi related PR: https://github.com/eclipse/deeplearning4j/pull/9626

I think I've at least solved part of the problem for modularization. I'm playing with how to add a meta backend that allows 1 or more backends to be present. We also have classloader overriding now. It might allow us to introduce a better way of introducing OSGI support. Any suggestions even if you dont' have time to work on this?

agibsonccc avatar Feb 07 '22 10:02 agibsonccc

@timothyjward @saudet with java 17 LTS forcing module-info I added a semi related PR: #9626

I think I've at least solved part of the problem for modularization. I'm playing with how to add a meta backend that allows 1 or more backends to be present. We also have classloader overriding now. It might allow us to introduce a better way of introducing OSGI support. Any suggestions even if you dont' have time to work on this?

The biggest issues for OSGi were:

  1. Packages split across multiple jar files.
  2. Highly coupled API packages (API packages recieving/returning types from other packages)
  3. Expectations that you could find implementation classes from the API jar classloader

I'm guessing that item 1 has been solved now as JPMS has a similar requirement. Item 2 has hopefully improved, but isn't necessarily a deal breaker. Item 3 is a big deal, but depending on what the "classloader overriding" support you've added offers it may be sufficient.

I'd suggest adding the bnd-maven-plugin to your build so that you start creating OSGi metadata for each of your jar files. You could then add an OSGi "integration test" project using the bnd-testing-maven-plugin to try some simple scenarios. This will probably not work at first, but will help to figure out:

  1. Problems with resolving (i.e. missing dependencies, unexported API packages)
  2. Problems at runtime (typically expectations of a flat classpath)

These issues can then be worked through one at a time as they are identified until the basic OSGi tests start passing. My guess is that there probably won't be many issues left if the split packages are already gone.

timothyjward avatar Feb 07 '22 11:02 timothyjward

@timothyjward thanks for the comments that helps a lot. I'll try to address what I can in the upgrade.

Regarding 2 and 3:

  1. When splitting things up one problem definitely was the packages things were in. That could stand to be cleaned up. I might do that after releasing the 1.0. For now I'll probably just leave the packages mostly intact and just changing what we need. I agree this should be revisited.

  2. We've added the ability to override everything now with ND4JClassLoading/DL4JClassloading I think with proper injection those issues are mostly solved now.

I'll use this PR for reference and see what I can do about adding proper OSGI metadata. Thanks a lot for your comments.

agibsonccc avatar Feb 07 '22 12:02 agibsonccc