
Potential extension: inline java akin to jbang

Open beargiles opened this issue 8 months ago • 1 comments

This issue is a place to hang a few notes on the issues involved in adding support for stored procedures written in Java instead of plpgsql etc. That means the compilation and storage of bytecode will need to be handled by the backend.

I don't expect this to appear anytime soon but thought I should call it out while you're already refactoring the code. If nothing else it's a chance to add notes or possibly even identify places where this functionality would go.

Background

For many languages the user can define a stored procedure using that language. This is easy to implement, at least conceptually, when it's a scripting language. The 'CREATE LANGUAGE' implementation just needs to wire up the inputs, outputs, and appropriate library(ies).

This is much harder to do with a compiled language like java since you want to keep the compiled object (or bytecode) around. There are also some practical issues, like a natural desire to use external libraries - anything simple enough not to require an external library should probably be written in pl/pgsql or the like.

Direct approach

If we wanted to bite the bullet anyway:

  • java has provided a compiler API for years (javax.tools.JavaCompiler, since Java 6), and all it requires is the full JDK instead of the runtime-only JRE.

  • java added official REPL support (JShell, in Java 9). It's a much more direct analogy to a scripting language. It requires a relatively new version of the JDK and possibly other resources.

  • the 'jbang' application demonstrates how to use real java as a scripting language. It supports import albeit with a slightly modified syntax.
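To make the first option concrete, here is a minimal sketch of compiling source held in a string straight to in-memory bytecode with the standard javax.tools API, then loading and calling it. The class and method names (InlineCompiler, StringSource, BytecodeSink, callStatic) are made up for illustration and are not part of PL/Java; it assumes the backend's JVM is a full JDK, since ToolProvider.getSystemJavaCompiler() returns null otherwise.

```java
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of PL/Java.
final class InlineCompiler {
    /** Source text held in a String instead of a file. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /** Receives the compiler's class-file output in memory. */
    static final class BytecodeSink extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        BytecodeSink(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    /** Compiles one source unit; returns binary class name -> bytecode. */
    static Map<String, byte[]> compile(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) throw new IllegalStateException("no system compiler; a full JDK is required");
        Map<String, BytecodeSink> sinks = new HashMap<>();
        try (JavaFileManager fm = new ForwardingJavaFileManager<StandardJavaFileManager>(
                compiler.getStandardFileManager(null, null, null)) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String name,
                    JavaFileObject.Kind kind, FileObject sibling) {
                // Redirect every generated class file into memory.
                return sinks.computeIfAbsent(name, BytecodeSink::new);
            }
        }) {
            Boolean ok = compiler.getTask(null, fm, null, null, null,
                    List.of(new StringSource(className, source))).call();
            if (!ok) throw new IllegalArgumentException("compilation failed for " + className);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, byte[]> classes = new HashMap<>();
        sinks.forEach((name, sink) -> classes.put(name, sink.bytes.toByteArray()));
        return classes;
    }

    /** Defines the compiled classes in a throwaway loader and returns the named one. */
    static Class<?> load(String className, Map<String, byte[]> classes) {
        ClassLoader loader = new ClassLoader(InlineCompiler.class.getClassLoader()) {
            @Override protected Class<?> findClass(String name) throws ClassNotFoundException {
                byte[] b = classes.get(name);
                if (b == null) throw new ClassNotFoundException(name);
                return defineClass(name, b, 0, b.length);
            }
        };
        try {
            return loader.loadClass(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Compiles, loads, and invokes a public static no-argument method. */
    static Object callStatic(String className, String source, String method) {
        try {
            return load(className, compile(className, source)).getMethod(method).invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String src = "public class Hello { public static int answer() { return 6 * 7; } }";
        System.out.println(callStatic("Hello", src, "answer")); // prints 42
    }
}
```

The map returned by compile is the piece that would need to be retained somewhere; external library support would mean passing a classpath through the options argument of getTask, which this sketch leaves out.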

For performance reasons we would want to retain the bytecode. This could be handled by storing the bytecode in one or more additional tables in the existing pljava schema. We have additional options if we have access to an arbitrary Datum but I suspect anything provided to us will need to be text only.
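As a rough illustration of retaining bytecode, the sketch below keys compiled output by a SHA-256 hash of the source, so unchanged source is never recompiled, and shows a Base64 round trip in case only text can be stored. Everything here is an assumption for illustration: BytecodeCache is a made-up name, the in-memory map merely stands in for a table in the pljava schema, and the compiler is passed in as a plain function.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; the map stands in for a table such as
// (source_sha256 text primary key, bytecode text), which is also an assumption.
final class BytecodeCache {
    private final Map<String, byte[]> rows = new ConcurrentHashMap<>();
    private final Function<String, byte[]> compiler;
    private final AtomicInteger compilations = new AtomicInteger();

    BytecodeCache(Function<String, byte[]> compiler) {
        this.compiler = compiler;
    }

    /** Lowercase hex SHA-256 of the source text, used as the lookup key. */
    static String sha256(String source) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(source.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Returns stored bytecode, compiling only when the source hash is new. */
    byte[] bytecodeFor(String source) {
        return rows.computeIfAbsent(sha256(source), key -> {
            compilations.incrementAndGet();
            return compiler.apply(source);
        });
    }

    int compilations() {
        return compilations.get();
    }

    /** Text-only encoding, for the case where a binary column is not available. */
    static String toText(byte[] bytecode) {
        return Base64.getEncoder().encodeToString(bytecode);
    }

    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

Keying on a hash of the source means a CREATE OR REPLACE with an unchanged body would reuse the stored row; whether that is the right key (as opposed to, say, the routine's oid plus a version) is an open design question.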

Indirect approach

A second approach is to use a Java-to-WASM compiler and then use the WASM extension for storage and execution. WASM is gaining popularity since it's so fast, supported by browsers, and there are cross-compilers from all of the major languages. I know there's also a WASM extension but I don't know any details about its usage.

That's a lot of unknowns but it may be a better solution - for standalone stored procedures - since it reduces the amount of work that must be done by this extension.

beargiles, Jun 06 '25 15:06

I love the idea, and this is a big part of the motivation for the PR #399 refactoring in the first place: to make it invitingly straightforward to implement each and every one of those ideas as a PLJavaBasedLanguage of its own, and play with them, separately or together, and see which ones are most pleasing.

> This is easy to implement, at least conceptually, when it's a scripting language. The 'CREATE LANGUAGE' implementation just needs to wire up the inputs, outputs, and appropriate library(ies).
>
> This is much harder to do with a compiled language like java since you want to keep the compiled object (or bytecode) around.

Even Java's JSR 223 Scripting API already contemplates scripting languages that involve a compilation step to some intermediate form, and it's not uncommon for other PostgreSQL PLs to work that way already, even if they do not put the intermediate form into persistent storage. PLJavaBasedLanguage already presents PL handling as a staged-programming exercise: prepare processes a routine's source into a form that depends only on its static properties and is cached for the lifespan of its RegProcedure; specialize then uses that form to derive one specialized to a particular call site and cached for the life of that call site; and call is then used for each call made on that call site.
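The staging described above can be pictured with a toy model. To be clear, the interfaces and signatures below are invented for illustration and are not PL/Java's actual PLJavaBasedLanguage API; string keys stand in for the RegProcedure and the call site, and the "compiled" form is just a closure.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every name and signature here is hypothetical.
final class StagedHandler {
    /** Stage 1 result: depends only on the routine's static properties. */
    interface Template {
        Routine specialize(String argType);
    }

    /** Stage 2 result: bound to one call site, invoked once per call. */
    interface Routine {
        Object call(Object arg);
    }

    int prepares = 0;
    int specializations = 0;

    private final Map<String, Template> byRoutine = new HashMap<>();  // lives as long as the routine
    private final Map<String, Routine> byCallSite = new HashMap<>();  // lives as long as the call site

    /** Stage 1: process the source once (a real handler might compile to bytecode here). */
    Template prepare(String source) {
        prepares++;
        String body = source.trim();
        return argType -> {
            // Stage 2: pick behavior from what is known at this call site.
            specializations++;
            if (argType.equals("int4")) {
                return arg -> body + ":" + ((Integer) arg + 1);
            }
            return arg -> body + ":" + arg;
        };
    }

    /** Stage 3: each call reuses whatever the earlier stages already cached. */
    Object invoke(String routineId, String source, String callSiteId, String argType, Object arg) {
        Template template = byRoutine.computeIfAbsent(routineId, id -> prepare(source));
        Routine routine = byCallSite.computeIfAbsent(callSiteId, id -> template.specialize(argType));
        return routine.call(arg);
    }
}
```

In this model, two calls through the same call site run prepare once and specialize once; a second call site for the same routine triggers only another specialize.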

An easy implementation that generates bytecode at prepare time, so that it is cached with the RegProcedure, would provide quick signs of life without further effort to keep the bytecode in persistent storage, and that would be like what many PLs do now.

jcflack, Jun 06 '25 16:06