
Work-in-Progress on PL/Java refactoring, API modernization

jcflack opened this issue 4 years ago • 12 comments

As a work-in-progress pull request, this is not expected to be imminently merged, but is here to document the objectives and progress of the ongoing work.

Why needed

A great advantage promised by a PL based on the JVM is the large ecosystem of languages other than Java that can be supported on the same infrastructure, whether through the Java Scripting (JSR 223) API, or through the polyglot facilities of GraalVM, or simply via separate compilation to the class file format and loading as jars.

However, PL/Java, with its origins in 2004 predating most of those developments, has architectural limitations that stand in the way.

JDBC

One of the limitations is the centrality of the JDBC API. To be sure, it is a standard in the Java world for access to a database, and for PL/Java to conform to ISO SQL/JRT, the JDBC API must be available. But it is not necessarily a preferred or natural database API for other JVM or GraalVM languages, and its design goal is to abstract away from the specifics of an underlying database, which ends up complicating or even preventing access to advanced PostgreSQL capabilities that could be prime drivers for running server-side code in the first place.

The problem is not that JDBC is an available API in PL/Java, but that it is the fundamental API in PL/Java, with its tentacles reaching right into the native C language portion of PL/Java's implementation. That has made alternative interface options impractical, and multiplied the maintenance burden of even simple tasks like adding support for new datatype mappings or fixing simple bugs. There are significant portions of JDBC 4 that remain unimplemented in PL/Java.

Experience building an implementation of ISO SQL/XML XMLQUERY showed that certain requirements of the spec were simply unsatisfiable atop JDBC, either because of inherent JDBC limitations or limits in PL/Java's implementation of it. An example of each kind:

  • The INTERVAL data type cannot be mapped as SQL/XML requires, because the only ResultSetMetaData methods JDBC defines for access to a type modifier are precision and scale, which apply to numeric values; the API defines no standard way to learn what the modifier of an INTERVAL says about whether months or days are present.
  • The DECIMAL type cannot be mapped as SQL/XML requires; for that case, the fault is not with JDBC (which defines the precision and scale methods), but with their incomplete implementation in PL/Java.

Those cases also illustrate that mapping some PostgreSQL data types to those of another language can be complex. An arbitrary PostgreSQL INTERVAL is representable as neither a java.time.Period nor a java.time.Duration alone (though a pair of the two can be used, a type that PGJDBC-NG offers). One or the other can suffice if the type modifier is known and limits the fields present. A PostgreSQL NUMERIC value has not-a-number and signed infinity values that some candidate language-library type might not, and an internal precision that its text representation does not reveal, which might need to be preserved for a mathematically demanding task. The details of converting it to another language's similar type need to be knowable or controllable by an application.
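As a concrete illustration of the pair representation mentioned above, here is a minimal sketch. The PGInterval name and its fields are hypothetical stand-ins for illustration, not PGJDBC-NG's actual type:

```java
import java.time.Duration;
import java.time.Period;

// Hypothetical pair type: an arbitrary PostgreSQL INTERVAL needs both a
// calendar part (years/months/days -> Period) and an exact part
// (hours/minutes/seconds -> Duration); neither alone suffices.
public final class PGInterval {
    public final Period calendarPart;   // months/days; length varies by context
    public final Duration exactPart;    // fixed-length time component

    public PGInterval(Period calendarPart, Duration exactPart) {
        this.calendarPart = calendarPart;
        this.exactPart = exactPart;
    }

    @Override
    public String toString() {
        return calendarPart + " + " + exactPart;
    }

    public static void main(String[] args) {
        // '1 year 2 months 3 days 04:05:06' has no faithful single-type mapping
        PGInterval iv = new PGInterval(
            Period.of(1, 2, 3),
            Duration.ofHours(4).plusMinutes(5).plusSeconds(6));
        System.out.println(iv);
    }
}
```

If the type modifier is known to restrict the fields present, one of the two components could be dropped, which is exactly why the modifier information matters.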

It is a goal of this work to give PL/Java an API that does not obscure or abstract from PostgreSQL details, but makes them accessible in a natural Java idiom, and that such a "natural PostgreSQL" API should be adequate to allow building a JDBC layer in pure Java above it. (The work of building such a JDBC layer is not in the scope of this pull request.)

Parameter and return-value mapping

PL/Java uses a simple, Java-centric approach where a Java method is declared naturally, giving ordinary Java types for its parameters and return, and the mappings from these to the PostgreSQL parameter and return types are chosen by PL/Java and applied transparently (and much of that happens deep in PL/Java's C code).

While convenient, that approach isn't easily adapted to other JVM languages that may offer other selections of types. Even for Java, it stands in the way of doing certain things possible in PostgreSQL, like declaring VARIADIC "any" functions.

In a modernized API, it needs to be possible to declare a function whose parameter represents the PostgreSQL FunctionCallInfo, so that the parameters and their types can be examined and converted in Java. That will make it possible to write language handlers in Java, whether for other JVM languages or for the existing PL/Java calling conventions that at present are tangled in C.

Elements of new API

Identification of data types

A PostgreSQL-specific API must be able to refer unambiguously to any type known to the database, so it cannot rely on any fixed set of generic types such as JDBCType. To interoperate with a JDBC layer, though, the identifier for types should implement JDBC's SQLType interface.

The API should support retrieving enough metadata about the type for a JDBC layer implemented above it to be able to report complete ResultSetMetaData information.

The new class serving this purpose is RegType.

As RegType implements the java.sql.SQLType interface, an aliasing issue arises for a JDBC layer. Such a layer should accept JDBCType.VARCHAR as an alias for RegType.VARCHAR, for example. JDBC itself has no methods that return an SQLType instance, so the question of whether it should return the generic JDBC type or the true RegType does not arise. A PL/Java-specific API is needed for retrieving the type identifier in any case.

The details of which JDBC types are considered aliases of which RegTypes will naturally belong in a JDBC API layer. At the level of this underlying API, a RegType is what identifies a PostgreSQL type.

While RegType includes convenience final fields for a number of common types, those by no means limit the RegTypes available. There is a RegType that can be obtained for every type known to the database, whether built in, extension-supplied, or user-defined.
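To illustrate the aliasing idea in standalone terms, here is a hedged sketch using only the standard java.sql.JDBCType. The alias table, its contents, and the use of a plain string in place of a real RegType are assumptions for illustration, not part of this PR's API:

```java
import java.sql.JDBCType;
import java.sql.SQLType;
import java.util.Map;

// Sketch of the aliasing decision a JDBC layer above this API might make:
// accept a generic JDBCType where the true PostgreSQL type is expected.
// A plain string stands in for a real RegType here.
public final class TypeAlias {
    // Assumed alias table: generic JDBC type -> PostgreSQL type name.
    public static final Map<SQLType, String> ALIASES = Map.of(
        JDBCType.VARCHAR, "varchar",
        JDBCType.INTEGER, "int4");

    // Resolve a caller-supplied SQLType to a PostgreSQL type name.
    public static String resolve(SQLType t) {
        String pg = ALIASES.get(t);
        if (pg == null)
            throw new IllegalArgumentException("no alias for " + t);
        return pg;
    }

    public static void main(String[] args) {
        System.out.println(resolve(JDBCType.VARCHAR)); // varchar
    }
}
```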

Other PostgreSQL catalog objects and key abstractions

RegType is one among the types of PostgreSQL catalog objects modeled in the org.postgresql.pljava.model package.

Along with a number of catalog object types, the package also contains:

  • TupleDescriptor and TupleTableSlot, the key abstractions for fetching and storing database values. TupleTableSlot in PostgreSQL is already a useful abstraction over a few different representations; in PL/Java it is further abstracted, and can present with the same API other collections of typed, possibly named, items, such as arrays, the arguments in a function call, etc.
  • MemoryContext and ResourceOwner, both subtypes of Lifespan, usable to guard Java objects that have native state whose validity is bounded in time
  • CharsetEncoding

Mapping PostgreSQL data types to what a PL supports

The Adapter class

A mapping between a PostgreSQL data type and a suitable PL data type is an instance of the Adapter class, and more specifically of the reference-returning Adapter.As<T,U> or one of the primitive-returning Adapter.AsInt<U>, Adapter.AsFloat<U>, and so on (one for each Java primitive type). The Java type produced is T for the As case, and implicit in the class name for the AsFoo cases.

The basic method for fetching a value from a TupleTableSlot is get(Attribute att, Adapter adp), and naturally is overloaded and generic so that get with an As<T,?> adapter returns a T, get with an AsInt<?> adapter returns an int, and so on. (But see a later comment below for a better API than this item-at-a-time approach.) (The U type parameter of an adapter plays a role when adapters are combined by composition, as discussed below, and is otherwise usually uninteresting to client code, which may wildcard it, as seen above.)
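The overload-resolution trick that makes this work can be shown with a standalone toy. Slot, As, and AsInt here are simplified stand-ins, not the PR's actual classes:

```java
import java.util.List;

// Toy model of the overloaded, generic get idiom: with a reference adapter
// the result is T; with a primitive adapter it is an unboxed int.
public final class Slot {
    public interface As<T, U> { T fetch(Object datum); }
    public interface AsInt<U> { int fetch(Object datum); }

    private final List<Object> values;
    public Slot(List<Object> values) { this.values = values; }

    // Overload resolution picks the right return type from the adapter:
    public <T> T get(int att, As<T, ?> adapter) { return adapter.fetch(values.get(att)); }
    public int get(int att, AsInt<?> adapter) { return adapter.fetch(values.get(att)); }

    public static void main(String[] args) {
        Slot slot = new Slot(List.of("hello", 42));
        As<String, Void> text = d -> (String) d;
        AsInt<Void> int4 = d -> (Integer) d;

        String s = slot.get(0, text); // returns String, no cast at the call site
        int i = slot.get(1, int4);    // returns primitive int, no boxing
        System.out.println(s + " " + i);
    }
}
```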

A manager class for adapters

Natural use of this idiom presumes there will be some adapter-manager API that allows client code to request an adapter for some PostgreSQL type by specifying a Java witness class Class<T> or some form of super type token, and returns the adapter with the expected compile-time parameterized type.

That manager hasn't been built yet, but the requirements are straightforward and no thorny bits are foreseen. (Within the org.postgresql.pljava.internal module itself, things are simpler; no manager is needed, and code refers directly to static final INSTANCE fields of existing adapters.)
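A sketch of how such a manager's lookup might look, under two simplifying assumptions: a plain Function stands in for an Adapter, and a Class<T> witness suffices (a real manager would need some super-type-token form to request parameterized targets like Optional<String>):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical adapter-manager idiom: client code supplies a PostgreSQL
// type name and a witness Class<T>, and gets back a conversion with the
// expected compile-time type.
public final class AdapterManager {
    private final Map<String, Map<Class<?>, Function<Object, ?>>> registry
        = new HashMap<>();

    public <T> void register(String pgType, Class<T> witness,
                             Function<Object, T> adapter) {
        registry.computeIfAbsent(pgType, k -> new HashMap<>())
                .put(witness, adapter);
    }

    // The witness class lets the unchecked cast be made in exactly one place.
    @SuppressWarnings("unchecked")
    public <T> Function<Object, T> adapterFor(String pgType, Class<T> witness) {
        Function<Object, ?> a = registry.getOrDefault(pgType, Map.of()).get(witness);
        if (a == null)
            throw new IllegalStateException("no adapter: " + pgType + " -> " + witness);
        return (Function<Object, T>) a;
    }

    public static void main(String[] args) {
        AdapterManager m = new AdapterManager();
        m.register("int4", Integer.class, d -> (Integer) d);
        Function<Object, Integer> f = m.adapterFor("int4", Integer.class);
        System.out.println(f.apply(7) + 1); // typed result, no cast by the client
    }
}
```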

Extensibility

PL/Java has historically supported user-defined types implemented in Java, a special class of data types whose Java representations must implement a certain JDBC interface and import and export values through a matching JDBC API. In contrast, PL/Java's first-class PostgreSQL data type support—the mappings it supplies between PostgreSQL and ordinary Java types that don't involve the specialized JDBC user-defined type APIs—has been hardcoded in C using Java Native Interface (JNI) calls, and not straightforward to extend. That's a pain point for several situations:

  • A mapping for another PostgreSQL data type (either a type newly added to PostgreSQL, or simply one that PL/Java does not yet have a mapping for) is not easily added for an application that needs it, but generally must be added in PL/Java's C/JNI internals and made available in a new PL/Java build.
  • A mapping of an existing PostgreSQL data type to a new or different Java type—same story. When Java 8 introduced the java.time package, developers wishing to have PL/Java map PostgreSQL's date and time types to the improved Java types instead of the older java.sql ones had to open issues requesting that ability and wait for a PL/Java release to include it.
  • Not every PostgreSQL data type has a single best PL type to be mapped to. One application using the geometric types might want them mapped to the Java types in the PGJDBC library, while another might prefer the 2D classes supplied by some Java geometry library. One application might want a PostgreSQL array mapped to a flat Java List, another to a multi-dimensioned Java array, another to a matrix class from a scientific computation library. The choices multiply when considering the data types not only of Java but of other JVM languages. C coding and rebuilding of PL/Java should not be needed to tailor these mappings.

Adapters implementable in pure Java

With this PR, code external to PL/Java's implementation can supply adapters, built against the service-provider API exposed in org.postgresql.pljava.adt.spi.

Leaf adapters

A "leaf" adapter is one that directly knows the PostgreSQL datum format of its data type, and maps that to a suitable PL type. Only a leaf adapter gets access to PostgreSQL datums, which it should not leak to other code. Code that defines leaf adapters must be granted a permission in pljava.policy.

Composing adapters

A composing, or non-leaf, adapter is one meant to be composed over another adapter. An example would be an adapter that composes over an adapter returning type T (possibly null) to form an adapter returning Optional<T>. With a selection of common composing adapters (there aren't any in this pull request, yet), it isn't necessary to provide leaf adapters covering all the ways application code might want data to be presented. No special permission is needed to create a composing adapter.

Java's generic types are erased to raw types for runtime, but the Java compiler saves the parameter information for runtime access through Java reflection. As adapters are composed, the Adapter class tracks the type relationships so that, for example, an Adapter<Optional<T>,T> composed over an Adapter<String,Void> is known to produce Optional<String>.

It is that information that will allow an adapter manager to satisfy a request to map a given PostgreSQL type to some PL type, by finding and composing available adapters.
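The underlying reflection fact can be demonstrated standalone (Adapter here is a bare stand-in class, not the PR's):

```java
import java.lang.reflect.ParameterizedType;

// Erasure discards type arguments from instances, but the compiler records
// them in a (possibly anonymous) subclass, where reflection can read them.
public final class TypeCapture {
    public static abstract class Adapter<T, U> { }

    public static String capturedTypes(Adapter<?, ?> a) {
        ParameterizedType pt =
            (ParameterizedType) a.getClass().getGenericSuperclass();
        return java.util.Arrays.toString(pt.getActualTypeArguments());
    }

    public static void main(String[] args) {
        // The anonymous subclass body {} is what preserves <String, Void>.
        Adapter<String, Void> a = new Adapter<String, Void>() { };
        System.out.println(capturedTypes(a));
    }
}
```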

Contract-based adapters

For a PostgreSQL data type that doesn't have one obvious best mapping to a PL type (perhaps because there are multiple choices with different advantages, or because there is no suitable type in the PL's base library, and any application will want the type mapped to something in a chosen third-party library), a contract-based adapter may be best. An Adapter.Contract is a functional interface with parameters that define the semantically-important components of the PostgreSQL type, and a generic return type, so an implementation can return any desired representation for the type.

A contract-based adapter is a leaf adapter class with a constructor that accepts a Contract, producing an adapter between the PostgreSQL type and whatever PL type the contract maps it to. The adapter encapsulates the internal details of how a PostgreSQL datum encodes the value, and the contract exposes the semantic details needed to faithfully map the type. Contracts for many existing PostgreSQL types are provided in the org.postgresql.pljava.adt package.

ArrayAdapter

The one supplied ArrayAdapter is contract-based. While a Contract.Array has a single abstract method, and therefore could serve as a functional interface, in practice it is not directly implementable by a lambda; there must be a subclass or subinterface (possibly anonymous) whose type parameterization the Java compiler can record. (A lambda may then be used to instantiate that.) An instance of ArrayAdapter is constructed by supplying an adapter for the array's element type along with an array contract targeting some kind of collection of the mapped type. As with a composing adapter, the Adapter class substitutes the element adapter's target Java type through the type parameters of the array contract, to arrive at the actual parameterized type of the resulting array or collection.

PostgreSQL arrays can be multidimensional, and are regular (not "jagged"; all sub-arrays at a given dimension match in size). They can have null elements, which are tracked in a bitmap, offering a simple way to save some space for arrays that are sparse; there are no other, more specialized sparse-array provisions.

Array indices need not be 0- or 1-based; the base index as well as the index range can be given independently for each dimension. PostgreSQL creates 1-based arrays by default. This information is stored with the array value, not with the array type, so a column declared with an array type could conceivably have values of different cardinalities or even dimensionalities.
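The row-major flat-index arithmetic implied by per-dimension sizes and base indices can be sketched as follows (names hypothetical; this anticipates the dimsAndBounds indexing convenience wished for in the open items below):

```java
// Flat-index computation for a regular multidimensional array with
// per-dimension base indices, as PostgreSQL stores them. dims[i] is the
// size along dimension i, bases[i] its first valid index (PostgreSQL
// defaults to 1-based). Row-major element order is assumed.
public final class ArrayIndex {
    public static int flatIndex(int[] dims, int[] bases, int... subscript) {
        int flat = 0;
        for (int i = 0; i < dims.length; i++) {
            int off = subscript[i] - bases[i];   // shift to 0-based
            if (off < 0 || off >= dims[i])
                throw new IndexOutOfBoundsException("dimension " + i);
            flat = flat * dims[i] + off;         // row-major accumulation
        }
        return flat;
    }

    public static void main(String[] args) {
        // A 2x3 array with PostgreSQL's default 1-based indices:
        int[] dims = { 2, 3 }, bases = { 1, 1 };
        System.out.println(flatIndex(dims, bases, 2, 3)); // last element -> 5
    }
}
```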

The adapter is contract-based because there are many ways application code could want a PostgreSQL array to be presented: as a List or single Java array (flattening multiple dimensions, if present, to one, and disregarding the base index), as a Java array-of-arrays, as a JDBC Array object (which does not officially contemplate more than one array dimension, but PostgreSQL's JDBC drivers have used it to represent multidimensioned arrays), as the matrix type offered by some scientific computation library, and so on.

For now, one predefined contract is supplied, AsFlatList, and a static method, nullsIncludedCopy, that can be used (via method reference) as one implementation of that contract.

Java array-of-arrays

While perhaps not an extremely efficient way to represent multidimensional arrays, the Java array-of-arrays approach is familiar, and benefits from a bit of dedicated support for it in Adapter. Therefore, if you have an Adapter a that renders a PostgreSQL type Foo as Java type Bar, you can use, for example, a.a2().build() to obtain an Adapter from the PostgreSQL array type Foo[] to the Java type Bar[][], requiring the PostgreSQL array to have two dimensions, allowing each value to have different sizes along those dimensions, but disregarding the PostgreSQL array's start indices (all Java arrays start at 0).
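The essential reshaping work an a2()-style adapter must perform can be shown in miniature; this toy takes the flat row-major element sequence plus the two dimension sizes and builds a zero-based Java array-of-arrays (it is a sketch of the idea, not the PR's implementation):

```java
import java.util.Arrays;

// Reshape a flat row-major sequence into a two-dimensional Java array,
// as an adapter mapping PostgreSQL Foo[] to Java Bar[][] must do.
public final class Reshape2D {
    public static int[][] toJava(int[] flat, int rows, int cols) {
        if (flat.length != rows * cols)
            throw new IllegalArgumentException("not a regular " + rows + "x" + cols);
        int[][] out = new int[rows][];
        for (int r = 0; r < rows; r++)
            out[r] = Arrays.copyOfRange(flat, r * cols, (r + 1) * cols);
        return out;
    }

    public static void main(String[] args) {
        // PostgreSQL '{{1,2,3},{4,5,6}}' arrives flat in row-major order:
        int[][] m = toJava(new int[] { 1, 2, 3, 4, 5, 6 }, 2, 3);
        System.out.println(Arrays.deepToString(m)); // [[1, 2, 3], [4, 5, 6]]
    }
}
```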

Because PostgreSQL stores the dimension information with each value and does not enforce it for a column as a whole, it could be possible for a column of array values to include values with other numbers of dimensions, which an adapter constructed this way will reject. On the other hand, the sizes along each dimension are also allowed by PostgreSQL to vary from one value to the next, and this adapter accommodates that, as long as the number of dimensions doesn't change.

The existing contract-based ArrayAdapter is used behind the scenes, but build() takes care of generating the contract. Examples are provided.

Adapter maintainability

Providing pure-Java adapters that know the internal layouts of PostgreSQL data types, without relying on JNI calls and the PostgreSQL native support routines, entails a parallel-implementation maintenance responsibility roughly comparable to that of PostgreSQL client drivers that support binary send and receive. (The risk is slightly higher because the backend internal layouts are less committed than the send/receive representations. Because they are used for data on disk, though, historically they have not changed often or capriciously.)

The engineering judgment is that the resulting burden will be manageable, and the benefits in clarity and maintainability of the pure-Java implementations, compared to the brittle legacy Java+C+JNI approach, will predominate. The process of developing clear contracts for PostgreSQL types already has led to discovery of one bug (#390) that could be fixed in the legacy conversions.

For the adapters supplied in the org.postgresql.pljava.internal module, it is possible to use ModelConstants.java/ModelConstants.c to ensure that key constants (offsets, flags, etc.) stay synchronized with their counterparts in the PostgreSQL C code.

Adapter is a class in the API module, with the express intent that other adapters can be developed, and found by the adapter manager through a ServiceLoader API, without being internal to PL/Java. Those might not have the same opportunity for build-time checking against PostgreSQL header files, and will have to rely more heavily on regression tests for key data values, much as binary-supporting client drivers must. The same can be true even for PL/Java internal adapters for a few PostgreSQL data types whose C implementations are so strongly encapsulated (numeric comes to mind) that necessary layouts and constants do not appear in .h files.

Known open items

In no well-defined order ....

  • [ ] The to-PostgreSQL direction for Adapter, TupleTableSlot, and Datum.Accessor. These all have API and implementation for getting PostgreSQL values and presenting them in Java. Now the other direction is needed.
  • [ ] Provide API and implementation for a unified list-of-slots representation for a variety of list-of-tuple representations used in PostgreSQL, by:
    • [x] factoring out the list-of-TupleTableSlot classes currently found as preliminary scaffolding in TupleTableSlot.java
    • [x] providing such a representation for SPITupleTable ...
    • [ ] CatCList ...
    • [ ] Tuplestore? ...
    • [ ] ...?
  • [ ] Implement some form of offset memoization so fetching attributes from a heap TupleTableSlot stays subquadratic
  • [ ] Finish the unimplemented grants methods of RegRole and the unimplemented unary one of CatalogObject.AccessControlled. (Needs the CatCList support, for pg_auth_members searches.)
  • [x] A NullableDatum flavor of TupleTableSlot. One of the last prerequisites to enable pure-Java language-handler implementations, to which the function arguments will appear as a TupleTableSlot.
  • [ ] Complete the implementation of isSubtype with the rules from Java Language Specification 4.10. (At present it is a stub that only checks erased subtyping, enough to get things initially going.)
  • [ ] The adapter manager described above. (Requires isSubtype.)
  • [ ] Adapters for PostgreSQL types that don't have them yet (starting, perhaps, with the ones that already have contracts defined in org.postgresql.pljava.adt).
  • [ ] TextAdapter does not yet support the type modifiers for CHAR and VARCHAR. It needs a contract-based flavor that does.
  • [ ] ArrayAdapter (or Contract.Array) should supply at least one convenience method, taking a dimsAndBounds array parameter and generating an indexing function (a MethodHandle?) that has nDims integer parameters and returns an integer flat index. Other related operations? An index enumerator, etc.?
  • [ ] A useful initial set of composing adapters, such as:
    • [ ] one of the form As<Optional<T>,T>
      • [x] implement in an example class
      • [ ] integrate into PL/Java proper
    • [ ] one extending As<T,T> that returns null for null and values unchanged
      • why? because with adapter autoboxing, it can be composed over any primitive-returning adapter to enable it to handle null, by returning its boxed form
      • [x] implement in an example class
      • [ ] integrate into PL/Java proper
    • [ ] a set composing over primitive adapters to use a specified value in the primitive's value space to represent null.
      • [x] implement in an example class
      • [ ] complete the set and integrate into PL/Java proper
  • [ ] More work on CatalogObject invalidation. RegClass and RegType are already invalidated selectively; probably RegProcedure should be also. PostgreSQL has a limited number of callback slots, so it would be antisocial to grab them for all the supported classes: less critical ones just depend on the global switchpoint; come up with a good story for invalidating those. Also for how TupleDescriptor should behave upon invalidation of its RegClass. See commit comments for 5adf2c8.
  • [ ] Better define and implement the DualState behavior of TupleTableSlot.
  • [ ] Reduce the C-centricity of VarlenaWrapper. Goal: DatumUtils.mapVarlena doing more in Java, less in C.
    • [x] more of VarlenaWrapper's functionality moved to DatumImpl
    • [x] client code no longer casting Datum.Input to VarlenaWrapper to use it.
  • [ ] Adapter should have control over the park/fetch/decompress/lifespan decisions for VarlenaWrapper; currently the behavior is hardcoded for top-transaction lifespan, lazy detoasting, appropriate for SQLXML, which was the first VarlenaWrapper client.
  • [ ] Add MBeans with statistics for the new caches

And then

  • [ ] Choose some interesting JVM language foo and implement a simple PL/foo in pure Java, using these facilities.
  • [ ] Reimplement PL/Java's own language handler the same way.

jcflack avatar Jan 24 '22 02:01 jcflack

Hi @jcflack,

I found the "PL/Java refactoring for support of other JVM and polyglot languages" project while browsing PostgreSQL GSoC projects for this year, and I think it sounds really interesting. I wanted to check and see if PL/Java is still under active development - the last commit was pushed last year - and if the GSoC project would still be relevant.

Thanks!

HoussemNasri avatar Feb 25 '23 13:02 HoussemNasri

Hello @jcflack, I found this while going through Postgres' GSoC 2023 projects list. I consider myself adept with Java 11+ and a bit of Scala 2.12. Though I haven't really used Postgres in professional capacity or otherwise, I'm interested in this project as I'm primarily a Systems person at heart with an inclination towards Compilers, Runtimes, PL Theory etc. I am not familiar with JNI as such and need to brush up on C, but would first start with building the project and going through the codebase. Any suggestions from your end? Looking forward towards contributing something tangible!!

Warm Regards, Divyaank

divyaankt avatar Feb 26 '23 10:02 divyaankt

Hello @HoussemNasri and @divyaankt,

Thank you for your interest in the project! Yes, the GSoC project is still relevant. It didn't attract any applicants last year, and so I used the time to step a bit away from PL/Java and focus on some other things for the summer, which turned into the summer and fall (and part of winter). But I am more than happy to return to PL/Java.

jcflack avatar Feb 26 '23 15:02 jcflack

What a wonderful piece of technical literature.

I'm wondering if a tool that compiles non-Java JVM code to Java (under the hood) and then uses PL/SQL could be a candidate solution, since I am guessing this doesn't disrupt our JDBC centrality.

I have a bunch of other questions, but I'm just trying to grok the current sentiment going forward so I can work towards it.

AbhinavMir avatar Mar 10 '23 08:03 AbhinavMir

The SlotTester.test method that was hastily provided earlier, as a way of testing this new API, was rather limiting. It could only accept a String query with no parameters, causing flashbacks to the bad old days of SQL thrown together with string concatenation.

In the current state of this branch, the new API is incomplete and read-only, and the old legacy JDBC implementation is still around, so the obvious interim solution is to bridge the two, allowing JDBC Statement or PreparedStatement for issuing a query, and now SlotTester.unwrapAsPortal to present the JDBC ResultSet as a Portal object, and proceed to retrieve the results using new API.

The basic method for fetching a value from a TupleTableSlot is get(Attribute att, Adapter adp), and naturally is overloaded and generic so that get with an As<T,?> adapter returns a T, get with an AsInt<?> adapter returns an int, and so on.

That early approach in the new API, for the purpose of early signs of life, was JDBC-like in requiring a method call for each item to retrieve. Beyond the tedium of developing code in that style, it also forecloses many opportunities for efficient implementation, requiring any needed checks on user input (does this attribute belong to this result? is this adapter for a suitable type? native memory region still valid?, and so on) to be repeated for every single-item fetch. It may also be advantageous to arrange the order of fetches with some knowledge of how the tuple deforming is done, for which, again, the implementation has no flexibility with the user code picking at it one item at a time.

The idea was always to supply a multiple-columns-at-once API, which is introduced here.

TargetList, and its subinterface Projection, are used for selecting the attributes of interest. Projection, as in the algebraic usage, does not allow more than one mention of any attribute. The original Projection is the full TupleDescriptor for a result, and another Projection derived from an existing one must have a subset of its attributes, possibly reordered.

TargetList, the superinterface, relaxes the nonrepetition condition; a TargetList is allowed to mention the same attribute more than once. That should sound less efficient than simply mentioning it once and letting the user Java code copy the fetched value around, but there may be cases where it is useful. One would be when the Java code wants different representations of one PG value, produced by different Adapters. Another can be when the Java representation is a one-use-only class like SQLXML.

After shaping a Projection or TargetList to suit just what the Java code wants to retrieve, the TargetList can be applied over a List of TupleTableSlot at a time, using adapters selected for the desired Java types, and a lambda with corresponding parameters, whose types are inferred from the adapters. Functional interfaces of some likely lengths are provided, and can be curried to fit a TargetList with any number and types of columns.
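The shape of that idiom can be modeled in a standalone toy: the lambda's parameter types are fixed by the adapters chosen for each column, so the row-processing code is fully typed with no casts. Names here are stand-ins for the PR's TargetList machinery, and Class witnesses stand in for adapters:

```java
import java.util.List;

// Toy model of applying a target list over a list of slots: one functional
// interface per arity, parameter types inferred from the per-column choices.
public final class ApplyOverRows {
    public interface Row2<A, B> { void apply(A a, B b); }

    // "Adapters" here are just typed accessors into an Object[] row.
    public static <A, B> void applyOver(List<Object[]> rows,
                                        Class<A> col0, Class<B> col1,
                                        Row2<A, B> f) {
        for (Object[] row : rows)
            f.apply(col0.cast(row[0]), col1.cast(row[1]));
    }

    public static void main(String[] args) {
        List<Object[]> rows = List.of(
            new Object[] { "alice", 3 },
            new Object[] { "bob", 5 });
        // The lambda's parameters arrive already typed as String and Integer:
        applyOver(rows, String.class, Integer.class,
            (name, n) -> System.out.println(name + ": " + n));
    }
}
```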

An example illustrates the usage.

The work in this pull request to this point generally still calls the old heavyweight item-at-a-time get methods on TupleTableSlot under the hood, leaving the optimizations suggested above for future work, but it clears the way for those, and the old per-item get methods in their current form should eventually be deprecated.

jcflack avatar Aug 15 '23 02:08 jcflack

Dispatcher for multiple PLs implemented atop PL/Java

I had thought to continue ticking more of the other open-items boxes before doing the dispatcher, but for a change of scenery, here is the new dispatcher.

The first brand-new PL/Java-based procedural language is Glot64. It will probably never grow to rival Python or JavaScript in popularity, either because it can't do anything but write messages to standard output, or because you write your functions/procedures in base 64 :). So, here is a Glot64 function that writes Hello, world! on the server's standard output when called:

CREATE OR REPLACE FUNCTION javatest.hello()
 RETURNS void
 LANGUAGE glot64
AS 'SGVsbG8sIHdvcmxkIQo=';

The impatient may see Hello, world! immediately, using an inline code block:

DO LANGUAGE glot64 'SGVsbG8sIHdvcmxkIQo=';

The output won't be visible at all if the server's standard output is going to /dev/null or the like. But a test instance run in PL/Java's test harness, for example, will have its standard output going to the terminal.

In addition to the base-64-decoded source string, you will see other output from the glot64 language handler, which is really the point, for a demonstration example. The base-64 string is just for fun.
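The decoding step itself is plain java.util.Base64; a minimal standalone check of the example source string:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Decode a Glot64 function's AS string: the source text is just base 64,
// decoded to the message the function writes on standard output.
public final class Glot64Decode {
    public static String decode(String src) {
        return new String(Base64.getDecoder().decode(src), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(decode("SGVsbG8sIHdvcmxkIQo=")); // Hello, world!
    }
}
```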

Glot64, like any PL/Java-based language, needs a language handler: namely, a class that implements the PLJavaBasedLanguage interface. Various methods on that interface are used for validating functions/procedures, compiling, specializing, and calling functions/procedures, and executing inline blocks (for a language that supports those).

After installing a jar containing the class that implements the language, use the name of that class to declare a validator function, using the language pljavahandler:

CREATE OR REPLACE FUNCTION javatest.glot64_validator(oid)
 RETURNS void
 LANGUAGE pljavahandler
AS 'org.postgresql.pljava.example.polyglot.Glot64'; -- class name

followed by CREATE LANGUAGE using that new function as the validator, along with PL/Java's existing routine and inline dispatcher functions as the other two handlers:

CREATE LANGUAGE glot64
 HANDLER sqlj.pljavaDispatchRoutine
 INLINE  sqlj.pljavaDispatchInline
 VALIDATOR javatest.glot64_validator;

Bear in mind that the very first still-unticked "open items" box at the top of this pull request is still:

The to-PostgreSQL direction for Adapter, TupleTableSlot, and Datum.Accessor.

and that's why no PL/Java-based function or procedure can return any results yet. That will be done by storing the result value (or values) into the Call.result() TupleTableSlot, and the store direction doesn't work yet. So that's why Glot64 is limited to writing messages on standard output.

On the other hand, fetching from a TupleTableSlot is indeed working already, so a language handler can fetch values from the Call.arguments() TupleTableSlot using whatever Adapter is appropriate to each argument's type. The Glot64 language ignores passed arguments, but that's not a necessary limitation.

Also, of course, all the other unticked boxes in that open-items list are still unticked, so plenty of work remains. But the dispatcher is here, and the PLJavaBasedLanguage interface, enough to begin experimenting with the development of language handlers for languages of interest.

jcflack avatar Oct 09 '23 23:10 jcflack

Multi-PL dispatcher now supports languages implementing triggers

If a PL's implementing class does not implement the Triggers interface, the dispatcher will automatically reject trigger functions (at validation time, or at call time in case such a function got created while validation couldn't happen). The PL implementation does not have to concern itself with that.

If a PL does implement Triggers, the dispatcher will dispatch to the methods of that interface when a trigger is encountered, instead of to the normal Routines methods. The PL handler therefore doesn't have to begin with if ( CALLED_AS_TRIGGER(...) ) boilerplate. The dispatcher has already validated that the trigger function has the proper non-set return type and no arguments.

The PostgreSQL trigger manager prepares for a query by allocating a distinct FmgrInfo structure for each trigger that will be involved. If there is a trigger function that multiple triggers refer to, the specialize method of TriggerTemplate will be called for each of the triggers, and the corresponding result cached for each trigger. The specialize method is therefore free to generate a TriggerFunction that encapsulates information about one specific trigger.

The Glot64 example language handler now contains example trigger code.

jcflack avatar Apr 08 '25 19:04 jcflack

A PL/Java-based language can handle columns/expressions of concrete type anyarray

The PostgreSQL type ANYARRAY is normally a polymorphic type that would be seen only in a routine's inputsTemplate or outputsTemplate, prior to resolution at an actual call site. In very particular circumstances, though, it can be seen even after resolution, in a routine's inputsDescriptor or outputsDescriptor. It is not normally possible to declare uses of ANYARRAY as a concrete type, but certain columns in PostgreSQL-supplied statistics-related catalog tables are declared that way, and the type will be seen for those columns or for expressions involving them.

Such a column will always hold an array, but different rows may hold arrays of different element types. A method on Adapter.Array, elementType(), will supply an Adapter that produces the element type of an ANYARRAY-typed array. Once the element type is known, a suitable Adapter for that type can be chosen, and used to construct an array adapter for access to the array's content.
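The per-row shape of that logic can be sketched like this: because each row's array may carry a different element type, the element adapter is chosen row by row. (The real Adapter.Array.elementType() API yields an Adapter rather than a Java class; the map lookup below is an illustrative stand-in.)

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of handling a concrete ANYARRAY column: discover the
// element type per row, then pick a matching adapter. Not the real API.
public class AnyArrayDemo {
    static final Map<Class<?>, String> ADAPTERS = Map.of(
        Integer.class, "int4 adapter",
        String.class,  "text adapter");

    static String adapterFor(List<?> array) {
        Class<?> elementType = array.get(0).getClass(); // per-row discovery
        return ADAPTERS.get(elementType);
    }

    public static void main(String[] args) {
        System.out.println(adapterFor(List.of(1, 2, 3)));  // int4 adapter
        System.out.println(adapterFor(List.of("a", "b"))); // text adapter
    }
}
```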

Dispatcher now supports languages implementing TRANSFORM FOR TYPE

If a PL's implementing class does not implement the UsingTransforms interface, the dispatcher will automatically reject routine declarations that include TRANSFORM FOR TYPE (at validation time, or at call time in case such a routine got created while validation couldn't happen). The PL implementation does not have to concern itself with that, and this avoids the case where PostgreSQL would allow routine declarations with TRANSFORM FOR TYPE for transform-unaware PLs, where the clause would have no effect.

If a PL does implement UsingTransforms, the dispatcher will make sure that any Transform mentioned in a routine declaration for that language satisfies the language's essentialTransformChecks method. Because the fromSQL and toSQL functions of a transform have similar signatures in SQL to other functions that aren't transform functions at all, and PostgreSQL does not prevent CREATE TRANSFORM naming inappropriate functions, the essentialTransformChecks method should make a diligent effort to ensure that any proposed Transform has fromSQL/toSQL functions the PL will be able to use.

Implementing that UsingTransforms method is only the start of the PL handler's job. The handler is also responsible for the entirety of whatever that PL will do to accomplish the results of the transforms. It will probably begin, in its prepare method, by consulting the memo's transforms() method to learn what transforms, if any, should be applied.
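The "consult the declared transforms, then apply them to values" step can be sketched as below, modeling the transforms as a map from type name to a fromSQL conversion. These names only echo the concepts above; they are not the real PL/Java API.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a handler's prepare step could note which transforms
// the routine declared, and apply the matching fromSQL conversion per input.
public class TransformDemo {
    static Object applyTransform(Map<String, Function<Object, Object>> transforms,
                                 String typeName, Object raw) {
        // Apply the transform declared for this type, if any;
        // otherwise pass the value through unchanged.
        return transforms.getOrDefault(typeName, Function.identity()).apply(raw);
    }

    public static void main(String[] args) {
        Map<String, Function<Object, Object>> transforms =
            Map.of("hstore", r -> "converted:" + r);
        System.out.println(applyTransform(transforms, "hstore", "k=>v")); // converted:k=>v
        System.out.println(applyTransform(transforms, "text", "plain"));  // plain
    }
}
```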

The Glot64 example language handler now contains example code involving transforms.

jcflack avatar Apr 26 '25 20:04 jcflack

How a PL/Java-based language supports set-returning functions

An implementation of PLJavaBasedLanguage may also implement ReturningSets. If it does, its prepareSRF method, not the usual prepare, will be used when the target RegProcedure's returnsSet() is true. (If a PL/Java-based language does not implement ReturningSets, PL/Java's dispatcher will reject any such RegProcedure at validation and at dispatch time, so a language that does not intend to support set return does not have to concern itself with those details.)

The prepareSRF method must return an SRFTemplate. Further, what it returns must also implement one or more of SRFTemplate's member subinterfaces, ValuePerCall or Materialize (PostgreSQL might add additional options in the future).

The base interface, SRFTemplate, has an abstract negotiate method, which is passed a list of the subinterfaces the caller is prepared to accept and returns the index of the one from the list that the routine will use. Each subinterface has a default implementation that will find itself in the caller's list. A class that implements more than one of the subinterfaces will inherit conflicting defaults, and therefore must provide its own negotiate implementation.

The list of interfaces acceptable to the caller is ordered so as to reflect the caller's preference. A simple negotiate method could return the index of the first interface in the list that this SRFTemplate happens to implement. A more sophisticated one might take properties of the prepared routine into account.
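A minimal negotiate along those lines might look like this. The interface names are stand-ins for SRFTemplate's member subinterfaces; the logic is just "return the index of the first acceptable interface this template implements."

```java
import java.util.List;

// Stand-in subinterfaces, echoing the ones described above.
interface ValuePerCall {}
interface Materialize {}

class BothTemplate implements ValuePerCall, Materialize {
    // Implementing both means the conflicting defaults must be overridden
    // with an explicit choice; here, simply honor the caller's preference.
    int negotiate(List<Class<?>> accepted) {
        for (int i = 0; i < accepted.size(); i++)
            if (accepted.get(i).isInstance(this))
                return i;
        throw new IllegalArgumentException("no acceptable protocol");
    }
}

public class NegotiateDemo {
    public static void main(String[] args) {
        BothTemplate t = new BothTemplate();
        // Caller prefers Materialize; this template supports it, so index 0.
        System.out.println(t.negotiate(List.of(Materialize.class, ValuePerCall.class)));
        System.out.println(t.negotiate(List.of(ValuePerCall.class)));
    }
}
```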

The ValuePerCall interface specifies a specializeValuePerCall method that is expected to return an SRFFirst. SRFFirst has a firstCall method that should return an SRFNext instance. SRFNext has a nextResult method that will be called as many times as necessary. It should use fcinfo.result / fcinfo.isNull to store result values for one row, like any non-set-returning function, but return SINGLE, MULTIPLE, or END to indicate whether another call is expected. SRFNext implements AutoCloseable, and its close() method will be called after the last call made to nextResult (which may happen before nextResult returns END, if PostgreSQL does not need all the results). The case where nextResult returns SINGLE is an exception, treated as returning only that one row, and close() will not be called.
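The call-until-END protocol, with close() guaranteed after the last call, can be sketched with a small state machine. The real SRFNext stores each row's value through fcinfo rather than into a buffer; everything below is a simplified stand-in.

```java
// Minimal stand-ins for the ValuePerCall flow described above.
enum SetResult { SINGLE, MULTIPLE, END }

class CountingNext implements AutoCloseable {
    private int remaining;
    CountingNext(int rows) { remaining = rows; }

    SetResult nextResult(StringBuilder out) {
        if (remaining == 0) return SetResult.END;   // no row stored this call
        out.append("row ").append(remaining--).append('\n');
        return SetResult.MULTIPLE;                  // a row stored; call again
    }

    @Override public void close() { /* release per-scan resources here */ }
}

public class ValuePerCallDemo {
    public static void main(String[] args) throws Exception {
        StringBuilder out = new StringBuilder();
        // try-with-resources mirrors the dispatcher calling close() after
        // the last nextResult call, even if PostgreSQL stops fetching early.
        try (CountingNext next = new CountingNext(2)) {
            while (next.nextResult(out) != SetResult.END) { }
        }
        System.out.print(out);  // row 2, then row 1
    }
}
```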

The Materialize interface will likewise specify a specializeMaterialize method, but the details are TBD, so at this stage of the API the Materialize interface is a stub and does not specify any usable behavior yet.

The Glot64 example language handler now has example code for set-returning functions. For as long as the store direction of TupleTableSlot remains unimplemented, null / void / zero are still the only values any Glot64 function can return. But now it can return sets of them!

jcflack avatar May 09 '25 21:05 jcflack

Adding this for the record since I didn't see it mentioned above.

The existing implementation uses JNI. It has many benefits but is a real PITA to work with.

That led to the creation of 'Java Native Access' (JNA). It is much easier to use, since you only need to provide (loosely) the API interface and the location of the shared library. It can't do everything that JNI can, but it can do a lot.

This refactoring may want to look at how much of libpq can be implemented using JNA instead of JNI, and whether there would be a performance impact when doing so.

The main benefit of this approach would be 1) reducing the amount of work required by this project and 2) letting the PostgreSQL project maintain this functionality.

Finally, it's possible that anything that still requires JNI might be handled in an updated libpq or other library.

See java-native-access/jna for a project that has wrapped a ton of C libraries, including many specific to particular operating systems or CPU architectures.

beargiles avatar May 16 '25 02:05 beargiles

> The existing implementation uses JNI. It has many benefits but is a real PITA to work with.
>
> That led to the creation of 'Java Native Access' (JNA). It is much easier to use since you only need to provide (loosely) the API interface and location of the shared file.

Opportunities surely exist for replacing some JNI with Java's own more recent Foreign Function and Memory API, preserving PL/Java's lack of third-party runtime dependencies. Some of the newer Java code (Datum.Accessor is an example IIRC) is already type-parametrized in order to support a future FFM implementation.

That said, the opportunities for replacing JNI with FFM are limited by the typical need for things to happen at the boundaries (transformation of Java exceptions into ereports and vice versa, for example), already taken care of in Thomas's JNICalls.c. For that reason, a lot of the existing uses of JNI are in no danger of going away.

Those considerations are a bit orthogonal to achieving the goals and correcting the deficiencies described at the top of this PR.

libpq, being a library for handling the on-the-wire communication between a PostgreSQL backend and a connected client, isn't used in PL/Java and doesn't have many facilities that would be of use here.

jcflack avatar May 16 '25 11:05 jcflack