Allow pqinsert to read from STDIN
Feature Proposal
The pqinsert command should accept products via STDIN in addition to disk files.
Motivation
With distributed and containerized application architectures, sharing a disk across multiple components is more difficult. I am using the LDM Docker container as a microservice and added httpexec to the image to allow remote execution of LDM commands.
However, running pqinsert requires a shared volume that client containers can write to and the LDM container can read from. Another approach is a wrapper script in the LDM container that receives a product from httpexec via STDIN, writes it to a local temporary file, and then calls pqinsert on that file. If pqinsert itself could read from STDIN, it could communicate directly with httpexec without a shared Docker volume or an intermediate local file.
For some applications, this feature would allow a product to be generated in memory and streamed to LDM without being written to disk before it is inserted into the queue. Avoiding that intermediate write and read could noticeably improve overall performance.
Implementation
I forked this repo and created a proof of concept of this feature.
API
A filename argument of "-" is interpreted as STDIN instead of a disk path, and is read accordingly. This product will have a key value of "STDIN" unless the -p option is provided (which is recommended in this case). There are no other changes to the current API.
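The "-" convention could be resolved during argument handling with something like the minimal sketch below. The helper `open_product_input()` is hypothetical and not taken from the proof of concept; it also assumes that, for disk files, the identifier defaults to the pathname when -p is absent.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of LDM): maps a pqinsert filename argument
 * to an input stream and a product identifier. "-" selects STDIN, whose
 * default identifier is "STDIN" unless an explicit -p value is supplied.
 * Returns the stream, or NULL if a disk file cannot be opened.
 */
static FILE* open_product_input(const char* arg, const char* p_option,
                                const char** ident)
{
    if (strcmp(arg, "-") == 0) {
        *ident = (p_option != NULL) ? p_option : "STDIN";
        return stdin;
    }
    /* Assumption: disk files default to using their path as the identifier. */
    *ident = (p_option != NULL) ? p_option : arg;
    return fopen(arg, "rb");
}
```

Keeping the special case confined to one function means the rest of the insertion path only ever sees a stream and an identifier.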
The current implementation allows only one product to be submitted via STDIN. If support for multiple products is essential, one suggestion is an option that restricts input to STDIN, in which case the remaining arguments would be content lengths, rather than filenames, used to delimit the products. Knowing product lengths in advance would also simplify the implementation, and the caller should already know them.
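The suggested length-delimited mode might look roughly like the sketch below. `parse_length()` and `read_product()` are hypothetical names, and the sketch assumes each length arrives as a decimal byte count, in the same order as the products appear on STDIN.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: parses a decimal byte count. Returns 0 on success, -1 on error. */
static int parse_length(const char* arg, size_t* len)
{
    char* end;
    if (!isdigit((unsigned char)arg[0]))   /* rejects "", "-1", "+1", " 1" */
        return -1;
    errno = 0;
    unsigned long long value = strtoull(arg, &end, 10);
    if (errno != 0 || *end != '\0' || value > SIZE_MAX)
        return -1;
    *len = (size_t)value;
    return 0;
}

/*
 * Hypothetical: reads exactly `len` bytes of one product into a new buffer.
 * Returns NULL on allocation failure or if the stream ends early (a
 * truncated product). The caller frees the buffer.
 */
static char* read_product(FILE* in, size_t len)
{
    char* buf = malloc(len > 0 ? len : 1);  /* avoid malloc(0) ambiguity */
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, len, in) != len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Because each read consumes exactly the declared length, a short stream is reported as a truncated product instead of silently running into the next one.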
Internals
STDIN is not guaranteed to behave like a disk file (it may be a pipe or socket), so mmap() cannot be used on it. Instead, the product is read into allocated memory. One limitation is that, unlike mmap(), this cannot handle products larger than the available memory. An alternative would be to stream STDIN to a temporary disk file, but this incurs a performance penalty.
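A sketch of the approach described above, assuming a POSIX system: fstat() distinguishes a regular file (which can be mapped) from a pipe or socket (which must be read). `load_product()` is a hypothetical helper, not the proof-of-concept code, and the 64 KiB starting buffer is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: obtains a product's bytes from `fd`. Regular files
 * are mapped with mmap(); anything else is read into a growing heap buffer,
 * so the whole product must fit in memory. On success returns 0 and sets
 * *mapped to 1 (caller must munmap()) or 0 (caller must free()).
 */
static int load_product(int fd, void** data, size_t* size, int* mapped)
{
    struct stat st;
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
        void* p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            *data = p;
            *size = (size_t)st.st_size;
            *mapped = 1;
            return 0;
        }
        /* Mapping failed: fall back to reading. */
    }

    size_t cap = 64 * 1024, len = 0;
    char* buf = malloc(cap);
    if (buf == NULL)
        return -1;
    for (;;) {
        if (len == cap) {                    /* buffer full: double it */
            char* bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return -1;
            }
            buf = bigger;
            cap *= 2;
        }
        ssize_t n = read(fd, buf + len, cap - len);
        if (n == 0)
            break;                           /* EOF: product complete */
        if (n < 0) {
            if (errno == EINTR)
                continue;                    /* retry interrupted read */
            free(buf);
            return -1;
        }
        len += (size_t)n;
    }
    *data = buf;
    *size = len;
    *mapped = 0;
    return 0;
}
```

In this sketch the decision depends on the descriptor type rather than on the "-" argument, so a file redirected onto STDIN would still be mapped and only true streams pay the cost of copying into memory.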
I have only implemented this for the USE_MMAP code branch, as I could not get the other branch to compile.
Got it. We'll get back to you.