
Parameterize a script.Pipe with a user data struct?

Open fbaube opened this issue 2 years ago • 7 comments

There are many pipeline packages "out there", but script seems to be one that gets it conceptually correct.

Question: Should it be possible to attach a data structure to a pipeline using generics? Something like:

type ParamPipe[T any] struct {
	UserData T
	Pipe
}

Then a process pipeline for an instance of a user-defined struct could easily be constructed in a one-liner, and new functions could specifically process the user data.
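
A minimal sketch of how that might look from outside the package, assuming a hypothetical NewParamPipe constructor and embedding *script.Pipe so its methods are promoted (FileMeta is an invented example type):

package main

import (
	"fmt"

	"github.com/bitfield/script"
)

// ParamPipe pairs a script pipeline with arbitrary user data.
type ParamPipe[T any] struct {
	UserData T
	*script.Pipe // existing Pipe methods are promoted
}

// NewParamPipe is a hypothetical constructor for the wrapper.
func NewParamPipe[T any](data T, p *script.Pipe) *ParamPipe[T] {
	return &ParamPipe[T]{UserData: data, Pipe: p}
}

// FileMeta is an invented per-file data struct.
type FileMeta struct {
	Path string
}

func main() {
	pp := NewParamPipe(FileMeta{Path: "doc.md"}, script.File("doc.md"))
	n, err := pp.CountLines() // promoted from *script.Pipe
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d lines\n", pp.UserData.Path, n)
}

The embedding means every existing Pipe method still works on a ParamPipe, while new stage functions can read or update UserData as content flows through.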

I wish to write processing pipelines for chunks of content, and script's bash-style primitives provide a lot of helpful functionality.

fbaube avatar Aug 21 '23 16:08 fbaube

Thanks for the suggestion, @fbaube! Can you come up with an example of the kind of program you'd like to write using this idea? That'll help me get a clearer picture of how it might work.

bitfield avatar Aug 21 '23 16:08 bitfield

This is really similar to something I was messing with but never got around to doing.

Can I hit you with a use case?

I have 3 folders, and each is effectively its own binary microservice.

When one folder changes, I want to raise a change event to a broker like NATS. This is done by a filesystem watcher.

NATS broadcasts the event to the other folders' binaries, which try to do some work and change their own file systems, raising more events in turn.

This is called choreography. It's bottom-up workflow piping, where the workflow emerges from whatever file-change events are being broadcast and who is listening.

Mostly simple, like this project, and the schema is just which file changed in which project.
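
A minimal sketch of the watcher half of that, assuming fsnotify and the nats.go client (the folder path and subject name are made up):

package main

import (
	"log"

	"github.com/fsnotify/fsnotify"
	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	w, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	if err := w.Add("./project-a"); err != nil { // invented folder name
		log.Fatal(err)
	}

	// Broadcast each change; the other folders' binaries subscribe to
	// "changes.project-a", do their work, and emit events of their own.
	for ev := range w.Events {
		if err := nc.Publish("changes.project-a", []byte(ev.Name)); err != nil {
			log.Print(err)
		}
	}
}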

gedw99 avatar Aug 21 '23 18:08 gedw99

Can you come up with an example of the kind of program you'd like to write using this idea? That'll help me get a clearer picture of how it might work.

My goal is something like a DSL for processing Lightweight DITA. LwDITA is DITA with a greatly reduced tag set, plus support for HTML5 and Markdown. This is where script would be used, and I would create new ParamPipe functions.

(As an aside, I figure that when I have M pipelines for M files, each with N processing stages, then there are a number of ways that this load could be distributed across multiple processors.)

So in the CLI program, the processing for a file looks (or will look) something like this (a rough code sketch follows the list):

  • Gather CLI references to files and directories
  • Expand directories into file lists
  • Process in-file metadata (e.g. HTML <meta>)
  • Read file content
  • Analyze file content (MIME type? Is XML? Has DOCTYPE? Is valid XML? etc.)
  • Parse file into an AST (e.g. using goldmark for Markdown, stdlib for HTML5 and XML)
  • Extract "interesting" links (cross-references, ToC entries, etc.)
  • (Note that up until this point, each file can be processed in isolation)
  • Resolve and check validity of inter-file links
  • Prepare file set for XSLT processing
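
Here's a sketch of how the early, per-file steps might fan out using script's existing primitives (FindFiles, File, and Filter are all real script APIs); analyze is an invented placeholder for the content checks:

package main

import (
	"fmt"
	"io"

	"github.com/bitfield/script"
)

// analyze stands in for the MIME/XML/DOCTYPE checks: it reads a
// file's content and writes a one-line summary downstream.
func analyze(path string) func(io.Reader, io.Writer) error {
	return func(r io.Reader, w io.Writer) error {
		data, err := io.ReadAll(r)
		if err != nil {
			return err
		}
		fmt.Fprintf(w, "%s: %d bytes\n", path, len(data))
		return nil
	}
}

func main() {
	// Expand a directory into a file list (steps 1-2).
	files, err := script.FindFiles("./content").Slice()
	if err != nil {
		panic(err)
	}
	// Fan out: one pipeline per file (steps 4-5). Up to the link
	// resolution step, each pipeline is independent.
	for _, f := range files {
		script.File(f).Filter(analyze(f)).Stdout()
	}
}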

fred

fbaube avatar Aug 22 '23 08:08 fbaube

That sounds great! So what would the script code look like to do this?

bitfield avatar Aug 22 '23 10:08 bitfield

Good question! I already have code that looks like this:

return p.
	st1a_ProcessMetadata().
	st1b_GetCPR().          // Concrete Parse Results
	st1c_MakeAFLfromCFL().  // Abstract Flat List from Concrete Flat List
	st1d_PostMeta_notmkdn()

A pure DSL, though, would need to deal with how a list of N files fans out into N separate pipelines. I'm not sure whether script can do this, and it's not a typical task for a shell script either. I don't know whether there's a best practice for DSLs here.
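
For what it's worth, the fan-out itself could live outside the DSL, in ordinary Go concurrency around per-file pipelines. A sketch assuming golang.org/x/sync/errgroup, with processFile standing in for the staged pipeline above:

package main

import (
	"log"

	"github.com/bitfield/script"
	"golang.org/x/sync/errgroup"
)

// processFile stands in for a staged per-file pipeline. Note that
// concurrent writes to stdout can interleave; fine for a sketch.
func processFile(path string) error {
	_, err := script.File(path).Stdout()
	return err
}

func main() {
	files, err := script.FindFiles("./content").Slice()
	if err != nil {
		log.Fatal(err)
	}
	var g errgroup.Group
	g.SetLimit(4) // spread N pipelines across a bounded worker pool
	for _, f := range files {
		f := f // capture loop variable (needed before Go 1.22)
		g.Go(func() error { return processFile(f) })
	}
	if err := g.Wait(); err != nil {
		log.Fatal(err)
	}
}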

To [Param]Pipe I would also add a debug io.Writer and a DB connection.

fbaube avatar Aug 22 '23 20:08 fbaube