
Run pipeline on Piperack

Open NDNM1408 opened this issue 5 months ago • 3 comments

I read online that the pipeline can be run on an HTTP server called Piperack. Could you guide me on how to run the docx2tex pipeline on Piperack, based on this code? I would really appreciate it.

NDNM1408, Sep 10 '25 14:09

We experimented with Piperack once, but we couldn’t make it work reliably, so we chose not to pursue this path. With Calabash 3, there will be integrations both ways into BaseX, for example, and maybe a new standalone Piperack. So once we have migrated to XProc 3.1, there may be new options. Currently, if you need to run it in a web service, I’d call the bash script from that service. For web services, we use BaseX RESTXQ most of the time, and we once did a deeper Calabash integration, but nowadays we just invoke scripts or makefiles from the server. So sorry, no guidance from us on how to use it with Piperack.
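A minimal sketch of what "calling the script from the service" could look like, assuming a Python-based service. The function name `run_converter`, the timeout value, and the script path in the usage note are illustrative assumptions, not part of docx2tex; substitute whatever wrapper script you actually invoke on the command line.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_converter(
    command: Sequence[str],
    docx_path: Path,
    timeout_s: float = 300.0,
) -> subprocess.CompletedProcess:
    """Run an external converter command on one file.

    `command` is the script plus any fixed arguments (an assumption:
    the script is taken to accept the input file as its last argument).
    Raises FileNotFoundError if the input is missing,
    subprocess.CalledProcessError on a non-zero exit code, and
    subprocess.TimeoutExpired if the run exceeds `timeout_s`.
    """
    if not docx_path.is_file():
        raise FileNotFoundError(docx_path)
    return subprocess.run(
        [*command, str(docx_path)],
        capture_output=True,  # keep stdout/stderr for the service's logs
        text=True,
        timeout=timeout_s,
        check=True,
    )
```

In a request handler this might be called as `run_converter(["/path/to/your/script.sh"], Path("upload.docx"))`, where the script path is a placeholder. Since a conversion can run for tens of seconds, the service would probably want to do this in a worker or background job rather than inside the request itself.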

gimsieke, Sep 10 '25 15:09

It takes me 40 seconds to transform one docx file. Do you have any idea how to speed up this process? I looked at the calabash.sh file, and it seems to start up a lot of things. How many seconds does the startup take before the pipeline actually runs?

NDNM1408, Sep 10 '25 15:09

The startup time of the JVM is approx. 1–3 seconds. This time can in principle be eliminated by using something like Piperack, but considering that docx2tex typically takes 30–90 seconds to run, it doesn’t give you much advantage. One thing that really takes long is legacy MathType equation processing. If your docx doesn’t contain MathType equations, no time will be spent on that, though. We split the whole conversion process into 3 macroscopic steps: docx2hub, evolve-hub, and xml2tex. Each of them consists of distinct XSLT passes over the whole document, where each pass transforms just one specific aspect. Merging some of the passes could accelerate the conversion a bit, but I guess the time needed won’t decrease by more than 50%. And such a refactoring is only theoretically possible; no one will have the time to spend the necessary effort on it. So my suggestion is that you either throw faster hardware at the problem, live with the turnaround times, or look for another solution.
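As a rough check of these figures, the share of a run that JVM startup accounts for can be worked out directly. This is only arithmetic on the approximate numbers quoted in this thread (1–3 s startup, 30–90 s typical run, 40 s reported), not a measurement; the helper name `startup_share` is made up for illustration.

```python
def startup_share(startup_s: float, total_s: float) -> float:
    """Fraction of the total run time that is spent on JVM startup."""
    return startup_s / total_s


# Most favourable case for a persistent server: slow startup, short run.
best_case = startup_share(3, 30)
# Least favourable case: fast startup, long run.
worst_case = startup_share(1, 90)
# The 40-second conversion reported above, with 1 s and 3 s startup.
reported = [startup_share(s, 40) for s in (1, 3)]

print(f"typical range: {worst_case:.1%} to {best_case:.1%}")
print(f"40 s run:      {reported[0]:.1%} to {reported[1]:.1%}")
```

So even in the most favourable case, removing JVM startup would save about a tenth of the run, and for a 40-second conversion only a few percent.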

gimsieke, Sep 10 '25 17:09