Add a batch option
We should examine the possibility of running Ucto on a group of files, selected with wildcards or taken from a subdirectory.
Because the overhead of starting Ucto over and over again is small, this was never an issue. But when running a Docker instance of Ucto, processing one file at a time might become cumbersome.
Batch mode should not really be a problem, though there will of course be some limitations regarding options.
@martinreynaert thanks for the idea
Thinking about this: a logical next request would be to process several files in parallel. This requires a lot of refactoring of the current implementation, but it is doable.
An unpleasant detail is how NOT to break the current behavior of ucto, where you can have ONE filename on the command line for input, and optionally a SECOND one for output.
The simplest solution might be a --batch option that changes this behavior and takes ALL files on the command line as input.
This then requires a way to determine the names of the output files automatically.
An option to set the output directory (and perhaps the input directory?) might also make Ucto easier to use.
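To make the naming question concrete, here is a minimal sketch in Python of one possible rule: keep the input file's name, append a suffix, and place the result in the output directory. The function name `output_path` and the `.tok` suffix are assumptions made up for illustration, not anything Ucto defines.

```python
from pathlib import Path


def output_path(input_file: str, output_dir: str, suffix: str = ".tok") -> Path:
    """Derive an output path from an input path (hypothetical naming rule).

    The directory part of the input is dropped, so the output always
    lands directly inside output_dir.
    """
    return Path(output_dir) / (Path(input_file).name + suffix)


if __name__ == "__main__":
    print(output_path("dir1/test.txt", "out"))
```

Note that a rule like this maps dir1/a.txt and dir2/a.txt to the same output file, so inputs from several directories would need either a check for collisions or a different naming scheme.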
@proycon and @martinreynaert comments welcome!
A --batch option sounds nice. Perhaps it can also detect whether each argument is a file or a directory and work from there?
That looks good at first sight, but what if you don't want to run on ALL files in that directory?
Imagine you do something like: ucto -Lnld dir1/test.txt dir2/ dir3/a*.txt
The intention is to run on 1 file in dir1, ALL files in dir2 (implicit!), and some files in dir3 (expanded by the shell).
This gets VERY complicated, and we need to find a way that is both simple to use and understand.
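For illustration, the file-or-directory detection could look roughly like the Python sketch below. It assumes a directory argument means "all regular files directly inside it, not recursively"; that rule and the name `expand_inputs` are assumptions for the sake of the example, not a description of what Ucto does.

```python
from pathlib import Path
from typing import List


def expand_inputs(args: List[str]) -> List[Path]:
    """Turn command-line arguments into a flat list of input files.

    A directory contributes every regular file directly inside it
    (sorted by name); any other argument is kept as given.
    """
    inputs: List[Path] = []
    for arg in args:
        path = Path(arg)
        if path.is_dir():
            inputs.extend(sorted(p for p in path.iterdir() if p.is_file()))
        else:
            inputs.append(path)
    return inputs
```

A pattern such as dir3/a*.txt never reaches this function as a pattern, because the shell has already expanded it into separate file arguments. The sketch also shows the difficulty raised above: once a directory is given, there is no way to exclude some of its files without an extra filtering option.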
Hi, I would not take things as far as suggested in the last update here. It is far easier to restrict things to a single dir. Also, if one happens to have files in separate directories that need uctoing, one would do best to run them separately, moving the lot into the background. That way, one gets parallel processing for free.
Another thing that 'should' be possible is to also set the ID of the elements. For this I usually take the file name, stripped of its extension(s).
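As a sketch of the "file name stripped of its extension(s)" idea, something like the following Python would do. The name `doc_id` is hypothetical, and "extension(s)" is read here as everything from the first dot onward.

```python
from pathlib import Path


def doc_id(filename: str) -> str:
    """Derive a document ID from a file name (hypothetical helper).

    Drops the directory part and everything from the first dot onward.
    Falls back to the full name when that would leave nothing,
    e.g. for a name that starts with a dot.
    """
    name = Path(filename).name
    return name.split(".", 1)[0] or name
```

With this rule, dir1/test.tok.txt gives the ID test. The sketch does not check that the result is a valid xml:id: XML IDs cannot start with a digit or contain spaces, so some file names would need extra handling.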
Thanks!
A version of Ucto implementing batch processing is now available in Git.
New options:
- -B to enable batch mode
- -O to name an output directory (required)
- -I to give an input directory (optional)
Also, xml:id is now generated from the name of the input file, when possible.
The --id= option is no longer required (and is forbidden in batch mode).
released