add a batch option

Open kosloot opened this issue 1 year ago • 4 comments

We should examine the possibility of running Ucto on a group of files, selected using wildcards or taken from a subdirectory.

Given the small overhead of starting Ucto over and over again, this was never an issue. But when running a Docker instance of ucto it might become cumbersome to do that one file at a time.

Batch mode should not really be a problem, though of course with some limitations regarding options.

@martinreynaert thanks for the idea

kosloot avatar Mar 12 '24 15:03 kosloot

Thinking about this: a logical next request would be to process several files in parallel. This requires a lot of refactoring of the current implementation, but it is doable.

An unpleasant detail is how NOT to break the current behavior of ucto, where you can have ONE filename on the command line for input, and optionally a SECOND one for output.

The simplest solution might be a --batch option that changes this behavior and takes ALL files on the command line as input. This then requires a way to automatically determine the names of the output files. An option to set the output directory (and input directory?) might also make Ucto easier to use.
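To make the output-naming question concrete, here is a minimal Python sketch of one possible scheme. It is not Ucto's implementation: the helper name `plan_outputs` and the `.tok` suffix are assumptions made up for illustration. Each output name is the input's base name plus a suffix, placed in the output directory, and the plan is rejected when two inputs would land on the same output file.

```python
from pathlib import Path
from typing import Dict, Iterable


def plan_outputs(inputs: Iterable[str], output_dir: str,
                 suffix: str = ".tok") -> Dict[Path, Path]:
    """Map every input file to an output file inside output_dir.

    Hypothetical scheme: keep the input's base name and append `suffix`.
    The suffix is an assumption, not something Ucto prescribes.
    """
    plan: Dict[Path, Path] = {}
    seen: Dict[Path, Path] = {}  # output path -> input that claimed it
    for name in inputs:
        src = Path(name)
        dst = Path(output_dir) / (src.name + suffix)
        if dst in seen:
            # dir1/a.txt and dir2/a.txt share a base name, so they collide
            raise ValueError(
                f"{src} and {seen[dst]} would both be written to {dst}")
        seen[dst] = src
        plan[src] = dst
    return plan
```

The collision check is the part worth noting: once inputs may come from several directories, base names alone no longer guarantee unique output names.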

@proycon and @martinreynaert comments welcome!

kosloot avatar Mar 13 '24 11:03 kosloot

A --batch option sounds nice.. perhaps it can also detect whether it is a file or directory and work from there?

proycon avatar Mar 13 '24 11:03 proycon

That looks good at first sight, but what if you don't want to run on ALL files in that directory? Imagine you do something like: ucto -Lnld dir1/test.txt dir2/ dir3/a*.txt. The intention is to run on one file in dir1, ALL files in dir2 (implicit!), and some files in dir3 (expanded by the shell). This gets VERY complicated, and we need to find a way that is both simple to use and simple to understand.
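A small Python sketch of what the implicit directory rule would amount to. This is not Ucto code, and `expand_inputs` is a hypothetical helper. Note that the shell has already expanded dir3/a*.txt before the program starts, so the program only ever sees plain file names and directory names.

```python
from pathlib import Path
from typing import Iterable, List


def expand_inputs(args: Iterable[str]) -> List[Path]:
    """Turn command-line arguments into a flat list of input files.

    Assumed rule: a directory argument means "every regular file directly
    inside it" (no recursion); any other argument is taken as a file name.
    """
    files: List[Path] = []
    for arg in args:
        path = Path(arg)
        if path.is_dir():
            # implicit: all files in the directory, in a stable order
            files.extend(sorted(c for c in path.iterdir() if c.is_file()))
        else:
            files.append(path)
    return files
```

Even this simple rule leaves questions open, such as whether subdirectories should be followed and which file types in a directory count as input.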

kosloot avatar Mar 13 '24 12:03 kosloot

Hi, I would not take things as far as suggested in the last update here. Far easier to restrict things to a single dir. Also, if one happens to have files in separate directories that need uctoing, one would do best to run them separately, moving the lot into background. That way, one gets parallel processing for free.

Another thing that 'should' be possible is to also set the ID of the elements. For this I usually take the file name, stripped of its extension(s).
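As an illustration of that convention, a minimal Python sketch follows. The helper name `id_from_filename` is made up here, and this is not how Ucto does it. It takes the base name and cuts it at the first dot, so that every extension is dropped.

```python
from pathlib import Path


def id_from_filename(path: str) -> str:
    """Derive an ID from a file name by stripping all of its extensions."""
    name = Path(path).name           # "dir1/doc.tok.xml" -> "doc.tok.xml"
    stem = name.split(".", 1)[0]     # cut at the first dot -> "doc"
    # A name such as ".hidden" has nothing before the first dot;
    # fall back to the full name instead of returning an empty ID.
    return stem if stem else name
```

One caveat: a value used as an xml:id has to be a valid XML name, so a file name that starts with a digit or contains spaces would still need extra handling.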

Thanks!

martinreynaert avatar Mar 13 '24 12:03 martinreynaert

A version of Ucto implementing batch processing is now available in Git. New options:

  • -B to enable batch mode
  • -O to name an output directory (required)
  • -I to give an input directory (optional)

Also, xml:id is now generated from the name of the input file when possible. The --id= option is no longer required (and is forbidden in batch mode).

kosloot avatar Apr 11 '24 12:04 kosloot

released

kosloot avatar Apr 30 '24 15:04 kosloot