Tom
Where does it mention that?
That should work. Keep in mind that `add_column` really doesn't care how you add the column. The fact that it refers to shards is something the rest of the pipeline...
I don't see how the issues are related. The `add_column` code doesn't consume any shards from the original dataset at all. It simply opens a new `WebDataset` instance for every...
Thanks for figuring this out. I'll add this to the example and the FAQ.
Most of the time in your profile data seems to be spent in `poll`, so the time difference seems to be almost entirely due to I/O. I suspect that because you...
Thanks; I'll have a look.
I'm not sure why this is needed. Closing the stream returned by `gopen` should also close the underlying file object. Have you experienced leaks?
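For illustration, here is a small standard-library sketch of the behavior I'd expect. It is only an analogy, not the `gopen` code itself: the in-memory `raw` object stands in for whatever file object sits underneath the stream.

```python
import io

# Stand-in for the underlying file object (an assumption for this sketch).
raw = io.BytesIO(b"example bytes")

# Wrap it in a buffered stream, similar in spirit to a wrapped file handle.
stream = io.BufferedReader(raw)

# Closing the wrapper also closes the object it wraps.
stream.close()
print(raw.closed)
```

If the underlying object were still open after closing the wrapper, that would point to a leak.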
I really didn't understand your patch before. But it sounds to me like what you are saying is that if you provide a custom stream, you still want TarWriter to...
Thanks for the report. I'll try to fix this quickly.
That's the reason the `url_to_name` option exists for the `FileCache` initializer. The default is as it is because having shards in different directories is generally not recommended, and because we...
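As a rough sketch of the kind of function I mean: `url_to_nested_name` below is a hypothetical name, and I'm assuming here that the option takes a single URL string and returns a cache file name. It keeps the host and directory in the name, so shards with the same base name in different directories don't collide.

```python
from urllib.parse import quote, urlparse


def url_to_nested_name(url: str) -> str:
    """Hypothetical mapper: keep host and directory in the cache file name."""
    parsed = urlparse(url)
    # Combine host and path, then percent-encode so the result is one flat
    # file name with no path separators in it.
    raw = f"{parsed.netloc}{parsed.path}"
    return quote(raw, safe="")


a = url_to_nested_name("https://example.com/a/shard-000000.tar")
b = url_to_nested_name("https://example.com/b/shard-000000.tar")
print(a)
print(b)
```

Something like this could then be passed as `url_to_name` when constructing the `FileCache`.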