Deependu

Results 70 comments of Deependu

Cool! 🔥

### Issues:
- The code part seems somewhat odd.
- Also, a weird white side-bar is present (which is not present in TorchText docs or any other).

Rest...

Hey, I'd love to work on this issue. Should I continue?

Hi, I'm interested in working on this feature. But before that, I must ensure I've understood it correctly. The current behavior for `optimize` is:

```python
optimize(
    fn=random_images,
    inputs=list(range(1000)),
    output_dir="my_dataset",
    num_workers=4,
    ...
```
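For reference, this is the shape of that call as I understand it from the LitData README; the body of `random_images` and the `chunk_bytes` value are my own illustrative assumptions, not part of the original comment:

```python
import numpy as np
from PIL import Image
from litdata import optimize

def random_images(index):
    # Produce one fake sample per input index (illustrative only).
    image = Image.fromarray(np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8))
    return {"index": index, "image": image, "class": np.random.randint(10)}

if __name__ == "__main__":
    optimize(
        fn=random_images,            # applied to every element of `inputs`
        inputs=list(range(1000)),    # 1000 indices -> 1000 optimized samples
        output_dir="my_dataset",     # chunks + index.json are written here
        num_workers=4,               # parallel optimization workers
        chunk_bytes="64MB",          # assumed chunk size, not from the comment
    )
```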

Hey @ethanwharris, we have the feature to subsample from the dataset, though the subsamples are optimized to come from as few chunks as possible. Indexing and slicing is also...
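From the user side that roughly looks like the sketch below; the `subsample` argument name and the slicing syntax are how I recall the API, so treat them as assumptions:

```python
from litdata import StreamingDataset

# Stream ~10% of the optimized dataset; LitData picks the subsample from as
# few chunks as possible, so fewer chunks need to be downloaded.
dataset = StreamingDataset("my_dataset", subsample=0.1)  # `subsample` assumed

sample = dataset[42]   # indexing into the streamed dataset
head = dataset[:10]    # slicing, as mentioned above
```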

Is it all about running only if it is executing in Studio, and doing nothing otherwise? Modified the code to be something like:

```python
def _cleanup_cache(self) -> None:
    if not _IS_IN_STUDIO:
        ...
```

How about logging a warning for this if they are running it outside?
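Combining that with the guard above, a minimal sketch of what it could look like; the placeholder `_IS_IN_STUDIO` value and the warning message are my assumptions, not the actual LitData code:

```python
import logging

logger = logging.getLogger(__name__)

# Placeholder: in LitData this flag is derived from the environment.
_IS_IN_STUDIO = False

def _cleanup_cache(self) -> None:
    if not _IS_IN_STUDIO:
        # Warn instead of silently skipping when running outside a Studio.
        logger.warning("Not running in a Lightning Studio; skipping cache cleanup.")
        return
    # ... existing cleanup logic stays unchanged and only runs in Studio ...
```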

Hi @yuzc19, you can set the `DATA_OPTIMIZER_CACHE_FOLDER` environment variable at the top of your script to specify the cache directory. This way, the cache_dir will be set to your desired...
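For example (the cache path here is just a placeholder):

```python
import os

# Set this before any LitData optimize/streaming code runs, i.e. at the very
# top of the script, so the cache directory is picked up correctly.
os.environ["DATA_OPTIMIZER_CACHE_FOLDER"] = "/path/to/custom/cache"
```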

- From [zstd's readme](https://github.com/facebook/zstd?tab=readme-ov-file#the-case-for-small-data-compression):

---

Also, ChatGPT says:

---

The graph shared has `10K different json files of roughly 1KB each`. LitData chunks on average will be 64MB or...
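To make the point concrete, a sketch of compressing the optimized output with zstd, where whole ~64MB chunks get compressed rather than thousands of tiny 1KB files; the `compression` parameter name and the tiny-record generator are assumptions on my side:

```python
from litdata import optimize

def to_record(i):
    # Hypothetical per-sample function producing a small ~1KB record.
    return {"id": i, "payload": "x" * 1024}

if __name__ == "__main__":
    optimize(
        fn=to_record,
        inputs=list(range(10_000)),   # ~10K small records, mirroring the graph
        output_dir="my_dataset_zstd",
        chunk_bytes="64MB",           # records are packed into ~64MB chunks
        compression="zstd",           # assumed parameter name for zstd compression
    )
```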

Really nice issue. @bhimrazy, my understanding of the issue is: the optimized dataset will contain only one sample, but while streaming, the same sample will be yielded multiple times (along with sample...

Also, I think shuffling in this case will be interesting. My approach for this will be (rough sketch after the list):

- Add an additional property in the index.json file called `sample_count`, which will contain how...
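A very rough sketch of the shuffling idea, assuming `sample_count` simply records how many times the single underlying sample should be served; the field name comes from the comment above, everything else here is my own assumption:

```python
import random

# Hypothetical: the optimized dataset stores one real sample, while index.json
# additionally records `sample_count` (how many times it should be yielded).
stored_samples = ["the_only_sample"]
sample_count = 1000  # would come from index.json under this proposal

# Virtual indices 0..sample_count-1 all resolve to the single stored sample,
# so shuffling just permutes the order in which the repeats are yielded.
virtual_indices = list(range(sample_count))
random.shuffle(virtual_indices)

stream = (stored_samples[i % len(stored_samples)] for i in virtual_indices)
```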