litdata

Blazingly fast, distributed streaming of training data from any cloud storage for training AI models

Results 112 litdata issues

Before submitting - [ ] Was this discussed/agreed via a GitHub issue? (not needed for typos and docs improvements) - [ ] Did you read the [contributor guideline](https://github.com/Lightning-AI/lit-data/blob/main/.github/CONTRIBUTING.md), Pull Request...

## 🐛 Bug Training slows down as time progresses. ### To Reproduce Unfortunately, I don't know a good way to reproduce this. It happens with certain datasets. The...

bug
help wanted

## 🐛 Bug Time per sample grows as the number of processed samples grows. ### To Reproduce Steps to reproduce the behavior: Follow the example provided using `optimize`. Increase the number of samples....

bug
help wanted

## 🐛 Bug It is a known limitation of PyTorch's IterableDataset that problems can occur when the dataset defines `len()`. PyTorch Lightning even raises a warning to make the user...

bug
help wanted

When running `optimize`, my process somehow crashed after 4h (was estimated to take 10h). Now I have to restart it from scratch. Could you add a checkpointing feature such that...

enhancement
help wanted

`DataChunkRecipe` does not work when used in litgpt's TinyLlama pretraining example. Error: `AttributeError: 'SlimPajamaDataRecipe' object has no attribute 'is_generator'`. The type of `SlimPajamaDataRecipe` is `DataChunkRecipe`, and I find DataChunkRecipe object...

bug
help wanted

## 🚀 Feature `StreamingDataset` has enabled fast data reading, which is amazing when we have a large dataset. However, currently, it does not support reading just a fraction of data,...

enhancement
help wanted

## 🐛 Bug I'd like to debug the `random_images` function taken from the first example. However, when adding a `breakpoint()` line in that function, Python crashes. ### Environment - PyTorch...

bug
help wanted

## 🚀 Feature Streaming subsets of channels ### Motivation My GeoTIFF data is typically multispectral, and I run experiments using subsets of the channels. I would like to stream only...

enhancement
help wanted
won't fix

## 🐛 Bug I'm attempting to train a model using litgpt and the openwebtext dataset. I launch the run as normal, following their examples, and the dataset preprocessing starts: However,...

bug
help wanted
won't fix