Adding multisample feature along with testcases

Open VijayVignesh1 opened this issue 3 months ago • 2 comments

Before submitting
  • [x] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Did you make sure to update the docs?
  • [x] Did you write any new necessary tests?

What does this PR do?

Fixes #317

PR review

This PR adds support for multi-sample items.
It introduces a `sample_count` parameter that creates a batch of sub-samples for each sample from a single transform function.

Note:
Multi-sample behavior applies only when the transform is passed to the
StreamingDataset constructor (i.e., via the `transform` argument),
and not when overriding `__init__` in a subclass.

Sample code:

    from litdata import StreamingDataset

    def transform_fn_sq(x, sample_idx, *args, **kwargs):
        """A simple transform function that scales the input by the sub-sample index."""
        return x * sample_idx

    dataset = StreamingDataset(
        data_dir,
        cache_dir=str(cache_dir),
        shuffle=False,
        transform=[transform_fn_sq],
        sample_count=3,
    )
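
To make the intended behavior easier to picture, here is a minimal pure-Python sketch of the expansion step. It is not the litdata implementation: `expand_sample` is a hypothetical helper, and it assumes that `sample_idx` runs from 0 to `sample_count - 1` and that the sub-samples come back as a list. Neither assumption is confirmed by the PR description.

```python
from typing import Any, Callable, List


def transform_fn_sq(x, sample_idx, *args, **kwargs):
    """Scale the input by the sub-sample index (same body as the PR's sample code)."""
    return x * sample_idx


def expand_sample(item: Any, transform: Callable[..., Any], sample_count: int) -> List[Any]:
    """Hypothetical helper: apply one transform `sample_count` times to one item.

    Assumption: the sub-sample index starts at 0 and the results form a list.
    """
    return [transform(item, sample_idx) for sample_idx in range(sample_count)]


if __name__ == "__main__":
    # One stored sample (5) becomes three sub-samples.
    print(expand_sample(5, transform_fn_sq, 3))
```

Under these assumptions, a single stored sample yields `sample_count` outputs, each produced by the same transform with a different `sample_idx`.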

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

VijayVignesh1, Oct 24 '25 20:10

@tchaton @deependujha @bhimrazy Can you verify the approach once? I can then make changes to the README.

VijayVignesh1, Oct 24 '25 20:10

Codecov Report

:x: Patch coverage is 84.21053% with 3 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 80%. Comparing base (b070032) to head (229ff5b).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #740   +/-   ##
===================================
- Coverage    80%    80%   -0%     
===================================
  Files        52     52           
  Lines      7343   7357   +14     
===================================
- Hits       5885   5876    -9     
- Misses     1458   1481   +23     

codecov[bot], Oct 29 '25 15:10