Christopher Akiki comments

Results: 77 comments by Christopher Akiki

Enable streaming dataset to use the "all" split

@albertvillanova Adding the validation split causes these two `assert_called_once` assertions to fail with `AssertionError: Expected 'ArrowWriter' to have been called once. Called 2 times`: https://github.com/huggingface/datasets/blob/main/tests/test_builder.py#L548-L562 It might be better to...
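This is the failure mode being described. `unittest.mock`'s `assert_called_once` fails as soon as the mocked class is instantiated more than once, which happens if one writer is created per split. The sketch below assumes a hypothetical `write_splits` function that creates one `ArrowWriter`-like object per split. It is a stand-in for illustration, not the actual `datasets` builder or test code. It also shows that asserting `call_count` against the number of splits is one possible split-aware alternative.

```python
from unittest.mock import MagicMock


def write_splits(writer_cls, splits):
    """Hypothetical stand-in for a builder: instantiate one writer per split."""
    for split in splits:
        writer = writer_cls(path=f"{split}.arrow")
        writer.write({"split": split})


def check_single_writer(splits):
    """Return the AssertionError message from assert_called_once, or None if it passes."""
    mock_arrow_writer = MagicMock(name="ArrowWriter")
    write_splits(mock_arrow_writer, splits)
    try:
        mock_arrow_writer.assert_called_once()
    except AssertionError as err:
        return str(err)
    return None


# One split: the writer is instantiated exactly once, so the assertion passes.
print(check_single_writer(["train"]))  # None

# Two splits: the writer is instantiated twice, so the assertion fails.
print(check_single_writer(["train", "validation"]).splitlines()[0])
# Expected 'ArrowWriter' to have been called once. Called 2 times.

# A split-aware alternative: compare the call count with the number of splits.
mock_arrow_writer = MagicMock(name="ArrowWriter")
splits = ["train", "validation"]
write_splits(mock_arrow_writer, splits)
assert mock_arrow_writer.call_count == len(splits)
```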

Enable streaming dataset to use the "all" split

Streaming with `split="all"` seems to be working; I'll fix the failing test next.
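In `datasets`, the "all" split refers to the union of every split. In streaming mode, one way to picture that union is a lazy concatenation of per-split iterators. The following toy sketch uses in-memory lists and the hypothetical names `stream_split` and `stream_all`. It illustrates the concept only and is not the library's streaming implementation.

```python
from itertools import chain
from typing import Dict, Iterator, List


def stream_split(examples: List[dict]) -> Iterator[dict]:
    """Hypothetical lazy reader for one split: yields examples one at a time."""
    for example in examples:
        yield example


def stream_all(splits: Dict[str, List[dict]]) -> Iterator[dict]:
    """Lazily concatenate every split, in the order the splits are listed."""
    return chain.from_iterable(stream_split(examples) for examples in splits.values())


# Toy in-memory "splits" standing in for remote data files.
splits = {
    "train": [{"text": "a"}, {"text": "b"}],
    "validation": [{"text": "c"}],
}

all_stream = stream_all(splits)
print(next(all_stream))  # {'text': 'a'}
print([ex["text"] for ex in all_stream])  # ['b', 'c']
```

Because `chain.from_iterable` only advances to the next split once the current one is exhausted, no split is read before it is needed. This is the property a streaming "all" split has to preserve.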

Enable streaming dataset to use the "all" split

Not sure if marking the PR as "ready for review" actually notified you, so tagging @albertvillanova just in case :smiley_cat:

Enable streaming dataset to use the "all" split

cc @lhoestq

The "all" split breaks streaming

@albertvillanova Nice! Let me know if it's something I can fix myself; I would love to contribute!

The "all" split breaks streaming

Hi @albertvillanova ! Sorry it took so long; I wanted to spend this weekend working on it.

Adding 3 metrics

Happy to work on this if @PierreColombo does not have time!

Import issue

This might be a TensorFlow issue, but it should have been fixed in https://github.com/huggingface/evaluate/commit/9b6cea3ef7f84dd619d65a6fd5a7da07f2386fae afaik. What version of `evaluate` are you using?
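One way to answer the version question without importing a package that may be failing to import is to read the installed distribution metadata with the standard library's `importlib.metadata` (Python 3.8+). A minimal sketch follows. `installed_versions` is a hypothetical helper name, and the package list is just the two packages mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is not installed."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            # Reads package metadata only; does not import the package itself.
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(["evaluate", "tensorflow"]).items():
    print(f"{name}: {ver or 'not installed'}")
```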

Add common metrics for information retrieval

@lewtun I'd like to help with this!

Add common metrics for information retrieval

@ola13 Yes :smile_cat: ! The ones I'm adding are ranked relevance metrics like nDCG@k and MAP@k, so not sure they're super useful for retrieval on ROOTS with no relevance judgments....
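Both metrics take a relevance label for each ranked result as input. This is why they need relevance judgments that ROOTS does not have. The following minimal sketch makes two assumptions. It uses linear-gain DCG (`rel / log2(rank + 1)`) and an AP@k denominator of `min(k, number of relevant items)`. Other implementations use exponential gain (`2**rel - 1`) or different normalizations, so their numbers can differ. All function names here are illustrative.

```python
import math
from typing import List, Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1) over the top k."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering of the same labels."""
    # Assumes every judged item for the query appears in `relevances`.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision_at_k(relevant: Sequence[int], k: int,
                           num_relevant: Optional[int] = None) -> float:
    """AP@k for binary labels: sum of precision@i at each relevant rank i <= k,
    divided by min(k, num_relevant)."""
    if num_relevant is None:
        num_relevant = sum(relevant)  # assumes all relevant items appear in the ranked list
    denom = min(k, num_relevant)
    if denom == 0:
        return 0.0
    hits, score = 0, 0.0
    for rank, rel in enumerate(relevant[:k], start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / denom


def map_at_k(runs: List[Sequence[int]], k: int) -> float:
    """MAP@k: AP@k averaged over queries."""
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs)


print(round(ndcg_at_k([3, 2, 3, 0, 1, 2], k=6), 3))  # 0.961
print(round(map_at_k([[1, 0, 1, 0, 0, 1], [0, 1, 0, 0, 0]], k=5), 3))  # 0.528
```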