lance
lance copied to clipboard
bug: spawn multiple encoding tasks inside one batch number of rows of one column
Currently, we spawn only one encoding task per batch number rows in a single column, which can be problematic for datasets like MS MARCO where the third column is significant larger than the other two(more than 100x).
larger column needs more encoding tasks.