Bibhu Pala
After line 55 of train.py, add the following code. It will produce a voca.txt file containing each word and its id.

```
vocab_dict = vocab_processor.vocabulary_._mapping
sorted_vocab = sorted(vocab_dict.items(),...
```
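The snippet above is truncated; here is a minimal sketch of the idea it describes, using a plain dict in place of `vocab_processor.vocabulary_._mapping` (the sort key and the tab-separated file layout are assumptions, not the original code):

```python
# Stand-in for vocab_processor.vocabulary_._mapping: maps word -> integer id.
vocab_dict = {"the": 0, "cat": 2, "sat": 1}

# Sort entries by id so the file lists words in index order.
sorted_vocab = sorted(vocab_dict.items(), key=lambda x: x[1])

# Write one "word<TAB>id" pair per line to voca.txt.
with open("voca.txt", "w") as f:
    for word, idx in sorted_vocab:
        f.write(f"{word}\t{idx}\n")
```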
Delete all your test data and keep just a single line.
@Nadedic Yes! We don't test with the same data; that won't give a true measure of the model's accuracy. If you have trained, then you will have a text file containing...
(209, 20000) is the shape of the matrix, so the feature-vector length is 20000. The larger the dimension, the more time it will take; try to reduce the dimensionality.
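One common way to reduce the feature dimension is a truncated SVD projection; the comment above doesn't name a method, so this NumPy sketch is only an illustrative assumption (a smaller 2000-column matrix stands in for the 20000-column one to keep the demo fast):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((209, 2000))  # stand-in for the (209, 20000) feature matrix

# Truncated SVD: project each row onto the top-k right singular vectors,
# shrinking the feature dimension from 2000 down to k.
k = 50
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_reduced = X @ Vt[:k].T

print(X_reduced.shape)  # (209, 50)
```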
Hi, I am facing this error using Spark 3.3.1 and Scala 2.12.15. Has anyone fixed it yet?
Hi, you can try the record-level index (https://hudi.apache.org/blog/2023/11/01/record-level-index/#metadata-table); it stores the record keys in the metadata table. But I am not sure whether this indexing can be applied to COW tables.
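For reference, the record-level index linked above is enabled through write options on the metadata table; a hedged sketch of the relevant configs (check the linked blog for your Hudi version, as option names may differ across releases):

```python
# Options that enable Hudi's record-level index (RLI); the metadata table
# itself must be enabled for the record index to be built.
hudi_options = {
    "hoodie.metadata.enable": "true",
    "hoodie.metadata.record.index.enable": "true",
    "hoodie.index.type": "RECORD_INDEX",
}

# These would be passed to a DataFrame writer, e.g. df.write.format("hudi").options(**hudi_options)...
print(hudi_options["hoodie.index.type"])
```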
Thanks for providing your suggestions @ad1happy2go.
1. Even right now we are doing a groupBy and collect_list; this fails when the array size is more than 2GB.
2. As you...
@ad1happy2go Thanks for the suggestion. This makes sense; I was thinking in the same direction of using two different tables for it.
@danny0405 Are you planning to create a JIRA ticket for this? We started using RLI, but we will need support for creating TTL policies for RLI.
Why do we need to set [hoodie.upsert.shuffle.parallelism](https://hudi.apache.org/docs/configurations/#hoodieupsertshuffleparallelism)? From 0.13.0 onwards, Hudi by default automatically uses the parallelism deduced by Spark based on the source data. If the shuffle parallelism is...
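When you do want to override the Spark-deduced value, the config is passed as a write option; a hedged PySpark sketch, not runnable without a Spark/Hudi environment (the table path, parallelism value, and `df` are hypothetical):

```python
# Hypothetical PySpark upsert; assumes `df` is a DataFrame and the Hudi
# bundle is on the classpath. Setting the option pins the shuffle
# parallelism instead of letting Hudi/Spark deduce it from the source data.
(df.write.format("hudi")
   .option("hoodie.datasource.write.operation", "upsert")
   .option("hoodie.upsert.shuffle.parallelism", "200")  # explicit override
   .mode("append")
   .save("/tmp/hudi/my_table"))
```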