Mossad Helali
Thanks @sonichi for your reply. Please find the .csv files of [mv](https://drive.google.com/file/d/1XPR2Y4oAJwAgnCjN_oBPssNsJgkTpqRf/view?usp=sharing) and [MagicTelescope](https://drive.google.com/file/d/1kUGjhMy2UeuzgdzM4aSICtIj4rv_XNDm/view?usp=sharing).
Thanks for your reply, @sonichi . I find it weird because FLAML worked for me on the housing prices dataset ([CSV link](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data?select=train.csv)), which also has an ID as its first column....
I see, thanks, @sonichi . Can this be automated somehow? i.e., detecting which column in a dataset is an index column?
Thanks @sonichi for your reply. Please find the .csv file of [higgs](https://drive.google.com/file/d/1uzZtnFpqu7t9msiPmpplHB_Qi3Ps4xDS/view?usp=sharing). I am using FLAML v0.6.3.
Thanks for your reply, @sonichi . This worked for the higgs dataset, but I can imagine it might appear again for other datasets. Any plans to fix this in future...
I have seen multiple OpenML datasets where the NaN values are stored as "?". I understand that a blind replacement of "?" with NaN might not be desired in some...
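As an aside, a minimal pandas sketch of what an opt-in replacement could look like (this is not FLAML's actual behavior, just an illustration): `read_csv` accepts an `na_values` argument that maps "?" to NaN at parse time, so the caller decides explicitly whether the replacement is wanted.

```python
import io

import pandas as pd

# Sample data in the OpenML style, where missing values appear as "?".
csv_text = "age,workclass\n39,State-gov\n50,?\n38,Private\n"

# na_values tells pandas to treat "?" as NaN while parsing,
# without touching any other string values in the file.
df = pd.read_csv(io.StringIO(csv_text), na_values="?")

print(df["workclass"].isna().sum())  # one missing value detected
```

Making it opt-in this way avoids the blind-replacement problem: datasets where "?" is a legitimate value simply skip the argument.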
Thanks @sonichi for your reply. Please find the .csv file of [spooky-author-identification](https://drive.google.com/file/d/1K0RNQuIREw5gB-0d5g1vunimWHSdAPFz/view?usp=sharing).
@dmcguire81 Thanks for your reply and confirmation. I removed the PyArrow workaround and bumped up pyarrow to `pyarrow>=2.0.0` but now I get the following error in `arrow_reader_worker.py` [here](https://github.com/uber/petastorm/blob/95393f1efec9c5837097092615d4d1f5455f0fd0/petastorm/arrow_reader_worker.py#L77) : `Length...
@baumanab I ended up padding the vectors to make them the same length and created a mask that is multiplied by the model outputs, i.e. similar to what is done...
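For reference, a minimal NumPy sketch of that pad-and-mask approach (function and variable names here are hypothetical, not from any library): variable-length vectors are zero-padded to a common length, and a 0/1 mask multiplied by the model outputs zeroes out the padded positions.

```python
import numpy as np

def pad_and_mask(vectors):
    """Pad variable-length 1-D vectors to the max length; return (padded, mask)."""
    max_len = max(len(v) for v in vectors)
    padded = np.zeros((len(vectors), max_len))
    mask = np.zeros((len(vectors), max_len))
    for i, v in enumerate(vectors):
        padded[i, : len(v)] = v
        mask[i, : len(v)] = 1.0  # 1 where real data exists, 0 where padded
    return padded, mask

vectors = [np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])]
padded, mask = pad_and_mask(vectors)

# Multiplying the model outputs by the mask removes the padded
# positions' contribution; here a doubling stands in for a real model.
outputs = padded * 2.0
masked_outputs = outputs * mask
```

The same idea carries over to frameworks like PyTorch or TensorFlow, where the mask tensor is broadcast against the batch of outputs before the loss is computed.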
Thanks for your reply, @salty-fish-97 . There was no `*.topk_config.pk` file in `output_dir`. Here is the output that I get before the exception happens: ``` [INFO] [2022-01-25 11:57:35,445] [Soln-ml: default_dataset_name]...