Handling NaN values
Thank you for this great project!
Would it be better to replace NaN values with empty string instead of #NUM! ?
Thanks for the ticket. Do you have an example/use case in mind?
I'm inclined to say no, because NaN is a number type in python, thus we should translate to a number type in Excel. Additionally I'm somewhat against silently cleaning up data as the user might not expect it, but a #NUM! is rather annoying to see so I might be able to be convinced otherwise.
I have a dataset holding numeric data is some fields (unknown in advance) with some NaN values. I need to apply pd.to_numeric(data, errors='ignore') before writing to excel. If I do not do so, excel does not treat them as numeric and sometime the data is too big to format manually in excel. I cannot get rid of NaN before passing to PyExcelerate, because empty string resets dtype to object. Would be nice to have an option to choose what to do with NaN in PyExcelerate. That would fulfill needs of different scenarios.
Ah I see, thanks for the information. I will think of a good API to do this. We've received a couple similar feature requests in the past, but yours lays out explicitly why the current solution isn't great.
I'm still not quite a fan of making the output directly configurable, but I think there might be a way to do this with a "data view" type of object that transforms NaN to empty string before PyExcelerate sees it. I'll think about this a little more and get a solution for you.
Thank you!
Ok I've thought about this issue more and I've struggled to come up with an ergonomic API as there are a lot of edge cases and unusual behaviors. I think this might be better done through a user-implemented transform.
For example, taking a pandas.DataFrame df, we can strip out the NaN's by enumerating it into a list
data = df.values.tolist()
nans_stripped = [['' if math.isnan(value) for value in row] for row in data]
This can then be passed to PyExcelerate:
wb.new_sheet("sheet name", data=nans_stripped)
This actually might end up being faster than directly passing df, as PyExcelerate has some optimizations around list objects. We don't have any pandas-specific optimizations.
Adding to the answer by @kevmo314:
The following adaptation solved the #NUM! issue for my dataframe (which is an idiosycratic blend of numeric, NaN, and string elements). If the above solution doesn't work for the reader (as it didn't in my case), perhaps this one will.
data = df.values.tolist()
strippedNans = [['' if str(val) == 'nan' else val for val in row] for row in data]
wb.new_sheet("sheet name", data=strippedNans)
Here is my way. What I do is, get the null columns first then use the .fillna() function from Pandas.
null_cols = latest_df.columns[latest_df.isna().any()]
latest_df[null_cols] = df[null_cols].fillna("")