csvkit: Document using --datetime-format to avoid over-aggressive date inference

Hi,

I'm seeing some weird type inference when using in2csv:

An Excel file has a cell with the value 611A_M46___EXT050.png | 611A_M46___FIN300.png, and it is converted to 6101-01-01T05:00:00 (a date?) when I run in2csv data.xlsx > data.csv.

Is this expected?

With in2csv -I data.xlsx > data.csv (that is, with the --no-inference option), the output for this particular cell is fine, but I really need the inference for other cells in the data...

Thanks!

FYI: this is happening with in2csv 1.0.2 on macOS Sierra with Python 2.7.10. I have another Mac with an older version of in2csv (I think 0.9.1, but I can't get the version because in2csv -V isn't working there), also on macOS Sierra with Python 2.7.10, and this behaviour doesn't occur there!

Jan 04 '18 12:01 bvdputte

For type inference on specific columns, see #151.

agate 1.6.1 fixes the over-aggressive date inference; csvkit will upgrade to it once it's released: https://github.com/wireservice/agate/issues/653

Jan 15 '18 15:01 jpmckinney

It looks like explicitly setting --datetime-format does disable some overeager conversion of TEXT.

In my case, I'm using csvsql only to handle numeric locales, which requires type inference. But it was converting UUIDs to datetimes. Setting the strptime format fixes everything and gives something like a 10x speedup without all the datetime conversion.
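
For anyone wondering why a fixed format helps, here is a minimal sketch using only Python's standard library. It is not csvkit's or agate's actual code; it only illustrates the general idea that a strict strptime format rejects strings that a lenient date guesser might accept. The format string and the sample UUID are made up for illustration.

```python
from datetime import datetime

# Hypothetical format, chosen only for this illustration;
# use whatever layout your real datetime columns have.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_strict(value, fmt=DATETIME_FORMAT):
    """Return a datetime if value matches fmt exactly, otherwise None."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        # The string does not match the format, so leave it as text.
        return None


samples = [
    "2019-05-28 22:05:00",                            # a real datetime
    "123e4567-e89b-12d3-a456-426614174000",           # made-up UUID
    "611A_M46___EXT050.png | 611A_M46___FIN300.png",  # value from this issue
]

results = {s: parse_strict(s) for s in samples}
for text, parsed in results.items():
    print(repr(text), "->", parsed)
```

Only the first sample comes back as a datetime; the other two stay text. On the command line this would presumably look something like csvsql --datetime-format '%Y-%m-%d %H:%M:%S' data.csv, where the format string is again just an assumption about the data.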

@bvdputte

May 28 '19 22:05 jnj16180340

Re-opened to document this way to control type inference.

Jun 03 '19 20:06 jpmckinney