
write as bytes decimal logical type

Open eliviu opened this issue 8 years ago • 3 comments

Hi,

I have a Hive table containing decimal values. I'm loading the data into a Spark DataFrame using hiveContext; in the DataFrame the decimal values are loaded as decimal(p,s). When I save the DataFrame to Avro format, the decimals are converted and saved as string data types.

How can I save these fields to Avro as bytes with the decimal logical type instead of string?

Thanks,

eliviu avatar Sep 04 '17 08:09 eliviu

Hey @eliviu, spark-avro currently doesn't support the decimal logical type, and hence the decimal type is converted to String. You can see the conversion code in the function createConverterToAvro in AvroOutputWriter.scala.

There was a PR to support the decimal logical type, but it hasn't been merged. You can still take a look: https://github.com/databricks/spark-avro/pull/121

praneetsharma avatar Sep 04 '17 12:09 praneetsharma
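One consolation with the string fallback described above: a decimal written as its string form round-trips losslessly back to a Decimal on read, whereas casting to double could lose precision. A minimal Python sketch of that round trip (illustrative only, not part of spark-avro):

```python
from decimal import Decimal

# spark-avro (per the comment above) writes decimal columns as strings.
original = Decimal("3.12")
written = str(original)       # the string form stored in the Avro file
recovered = Decimal(written)  # reading it back

print(recovered == original)  # True: no precision lost via the string path
```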

Hi @praneetsharma ,

Ok, but is there support for reading (not writing) Decimal? When I read a decimal datatype I get the value in hexadecimal (e.g. for 3.12 I get [01 38]). How can I convert this to another datatype using spark-avro (e.g. convert it to String or Decimal and get the value "3.12")?

Thanks,

eliviu avatar Oct 06 '17 06:10 eliviu
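For context on the hex bytes in the question above: Avro's decimal logical type stores the unscaled integer as big-endian two's-complement bytes, so [01 38] is 0x0138 = 312, which at scale 2 is 3.12. A minimal Python sketch of the decoding (the helper name decode_avro_decimal is hypothetical, not a spark-avro API):

```python
from decimal import Decimal

def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    # Avro decimals: big-endian two's-complement unscaled integer
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits
    return Decimal(unscaled).scaleb(-scale)

print(decode_avro_decimal(b"\x01\x38", 2))  # → 3.12
```

In Spark this logic would have to be wrapped in a UDF applied to the bytes column, since spark-avro itself (at this version) leaves the bytes undecoded.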

Is there any update on this? I also get hex values for columns with an Avro type of "bytes" and haven't found anything useful for converting these columns back to a decimal type. As an example, my Avro schema looks like this for one of the columns:

{
  "name" : "dollar_amount",
  "type" : [ "null", {
    "type" : "bytes",
    "scale" : 2,
    "precision" : 64,
    "connect.version" : 1,
    "connect.parameters" : { "scale" : "2", "connect.decimal.precision" : "64" },
    "connect.name" : "org.apache.kafka.connect.data.Decimal",
    "logicalType" : "decimal"
  } ],
  "default" : null
}

cbia4 avatar Jan 12 '18 23:01 cbia4
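For reference, writing a value under a schema like the one above goes the other way: the unscaled integer (value shifted left by `scale` digits) is serialized as big-endian two's-complement bytes, sized to leave room for the sign bit. A hedged Python sketch (encode_avro_decimal is a hypothetical helper, not a Kafka Connect or spark-avro API):

```python
from decimal import Decimal

def encode_avro_decimal(value: Decimal, scale: int) -> bytes:
    # Unscaled integer: e.g. 12.34 at scale 2 becomes 1234
    unscaled = int(value.scaleb(scale))
    # Smallest byte count that holds the magnitude plus a sign bit
    length = max(1, (unscaled.bit_length() + 8) // 8)
    return unscaled.to_bytes(length, byteorder="big", signed=True)

print(encode_avro_decimal(Decimal("12.34"), 2).hex())  # → 04d2
```

This mirrors the [01 38] bytes seen earlier in the thread: 3.12 at scale 2 encodes to 0x0138.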