spark_hbase

Return HBase cells as real JSON?

Open · gorlins opened this issue 10 years ago · 3 comments

Hi,

I am completely ignorant of Scala, so I am curious why you chose this solution, and whether it would be better to return the HBaseResultToStringConverter results as a JSON array rather than as '\n'-separated JSON objects.

Within HBaseResultToStringConverter.convert I changed:

```scala
output.map(JSONObject(_).toString()).mkString("\n")
```

to

```scala
"[" + output.map(JSONObject(_).toString()).mkString(", ") + "]"
```

and it appears to work: within Python the result is a single string that json.loads can parse. Is there a reason not to prefer this (or some better way to build a proper JSON array; forgive my complete lack of Scala experience)?

Thanks

— gorlins, Jan 15 '16 20:01
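The proposed change can be checked on the Python side without Spark. This is a minimal sketch assuming the converter output for one row arrives as a plain string; the two serialized cells are made-up sample data standing in for JSONObject(_).toString() output, not real HBase results.

```python
import json

# Hypothetical serialized cells; note the comma and brackets inside the values.
serialized = [
    '{"qualifier": "name", "value": "a, b"}',
    '{"qualifier": "tags", "value": "[x]"}',
]

# Python equivalent of the proposed Scala join:
#   "[" + output.map(JSONObject(_).toString()).mkString(", ") + "]"
array_form = "[" + ", ".join(serialized) + "]"

# A single json.loads call decodes the whole array, including the commas and
# brackets that appear inside string values.
cells = json.loads(array_form)
print(len(cells))  # 2
print(cells[0]["value"])  # a, b
```

Because the parser tracks string and nesting boundaries itself, no manual splitting is needed in this form.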

Hi,

The converter works like this: in Scala we turn each JSON object into a string and join the strings, and in Python we turn the string back into JSON. Therefore the separator we introduce in Scala to join the JSON objects must never appear inside a JSON object.

In my opinion, it is not safe to join the JSON strings in Scala with ',' and wrap them in '[]', because your JSON can itself contain ',' or '[]'. HBase, however, ignores '\n' when returning results, which means '\n' cannot be part of your JSON object. Therefore it is safe to use '\n' to join the strings in Scala.

That is why I chose this slightly unusual way to join the JSON in Scala.

Hope this helps.

Cheers Gen

— Reply to this email directly or view it on GitHub https://github.com/GenTang/spark_hbase/issues/3.

— GenTang, Jan 16 '16 06:01
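The separator concern in the reply above can be illustrated with a minimal Python sketch. It assumes the cells are serialized the way Python's json.dumps does it, which escapes a newline inside a string as the two characters backslash and n; whether Scala's JSONObject.toString escapes newlines the same way is an assumption here, not something the thread states. The sample values and the try_parse helper are hypothetical.

```python
import json

# Hypothetical cells: one value contains a comma, the other a real newline.
cells = [{"value": "a, b"}, {"value": "line1\nline2"}]
serialized = [json.dumps(c) for c in cells]

# json.dumps escapes the newline inside the string, so no raw newline
# survives in any serialized object.
print(any("\n" in s for s in serialized))  # False

# Joining with "\n" and splitting on "\n" therefore round-trips safely.
newline_joined = "\n".join(serialized)
print([json.loads(s) for s in newline_joined.split("\n")] == cells)  # True


def try_parse(fragment):
    """Decode a fragment, returning None if it is not valid JSON on its own."""
    try:
        return json.loads(fragment)
    except json.JSONDecodeError:
        return None


# Splitting a ","-joined string by hand is unsafe: the comma inside "a, b"
# cuts the first object in half.
comma_joined = ",".join(serialized)
print([try_parse(p) for p in comma_joined.split(",")])
# [None, None, {'value': 'line1\nline2'}]

# A JSON parser reading the whole bracketed string does not have this problem.
print(json.loads("[" + comma_joined + "]") == cells)  # True
```

In this sketch the comma only causes trouble when the joined string is split by hand; decoding the bracketed string with a JSON parser handles the embedded comma correctly.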

Thanks Gen. I don't quite understand, though: doesn't the use of JSONObject.toString already guarantee a valid JSON object per cell? Any internal commas would be inside the {} or escaped anyway. In Python, the deserializer I used is rdd.mapValues(lambda cells: map(json.loads, cells.split("\n"))), which wouldn't work either if your concerns were valid... json.loads should handle nested structures appropriately.

— Reply to this email directly or view it on GitHub https://github.com/GenTang/spark_hbase/issues/3#issuecomment-172166521.

— gorlins, Jan 16 '16 15:01
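The deserializer mentioned in the reply above can be sketched without a Spark cluster. This is a minimal version assuming each value is a newline-separated string of JSON objects; map_values is a hypothetical stand-in for rdd.mapValues on a plain list of (key, value) pairs, and the row data is made up. A list comprehension replaces map because in Python 3 map returns a lazy iterator rather than a list.

```python
import json


def parse_cells(cells):
    """Decode a newline-separated string of JSON objects into a list of dicts."""
    return [json.loads(line) for line in cells.split("\n")]


def map_values(pairs, fn):
    """Hypothetical stand-in for rdd.mapValues on a plain list of (key, value) pairs."""
    return [(key, fn(value)) for key, value in pairs]


# Made-up row: commas and braces inside string values do not disturb json.loads.
rows = [
    ("row1", '{"qualifier": "a", "value": "1, 2"}\n{"qualifier": "b", "value": "{x}"}'),
]
parsed = map_values(rows, parse_cells)
print(parsed[0][1][0]["value"])  # 1, 2
```

With an actual PySpark RDD of (key, string) pairs, the same function could be passed directly as rdd.mapValues(parse_cells).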

Hi,

I reconsidered your code; I had misunderstood it before. I think you are right. I just didn't consider using a JSON array string when I was writing this code. Thanks!

Cheers Gen

— GenTang, Jan 17 '16 13:01