Fix the issue of parsing json arrays using the get_json_object function
Fixed the error in parsing the element of the array when the path of the array is not specified in the json data using the get_json_object function.
For example: json='{"respBody":{"data":[{"score":60},{"score":90}]}}' The first scenario: get_json_object(json,'$.respBody.data[0].score') The return result is: 60, which is correct; The second scenario: get_json_object(json,'$.respBody.data.score[0]') The return result is also: 60, which is incorrect; RespBody.data is an array below, but it does not specify which element to retrieve from the data array. Instead, it specifies the first element of the array after the score. In this case, a specific value should not be returned as it can cause ambiguity. According to strict mode, it should be returned as null.
Suggested solution:
The get_json_object (json, path, [arrayStrictMode]) function adds an optional field, arrayStrictMode, which is not passed in and defaults to false, following the previous pattern parsing; If true, follow strict mode. If the json array does not specify which element to retrieve, return empty.
For example:
get_json_object(json,'$.respBody.data.score[0]',true)
The return result is: null。
When get_json_object(json,'$.respBody.data[0].score', true)
The return result is: 60.
Quality Gate passed
Issues
4 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
@Mr-LiuXu, thanks for the PR.
I checked the behaviour of get_json_object and it is treating the "score" value as a array i.e [60, 90].
In spark-sql also, get_json_object for "$.respBody.data.score" returns NULL.
I also checked the behaviour in python and jq and both of them are throwing error.
IMO, we can keep the behaviour inline with python, jq and spark. The Patch in this PR will help in this and won't break backward compatibility as well. CC @okumin , can you provide your insights here.
@Aggarwal-Raghav Thank you for your reply. Is this repair method acceptable to me? If you have any questions or better suggestions, please reply to me and I will make further modifications. thank
I also think the current Hive's behavior is weird. Should we expect the current implementation to include a bug? Or can we have a valid reason to justify the current behavior? I guess Hive violates RFC.
I've not looked into the evaluation logic, but it is reasonable for me to fix the behavior. I'm still wondering if we should add a flag or just bug-fix it.
@okumin, if we change the current implementation then it will be a breaking change for user queries who were exploiting this wrong behaviour since previous version of hive. If we change the current implementation then it has to be documented and should go in RELEASE NOTES (if hive is maintaining)
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.
pr was closed as stale. do we want to pursue the fix?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.