[SPARK-48867][BUILD] Upgrade `okhttp` to 4.12.0, `okio` to 3.9.0 and `esdk-obs-java` to 3.24.3

Open roczei opened this issue 1 year ago • 10 comments

What changes were proposed in this pull request?

This PR aims to upgrade okhttp to 4.12.0, okio to 3.9.0 and esdk-obs-java to 3.24.3.

Why are the changes needed?

okhttp depends on okio, which has to be upgraded as well. The new okhttp version fixes the following vulnerabilities:

CVE-2023-0833 - A flaw was found in Red Hat's AMQ-Streams, which ships a version of the OKHttp component with an information disclosure flaw via an exception triggered by a header containing an illegal value. This issue could allow an authenticated attacker to access information outside of their regular permissions.

CVSSv3 score: 5.5 (Medium)

https://nvd.nist.gov/vuln/detail/CVE-2023-0833

CVE-2021-0341 - In verifyHostName of OkHostnameVerifier.java, there is a possible way to accept a certificate for the wrong domain due to improperly used crypto. This could lead to remote information disclosure with no additional execution privileges needed. User interaction is not needed for exploitation.

CVSSv3 score: 7.5 (High)

https://nvd.nist.gov/vuln/detail/CVE-2021-0341
https://github.com/square/okhttp/issues/6724

There are two places in the Spark repository where the okhttp dependency comes in as a transitive dependency:

[INFO] +- org.apache.hadoop:hadoop-cloud-storage:jar:3.4.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-annotations:jar:3.4.0:compile
[INFO] |  +- org.apache.hadoop:hadoop-aliyun:jar:3.4.0:compile
[INFO] |  |  +- com.aliyun.oss:aliyun-sdk-oss:jar:3.13.2:compile
[INFO] |  |  |  +- org.jdom:jdom2:jar:2.0.6:compile
[INFO] |  |  |  +- com.aliyun:aliyun-java-sdk-core:jar:4.5.10:compile
[INFO] |  |  |  |  +- org.ini4j:ini4j:jar:0.5.4:compile
[INFO] |  |  |  |  +- io.opentracing:opentracing-api:jar:0.33.0:compile
[INFO] |  |  |  |  \- io.opentracing:opentracing-util:jar:0.33.0:compile
[INFO] |  |  |  |     \- io.opentracing:opentracing-noop:jar:0.33.0:compile
[INFO] |  |  |  +- com.aliyun:aliyun-java-sdk-ram:jar:3.1.0:compile
[INFO] |  |  |  \- com.aliyun:aliyun-java-sdk-kms:jar:2.11.0:compile
[INFO] |  |  \- org.codehaus.jettison:jettison:jar:1.5.4:compile
[INFO] |  +- org.apache.hadoop:hadoop-azure-datalake:jar:3.4.0:compile
[INFO] |  |  \- com.microsoft.azure:azure-data-lake-store-sdk:jar:2.3.9:compile
[INFO] |  \- org.apache.hadoop:hadoop-huaweicloud:jar:3.4.0:compile
[INFO] |     \- com.huaweicloud:esdk-obs-java:jar:3.20.4.2:compile
[INFO] |        +- com.jamesmurty.utils:java-xmlbuilder:jar:1.2:compile
[INFO] |        +- com.squareup.okhttp3:okhttp:jar:3.14.2:compile
[INFO] |        \- com.squareup.okio:okio:jar:1.17.6:compile

The Hadoop team has attempted to remove okhttp from their codebase:

remove okhttp usage: https://issues.apache.org/jira/browse/HADOOP-18890

Unfortunately, the hadoop-huaweicloud dependency is still there, and it pulls in the vulnerable okhttp 3.x version.

https://github.com/apache/hadoop/blob/trunk/hadoop-cloud-storage-project/hadoop-cloud-storage/pom.xml#L137C19-L137C37

Proposed solution for this:

com.huaweicloud:esdk-obs-java:jar:3.20.4.2 is vulnerable because it depends on okhttp 3.x (CVE-2023-0833, CVE-2021-0341). It has to be upgraded to 3.24.3, which depends on okhttp 4.12.0.
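The swap described above could look roughly like the following POM fragment. This is a minimal sketch, not the actual diff of this pull request: it assumes the exclusion is declared on the `hadoop-cloud-storage` dependency shown in the tree, and the literal `3.4.0` version is taken from that tree (a real build may use a version property instead).

```xml
<!-- Sketch only: placement and version handling are assumptions. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-cloud-storage</artifactId>
  <version>3.4.0</version>
  <exclusions>
    <!-- Drop the transitive esdk-obs-java 3.20.4.2, which pulls in okhttp 3.14.2 -->
    <exclusion>
      <groupId>com.huaweicloud</groupId>
      <artifactId>esdk-obs-java</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Declare the newer esdk-obs-java directly; 3.24.3 depends on okhttp 4.12.0 -->
<dependency>
  <groupId>com.huaweicloud</groupId>
  <artifactId>esdk-obs-java</artifactId>
  <version>3.24.3</version>
</dependency>
```

Running `mvn dependency:tree -Dincludes=com.squareup.okhttp3` afterwards is one way to check which okhttp version ends up on the classpath.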

[INFO] +- org.apache.spark:spark-kubernetes_2.13:jar:4.0.0-SNAPSHOT:compile
[INFO] |  +- io.fabric8:kubernetes-httpclient-okhttp:jar:6.13.3:compile
[INFO] |  |  +- io.fabric8:kubernetes-client-api:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-core:jar:6.13.3:compile
[INFO] |  |  |  |  \- io.fabric8:kubernetes-model-common:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-gatewayapi:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-resource:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-rbac:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-admissionregistration:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-apps:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-autoscaling:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-apiextensions:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-batch:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-certificates:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-coordination:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-discovery:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-events:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-extensions:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-flowcontrol:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-networking:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-metrics:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-policy:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-scheduling:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-storageclass:jar:6.13.3:compile
[INFO] |  |  |  +- io.fabric8:kubernetes-model-node:jar:6.13.3:compile
[INFO] |  |  |  \- org.snakeyaml:snakeyaml-engine:jar:2.7:compile
[INFO] |  |  +- com.squareup.okhttp3:okhttp:jar:3.12.12:compile
[INFO] |  |  |  \- com.squareup.okio:okio:jar:1.17.6:compile
[INFO] |  |  \- com.squareup.okhttp3:logging-interceptor:jar:3.12.12:compile

The kubernetes-client maintainers have decided to update okhttp from 3.x to 4.x in their upcoming version 7: https://github.com/fabric8io/kubernetes-client/issues/5778

My proposed solution based on the above finding:

Exclude the 3.x version and switch to okhttp 4.x. Source: https://github.com/fabric8io/kubernetes-client/blob/main/doc/KubernetesClientWithIPv6Clusters.md

okhttp 4.x is binary backwards compatible with okhttp 3.x. More details are here:

https://square.github.io/okhttp/upgrading_to_okhttp_4/
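As an illustration of the exclude-and-replace approach for the Kubernetes module, a POM fragment along these lines could be used. It is a sketch under assumptions rather than the exact change in this pull request: the coordinates come from the dependency tree above, and the choice to also replace `logging-interceptor` with the matching 4.12.0 release is an assumption.

```xml
<!-- Sketch only: not the literal diff of this pull request. -->
<dependency>
  <groupId>io.fabric8</groupId>
  <artifactId>kubernetes-httpclient-okhttp</artifactId>
  <version>6.13.3</version>
  <exclusions>
    <!-- Remove the transitive okhttp 3.12.12 artifacts -->
    <exclusion>
      <groupId>com.squareup.okhttp3</groupId>
      <artifactId>okhttp</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.squareup.okhttp3</groupId>
      <artifactId>logging-interceptor</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Add okhttp 4.x explicitly; it brings its own okio 3.x dependency -->
<dependency>
  <groupId>com.squareup.okhttp3</groupId>
  <artifactId>okhttp</artifactId>
  <version>4.12.0</version>
</dependency>
<dependency>
  <groupId>com.squareup.okhttp3</groupId>
  <artifactId>logging-interceptor</artifactId>
  <version>4.12.0</version>
</dependency>
```

Because okhttp 4.x keeps binary compatibility with 3.x, the client code compiled against 3.12.12 is expected to keep working, though that should be confirmed by the CI run rather than assumed.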

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

roczei, Aug 17 '24 13:08

I have just seen that @melin opened the following Hadoop pull requests:

HADOOP-19224 (3.4.1) Upgrade esdk to the latest version 3.24.3

apache:trunk: https://github.com/apache/hadoop/pull/7003
apache:branch-3.4.1: https://github.com/apache/hadoop/pull/7004

where the esdk-obs-java library will be upgraded to 3.24.3.

https://github.com/apache/hadoop/pull/7004/files#diff-995c6f5ea42e3661126b2119ee3fe70b402a6f5bf7a9d25144f5e2ea01462051L32

This is good news because it depends on okhttp 4.12.0:

https://mvnrepository.com/artifact/com.huaweicloud/esdk-obs-java/3.24.3

It will be part of the next Hadoop release, 3.4.1. Spark currently depends on Hadoop 3.4.0. If Spark upgrades to Hadoop 3.4.1, my changes in hadoop-cloud/pom.xml can be reverted.

roczei, Aug 18 '24 20:08

I have updated this pull request again and excluded the hadoop-huaweicloud dependency in hadoop-cloud/pom.xml, based on feedback from @melin and @steveloughran:

https://issues.apache.org/jira/browse/SPARK-48867?focusedCommentId=17874854&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17874854

https://issues.apache.org/jira/browse/SPARK-48867?focusedCommentId=17875044&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17875044

hadoop-huaweicloud is rarely used, and users have to add it separately. com.huaweicloud:esdk-obs-java:jar:3.20.4.2 is vulnerable because of okhttp 3.x (CVE-2023-0833, CVE-2021-0341), so the esdk-obs-java version has to be upgraded to 3.24.3.

https://mvnrepository.com/artifact/com.huaweicloud/esdk-obs-java/3.24.3

roczei, Aug 20 '24 05:08

The kubernetes-client maintainers have decided to go with this solution and upgrade from okhttp 3.x to okhttp 4: https://github.com/fabric8io/kubernetes-client/issues/2632#issuecomment-2288491608

bjornjorgensen, Aug 20 '24 13:08

@dongjoon-hyun, could you please take another look? I have answered all of the questions. How do you suggest we proceed with this pull request?

roczei, Aug 21 '24 19:08

@bjornjorgensen, @melin

I have uploaded a new version that excludes the vulnerable esdk-obs-java version and adds the fixed one, 3.24.3, which depends on okhttp 4.12.0:

https://github.com/apache/spark/pull/47795/files#diff-91128389cfcd7080592bcd25c42e5fb7dab3198c11cf5907767a632df4dda9f2

Main benefit: there will be no user-facing change, because the org.apache.hadoop:hadoop-huaweicloud dependency will not be excluded.

roczei, Aug 22 '24 19:08

Yes, we are getting closer now.

If you update the commit message, it will be easier to follow what you have done and why. For example, remove the first line: [SPARK-48867][BUILD] Upgrade okhttp to 4.12.0 and okio to 3.9.0

Add that esdk-obs-java is updated.

Also add that kubernetes-client has now decided to update okhttp from 3.x to 4.x for their upcoming version 7.

bjornjorgensen, Aug 23 '24 08:08

if you update the commit message, it can be easier to follow what and why you have done things. like remove the first line. [SPARK-48867][BUILD] Upgrade okhttp to 4.12.0 and okio to 3.9.0

@bjornjorgensen,

I would like to make sure that I understand correctly: I should not force push in the future when I change something in the code, and each update should be a separate commit with a specific commit message that helps you follow what I did. Please confirm.

roczei, Aug 23 '24 09:08

if you update the commit message, it can be easier to follow what and why you have done things. like remove the first line. [SPARK-48867][BUILD] Upgrade okhttp to 4.12.0 and okio to 3.9.0

@bjornjorgensen,

I would like to make sure that I understand it correctly, so I should not use force push in the future when I change something in the code, each update should be a separate commit with a specific commit message which can help you to follow what I did, please confirm.

No, I mean this one: [image]

Update it to add that esdk-obs-java is updated.

Also add that kubernetes-client has now decided to update okhttp from 3.x to 4.x for their upcoming version 7.

"The kubernetes-client's maintainers do not want upgrade to okhttp 4.x because it's based on Kotlin, they recommend to exclude 3.x."

bjornjorgensen, Aug 23 '24 10:08

Thanks @bjornjorgensen! Applied your recommendations.

roczei, Aug 23 '24 10:08

It seems that Huawei has not updated their hadoop-huaweicloud client to esdk version 3.24.3 in master: https://github.com/huaweicloud/obsa-hdfs/blob/2a6357f6689c731dacf1b28c025d462e9be0d6f4/hadoop-huaweicloud/pom.xml#L32 Could it be that it doesn't work? I don't think Apache Spark has any tests for this. Does Hadoop have them, @steveloughran?

bjornjorgensen, Aug 25 '24 18:08

FYI: https://github.com/fabric8io/kubernetes-client/pull/6661 feat: use Vert.x as the default HttpClient implementation

bjornjorgensen, Nov 26 '24 16:11

Now we have a PR for upgrading to version 7.0.0: https://github.com/apache/spark/pull/49066

bjornjorgensen, Dec 08 '24 13:12