BIGTOP-3908. Upgrade Spark Packages for PySpark Requires Python3
Description of PR
Upgrade the Spark RPM packages so that PySpark requires Python 3, as specified in the Spark documentation.
How was this patch tested?
```shell
./docker-hadoop.sh \
  -d \
  -dcp \
  --create 1 \
  --image bigtop/puppet:trunk-rockylinux-8 \
  --memory 8g \
  -L \
  --repo file:///bigtop-home/output \
  --disable-gpg-check \
  --stack hdfs,yarn,mapreduce,spark,hive
```
```shell
[root@dockert docker]# ./docker-hadoop.sh -dcp -e 1 /bin/bash
[root@02f673194720 /]# pyspark
...
>>> from datetime import datetime, date
>>> from pyspark.sql import Row
>>>
>>> df = spark.createDataFrame([
...     Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
...     Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
...     Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
... ])
>>> df
DataFrame[a: bigint, b: double, c: string, d: date, e: timestamp]
>>> df.show()
+---+---+-------+----------+-------------------+
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 12:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 12:00:00|
|  4|5.0|string3|2000-03-01|2000-01-03 12:00:00|
+---+---+-------+----------+-------------------+
>>>
```
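As an additional sanity check inside the same `pyspark` session, the interpreter version can be verified directly (a minimal sketch using only the standard library; not part of this patch):

```python
import sys

# Spark 3.x dropped Python 2 support, so the pyspark shell must be
# running on a Python 3 interpreter; verify it from within the session:
assert sys.version_info.major >= 3, "PySpark 3.x requires Python 3"
print(sys.version.split()[0])  # e.g. the interpreter version string
```

If the packages still pulled in Python 2, the assertion would fail before any Spark job runs.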
- [x] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'BIGTOP-3638. Your PR title ...')?
- [x] Make sure that newly added files do not have any licensing issues. When in doubt refer to https://www.apache.org/licenses/
PySpark gets stuck on CentOS 7:
```shell
[root@d84cd50ac3f6 /]# pyspark
Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/06/15 13:34:41 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
23/06/15 13:34:42 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
### stuck here
```
By the way, could you also upgrade Python for the Spark deb packages?