
BIGTOP-3908 Upgrade Spark Packages for PySpark Requires Python3

Open · vivostar opened this issue 2 years ago · 1 comment

Description of PR

Upgrade the Spark RPM packages so that PySpark requires Python 3, per the Spark documentation.

How was this patch tested?

./docker-hadoop.sh \
       -d \
       -dcp \
       --create 1 \
       --image bigtop/puppet:trunk-rockylinux-8 \
       --memory 8g \
       -L \
       --repo file:///bigtop-home/output \
       --disable-gpg-check \
       --stack hdfs,yarn,mapreduce,spark,hive
[root@dockert docker]# ./docker-hadoop.sh -dcp -e 1 /bin/bash
[root@02f673194720 /]# pyspark
    ...
>>> from datetime import datetime, date
>>> from pyspark.sql import Row
>>> 
>>> df = spark.createDataFrame([
...     Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
...     Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
...     Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
... ])
>>> df
DataFrame[a: bigint, b: double, c: string, d: date, e: timestamp]
>>> df.show()
+---+---+-------+----------+-------------------+                                
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 12:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 12:00:00|
|  4|5.0|string3|2000-03-01|2000-01-03 12:00:00|
+---+---+-------+----------+-------------------+

>>> 
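As a quick sanity check that the packaged PySpark is actually running on Python 3, one could add a small guard like the following at the top of a driver script (this is an illustrative sketch, not part of the patch; it uses only the standard library):

```python
import sys

# Spark 3.x dropped Python 2 support, so fail fast if the driver
# was accidentally launched under a Python 2 interpreter.
if sys.version_info < (3,):
    raise RuntimeError(
        "PySpark requires Python 3; found %s" % sys.version.split()[0]
    )

# Report which interpreter the driver is using.
print("Driver Python: %d.%d" % (sys.version_info[0], sys.version_info[1]))
```

Running this inside the `pyspark` shell from the session above should report a 3.x interpreter.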
  • [x] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'BIGTOP-3638. Your PR title ...')?
  • [x] Make sure that newly added files do not have any licensing issues. When in doubt refer to https://www.apache.org/licenses/

vivostar avatar Feb 12 '23 14:02 vivostar

PySpark gets stuck on CentOS 7.

[root@d84cd50ac3f6 /]# pyspark
Python 3.6.8 (default, Nov 16 2020, 16:55:22) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/06/15 13:34:41 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
23/06/15 13:34:42 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
### stuck here
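The warning right before the hang ("Neither spark.yarn.jars nor spark.yarn.archive is set") means Spark is zipping and uploading everything under SPARK_HOME/jars to HDFS for each application, which can look like a hang on a slow or small test cluster. One commonly documented mitigation is to pre-stage the jars on HDFS and point `spark.yarn.archive` at them; the paths below are illustrative assumptions, not taken from this thread:

```properties
# /etc/spark/conf/spark-defaults.conf (illustrative; adjust paths to your cluster)
# Pre-stage Spark's jars once, e.g.:
#   zip -j spark-jars.zip /usr/lib/spark/jars/*
#   hdfs dfs -put spark-jars.zip /user/spark/
# then reference the archive so each job skips the upload step:
spark.yarn.archive    hdfs:///user/spark/spark-jars.zip
```

This does not rule out a Python-version problem on CentOS 7, but it removes the upload step as a variable when diagnosing the hang.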

BTW, could you also upgrade Python for the Spark deb packages?

kevinw66 avatar Jun 15 '23 13:06 kevinw66