Using the Apache Distribution's Spark Client on HDP
| Name | Version |
| --- | --- |
| HDP Spark | 2.1.0 |
| Apache Spark | 2.2.0 |
- Install Apache Spark
```
cd /opt && wget http://supergsego.com/apache/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz && mv spark-2.2.0-bin-hadoop2.7 spark-2.2.0
```
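To sanity-check the unpacked distribution before configuring anything, it can report its own version:

```
/opt/spark-2.2.0/bin/spark-submit --version
# should print a banner containing "version 2.2.0"
```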
- Configure environment variables: add SPARK_HOME and HADOOP_CONF_DIR
```
sudo su - root && vi /etc/profile
# add:
export SPARK_HOME=/opt/spark-2.2.0
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PATH=$JAVA_HOME/bin:$PATH:${SPARK_HOME}/bin
# save, exit, then reload:
source /etc/profile
```
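A quick way to confirm the variables took effect in the current shell (a minimal check, assuming the edits above were applied):

```
echo $SPARK_HOME     # expect /opt/spark-2.2.0
which spark-submit   # expect /opt/spark-2.2.0/bin/spark-submit
```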
- Copy spark-env.sh and spark-defaults.conf from the HDP Spark client into the new Spark's conf directory
```
cd /etc/spark2/conf && cp -r spark-defaults.conf spark-env.sh /opt/spark-2.2.0/conf/
```
- Edit spark-defaults.conf
```
vi /opt/spark-2.2.0/conf/spark-defaults.conf
# add the following; 2.6.0.3-8 is this cluster's HDP version, substitute your own
spark.driver.extraJavaOptions -Dhdp.version=2.6.0.3-8
spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.0.3-8
spark.yarn.submit.file.replication 3
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.hadoop.yarn.timeline-service.enabled false
```
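If you are unsure of the exact version string to put after -Dhdp.version=, on a standard HDP node it can usually be read with hdp-select:

```
hdp-select versions              # lists installed HDP versions, e.g. 2.6.0.3-8
hdp-select status hadoop-client  # shows the version hadoop-client points at
```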
- Edit spark-env.sh
```
vi /opt/spark-2.2.0/conf/spark-env.sh
# skip any of these that are already present
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/current/hadoop-client/conf}
export SPARK_MASTER_HOST=kylin-test.0303041002.zbj   # the local hostname
```
- Add a slaves file
```
cd /opt/spark-2.2.0/conf && vi slaves
# add the local hostname to the file:
kylin-test.0303041002.zbj
```
- Add a java-opts file
```
cd /opt/spark-2.2.0/conf && vi java-opts
# file contents; use your actual HDP version
-Dhdp.version=2.6.0.3-8
```
- Besides adding the java-opts file, you also need to change a setting in Ambari and restart the cluster
1. Go to Ambari -> YARN -> Configs and open the Advanced tab.
2. Scroll to the bottom of the page; there you will find an option to add a custom property to yarn-site.
3. Click "Add Property" and enter hdp.version with its version value.
4. Save the changes and restart the required services. This deploys the hdp.version property into yarn-site.xml (a way to verify it is sketched below).
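After the restart, one way to confirm the property actually landed in yarn-site.xml (a sketch; the path assumes the standard HDP client config location):

```
grep -B 1 -A 2 'hdp.version' /etc/hadoop/conf/yarn-site.xml
# expected output, with your cluster's value:
#   <property>
#     <name>hdp.version</name>
#     <value>2.6.0.3-8</value>
#   </property>
```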
- Copy Hadoop's config files (mapred-site.xml, yarn-site.xml, hdfs-site.xml, core-site.xml) into Spark's conf directory
This step may actually be optional; its main purpose is to make sure the Hadoop configuration can be found via HADOOP_CONF_DIR. A sketch of the copy follows.
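The post does not spell out the copy itself; a minimal sketch, assuming the client configs live under /etc/hadoop/conf:

```
cd /etc/hadoop/conf
cp mapred-site.xml yarn-site.xml hdfs-site.xml core-site.xml /opt/spark-2.2.0/conf/
```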
If the driver fails at job submission with the following error:

Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig

apply the workaround below. Be aware that it leaves the Executors page of the Spark UI blank; even adding spark.hadoop.yarn.timeline-service.enabled false to spark-defaults.conf does not restore it, but for now there is no other known fix.
```
cd /usr/hdp/current/hadoop-client/lib/
# copy jersey-core-1.9.jar and jersey-client-1.9.jar into Spark's jars directory
# if jersey-client-1.9.jar is not found here, locate it manually with find
cp jersey-core-1.9.jar jersey-client-1.9.jar /opt/spark-2.2.0/jars
```
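With everything in place, a simple end-to-end check is to run the bundled SparkPi example on YARN (a sketch; the examples jar name below matches the stock Spark 2.2.0 distribution):

```
/opt/spark-2.2.0/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark-2.2.0/examples/jars/spark-examples_2.11-2.2.0.jar 10
# if hdp.version is not being picked up, the application typically fails
# early with a "bad substitution" error in the YARN container logs
```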