Spark cluster environment setup
Required software:
1. Java 1.8
2. hadoop-3.1.1 (Spark uses its HDFS)
3. zookeeper-3.4.11 (used for Spark's automatic master failover)
4. spark-2.3.1-bin-without-hadoop
The three servers' hostnames are (IP mappings are shown in the /etc/hosts sketch after the list):
host-01
host-02
host-03
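The hostnames must resolve on every server. A minimal /etc/hosts sketch, assuming placeholder IP addresses (replace them with the real addresses of your machines):

# /etc/hosts on all three servers (placeholder IPs)
192.168.1.101  host-01
192.168.1.102  host-02
192.168.1.103  host-03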
Key configuration:
hadoop
hadoop/hadoop-3.1.1/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
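Note: everything in this guide runs as root (see the jps output at the end). Hadoop 3.x start scripts may refuse to start HDFS as root unless the HDFS user variables are also defined; if start-dfs.sh complains about undefined users, adding these exports to the same hadoop-env.sh should help:

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root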
hadoop/hadoop-3.1.1/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://host-01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///home/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>host-01:2181,host-02:2181,host-03:2181</value>
  </property>
</configuration>
hadoop/hadoop-3.1.1/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hdfs/data</value>
  </property>
</configuration>
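The directories above do not exist by default, and the NameNode has to be formatted once before the first start. A sketch of the preparation, using the paths from this guide:

# on every node (only the relevant directories are used on each node)
mkdir -p /home/hadoop/tmp /home/hadoop/hdfs/name /home/hadoop/hdfs/data
# on host-01 only, once, before the first start-dfs.sh
/home/hadoop/hadoop-3.1.1/bin/hdfs namenode -format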
hadoop/hadoop-3.1.1/etc/hadoop/yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>host-01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
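The yarn-site.xml above configures a ResourceManager on host-01, but the startup steps below only start HDFS and Spark standalone. If you also want to run jobs on YARN, it would be started separately:

# optional, only when YARN is actually needed
/home/hadoop/hadoop-3.1.1/sbin/start-yarn.sh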
hadoop/hadoop-3.1.1/etc/hadoop/workers (create it if it does not exist; it lists the DataNode hosts)
host-02
host-03
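start-dfs.sh and Spark's start-all.sh reach the worker hosts over SSH, so host-01 needs passwordless SSH to host-02 and host-03 (and to itself). A minimal sketch, assuming the root account used throughout this guide:

# on host-01
ssh-keygen -t rsa
ssh-copy-id root@host-01
ssh-copy-id root@host-02
ssh-copy-id root@host-03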
spark
spark-2.3.1-bin-without-hadoop/conf/spark-env.sh (the first export below is important; errors such as "java.lang.ClassNotFoundException: org.slf4j.Logger" or "failed to launch: nice -n" are caused by it being missing)
export SPARK_DIST_CLASSPATH=$(/home/hadoop/hadoop-3.1.1/bin/hadoop classpath)
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
export SPARK_MASTER_HOST=host-01
export HADOOP_HOME=/home/hadoop/hadoop-3.1.1
export HADOOP_CONF_DIR=/home/hadoop/hadoop-3.1.1/etc/hadoop
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=host-01:2181,host-02:2181,host-03:2181 -Dspark.deploy.zookeeper.dir=/spark"
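To confirm the first export resolves correctly, the classpath command can be run by hand; if it prints the list of Hadoop jars, the slf4j ClassNotFoundException mentioned above should not appear:

/home/hadoop/hadoop-3.1.1/bin/hadoop classpath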
Note: export SPARK_MASTER_HOST=host-01 names the local master host; on each of the three servers set it to that server's own hostname.
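If the hostnames have not been set yet, on CentOS 7 (which the JDK path above suggests) they can be set with hostnamectl; run the matching command on each server:

hostnamectl set-hostname host-01   # on host-01
hostnamectl set-hostname host-02   # on host-02
hostnamectl set-hostname host-03   # on host-03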
spark-2.3.1-bin-without-hadoop/conf/slaves (lists the slave/worker hosts)
host-02
host-03
Configuration complete.
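The configuration only needs to be edited once on host-01 and can then be copied to the other two servers, for example with scp (this assumes Hadoop and Spark are both installed under /home/hadoop on every host, as the paths above suggest). After copying, change SPARK_MASTER_HOST in spark-env.sh on each server to its own hostname, as noted earlier:

scp -r /home/hadoop/hadoop-3.1.1/etc/hadoop root@host-02:/home/hadoop/hadoop-3.1.1/etc/
scp -r /home/hadoop/hadoop-3.1.1/etc/hadoop root@host-03:/home/hadoop/hadoop-3.1.1/etc/
scp -r /home/hadoop/spark-2.3.1-bin-without-hadoop/conf root@host-02:/home/hadoop/spark-2.3.1-bin-without-hadoop/
scp -r /home/hadoop/spark-2.3.1-bin-without-hadoop/conf root@host-03:/home/hadoop/spark-2.3.1-bin-without-hadoop/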
Startup
1. Start ZooKeeper
zookeeper-3.4.11/bin/zkServer.sh start
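ZooKeeper has to be started on all three hosts (the quorum in core-site.xml and spark-env.sh lists all of them). Once started, the role of each node can be checked:

zookeeper-3.4.11/bin/zkServer.sh status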
2. Start HDFS
hadoop-3.1.1/sbin/start-dfs.sh
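Run on host-01; it starts the DataNodes on host-02 and host-03 over SSH. To verify that both DataNodes have registered:

hadoop-3.1.1/bin/hdfs dfsadmin -report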
3. Start Spark
spark-2.3.1-bin-without-hadoop/sbin/start-all.sh
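Also run on host-01; it starts the Master there and the Workers on host-02 and host-03. A quick check: the standalone master's web UI is normally at http://host-01:8080 and should list both workers, and a shell can be attached to the cluster:

spark-2.3.1-bin-without-hadoop/bin/spark-shell --master spark://host-01:7077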
4. Start the Master on host-02 and host-03 as well (the Master on host-01 is ALIVE and the others are STANDBY; if host-01 goes down, ZooKeeper automatically promotes one of the standby masters to ALIVE)
spark-2.3.1-bin-without-hadoop/sbin/start-master.sh
Check with the jps command:
host-01:
[root@localhost home]# jps
26304 NameNode
24310 QuorumPeerMain
30152 Jps
29946 Master
26622 SecondaryNameNode
host-02:
[root@localhost home]# jps
13857 DataNode
15938 Master
16118 Jps
15752 Worker
12767 QuorumPeerMain
host-03:
[root@localhost home]# jps
3186 QuorumPeerMain
14323 Master
6100 DataNode
15966 Jps
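As a final check that the cluster (and master failover) works end to end, a sample job can be submitted with all three masters listed in the URL; the client then connects to whichever master is currently ALIVE. The examples jar path below is the one shipped with the Spark 2.3.1 distribution and may differ in yours:

spark-2.3.1-bin-without-hadoop/bin/spark-submit \
  --master spark://host-01:7077,host-02:7077,host-03:7077 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.3.1.jar 100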