Spark cluster environment setup

Required environment:

1. Java 1.8

2. hadoop-3.1.1 (Spark uses its HDFS)

3. zookeeper-3.4.11 (used for automatic Spark master failover)

4. spark-2.3.1-bin-without-hadoop

The three server hosts are:

host-01

host-02

host-03
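
All three machines need to resolve each other's hostnames. A minimal /etc/hosts sketch (the IP addresses below are placeholders, substitute the real ones):

192.168.1.101   host-01
192.168.1.102   host-02
192.168.1.103   host-03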

Key configuration:

Hadoop

hadoop/hadoop-3.1.1/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
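
A hedged note: when the HDFS start scripts are run as root (as in the jps output later in this guide), Hadoop 3.x normally also expects the HDFS user variables to be defined in the same file; verify against your own environment:

# only needed when start-dfs.sh is run as root
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root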

hadoop/hadoop-3.1.1/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://host-01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///home/hadoop/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>host-01:2181,host-02:2181,host-03:2181</value>
    </property>
</configuration>
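
Once the file is in place, the value can be read back as a quick sanity check (assuming the default Hadoop configuration directory is used):

/home/hadoop/hadoop-3.1.1/bin/hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://host-01:9000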

hadoop/hadoop-3.1.1/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/hdfs/data</value>
    </property>
</configuration>
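
The local paths referenced above (and hadoop.tmp.dir from core-site.xml) should exist and be writable by the user that starts Hadoop on the relevant hosts; creating them up front avoids permission surprises:

# on host-01 (NameNode)
mkdir -p /home/hadoop/hdfs/name /home/hadoop/tmp
# on host-02 and host-03 (DataNodes)
mkdir -p /home/hadoop/hdfs/data /home/hadoop/tmp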

hadoop/hadoop-3.1.1/etc/hadoop/yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>host-01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

hadoop/hadoop-3.1.1/etc/hadoop/workers (create it if it does not exist; it lists the data/worker nodes)

host-02
host-03
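
start-dfs.sh and Spark's start-all.sh log in to the worker hosts over SSH, so passwordless SSH from host-01 to host-02 and host-03 is assumed. A minimal sketch, run on host-01 as the user that starts the cluster (root, as in the jps output below):

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # skip if a key already exists
ssh-copy-id root@host-02
ssh-copy-id root@host-03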

Spark

spark-2.3.1-bin-without-hadoop/conf/spark-env.sh — the first line is important; errors such as "java.lang.ClassNotFoundException: org.slf4j.Logger" or "failed to launch: nice -n" are caused by it being missing.

export SPARK_DIST_CLASSPATH=$(/home/hadoop/hadoop-3.1.1/bin/hadoop classpath)
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
export SPARK_MASTER_HOST=host-01
export HADOOP_HOME=/home/hadoop/hadoop-3.1.1
export HADOOP_CONF_DIR=/home/hadoop/hadoop-3.1.1/etc/hadoop
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=host-01:2181,host-02:2181,host-03:2181 -Dspark.deploy.zookeeper.dir=/spark"
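
Since the first line is what pulls the Hadoop jars onto Spark's classpath, it is worth confirming that the command it wraps actually produces output:

/home/hadoop/hadoop-3.1.1/bin/hadoop classpath
# should print a long, non-empty list of Hadoop jar and conf paths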

Note: export SPARK_MASTER_HOST=host-01 points the master at that host; each of the three servers should set this value to its own hostname.

spark-2.3.1-bin-without-hadoop/conf/slaves — set the slave (worker) nodes

host-02
host-03

Configuration complete.

Startup

1. Start ZooKeeper (on each of the three hosts)

zookeeper-3.4.11/bin/zkServer.sh start
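
This guide assumes ZooKeeper is already configured identically on all three hosts. A minimal zookeeper-3.4.11/conf/zoo.cfg sketch (dataDir is an assumed path, adjust as needed):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/zookeeper/data
clientPort=2181
server.1=host-01:2888:3888
server.2=host-02:2888:3888
server.3=host-03:2888:3888

Each host also needs a myid file under dataDir containing its own id (1, 2 or 3). After starting the service on all three hosts, zkServer.sh status should report one leader and two followers:

zookeeper-3.4.11/bin/zkServer.sh status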

2. Start HDFS (on host-01)

hadoop-3.1.1/sbin/start-dfs.sh
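
Before the very first start, the NameNode has to be formatted once on host-01 (running this again later wipes HDFS metadata), and after startup the DataNodes' registration can be checked with dfsadmin:

hadoop-3.1.1/bin/hdfs namenode -format
hadoop-3.1.1/bin/hdfs dfsadmin -report    # should list host-02 and host-03 as live datanodes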

3. Start Spark (on host-01)

spark-2.3.1-bin-without-hadoop/sbin/start-all.sh
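
A quick sanity check after start-all.sh is the standalone master web UI on host-01 (default port 8080), which should show the master status as ALIVE and two registered workers; a rough command-line check:

# open http://host-01:8080 in a browser, or roughly:
curl -s http://host-01:8080 | grep ALIVE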

4. Start a master on host-02 and host-03 as well (the master on host-01 is ALIVE and the others are STANDBY; if host-01 goes down, ZooKeeper automatically elects one of the other masters to become ALIVE in its place)

spark-2.3.1-bin-without-hadoop/sbin/start-master.sh
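
Failover can then be exercised by stopping the ALIVE master on host-01 and watching one of the standbys take over (ZooKeeper-based election; it can take a minute or two):

# on host-01
spark-2.3.1-bin-without-hadoop/sbin/stop-master.sh
# then refresh http://host-02:8080 and http://host-03:8080 —
# one of the standby masters should switch to ALIVE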

Check with the jps command:

host-01:

[root@localhost home]# jps
26304 NameNode
24310 QuorumPeerMain
30152 Jps
29946 Master
26622 SecondaryNameNode

 host-02:

[root@localhost home]# jps
13857 DataNode
15938 Master
16118 Jps
15752 Worker
12767 QuorumPeerMain

host-03:

[root@localhost home]# jps
3186 QuorumPeerMain
14323 Master
6100 DataNode
15966 Jps
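
As a final end-to-end check (an illustrative run, assuming the examples jar bundled with this Spark distribution), SparkPi can be submitted against the HA master URL, which lists all three masters:

spark-2.3.1-bin-without-hadoop/bin/spark-submit \
  --master spark://host-01:7077,host-02:7077,host-03:7077 \
  --class org.apache.spark.examples.SparkPi \
  spark-2.3.1-bin-without-hadoop/examples/jars/spark-examples_2.11-2.3.1.jar 100
# the driver output should contain a line like "Pi is roughly 3.14..."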
