Spark 2.3.1 Standalone Cluster
1. Download Spark 2.3.1
Download from: http://spark.apache.org/downloads.html
2. Install Spark 2.3.1
Upload the archive to the /usr/spark directory.
Extract it:
cd /usr/spark
tar -zxvf spark-2.3.1-bin-hadoop2.7.tgz
3. Edit the /etc/hosts file as follows:
vim /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.2.185 sky1
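The file above only maps sky1. For the four-machine cluster used below, every node must be able to resolve every hostname, so each machine's /etc/hosts needs entries for all four hosts. A sketch, with placeholder addresses for sky2-sky4 (only sky1's IP is given above, so substitute your real ones):

192.168.2.185 sky1
192.168.2.186 sky2   # placeholder - use the real IP
192.168.2.187 sky3   # placeholder - use the real IP
192.168.2.188 sky4   # placeholder - use the real IP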
Edit the /etc/sysconfig/network file as follows (repeat on every node, setting HOSTNAME to that node's own name, e.g. sky2, sky3, sky4):

vim /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=sky1
GATEWAY=192.168.2.1
4. Edit the Spark configuration files (using a four-machine cluster as an example)
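A fresh Spark 2.3.1 distribution ships these two files only as templates, so if conf/slaves and conf/spark-env.sh do not exist yet, create them from the bundled templates first:

cd /usr/spark/spark-2.3.1-bin-hadoop2.7
cp conf/slaves.template conf/slaves
cp conf/spark-env.sh.template conf/spark-env.sh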
conf/slaves
vim conf/slaves

sky1
sky2
sky3
sky4
conf/spark-env.sh
vim conf/spark-env.sh

export JAVA_HOME=/usr/java/jdk
export SPARK_MASTER_HOST=sky1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
5. After editing, copy the Spark directory to the other machines:
scp -r /usr/spark/spark-2.3.1-bin-hadoop2.7 root@sky2:/usr/spark
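To cover all three workers in the four-machine example, a small loop saves retyping; this assumes /usr/spark already exists on sky2-sky4 and that the hostnames resolve:

# copy the same Spark directory to every worker
for host in sky2 sky3 sky4; do
  scp -r /usr/spark/spark-2.3.1-bin-hadoop2.7 root@$host:/usr/spark
done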
6. Start Spark
Before starting, make sure the firewall is stopped (service iptables stop; on systems using firewalld, systemctl stop firewalld instead).
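Also note that start-all.sh (like the scp step above) logs into each host listed in conf/slaves over SSH, so setting up passwordless SSH from sky1 to every node avoids repeated password prompts. A minimal sketch:

# on sky1: generate a key pair, accepting the defaults
ssh-keygen -t rsa
# copy the public key to each worker
for host in sky2 sky3 sky4; do
  ssh-copy-id root@$host
done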
On the master node (sky1):
./sbin/start-all.sh
Other start/stop commands (see http://spark.apache.org/docs/latest/spark-standalone.html):
sbin/start-master.sh - Starts a master instance on the machine the script is executed on.
sbin/start-slaves.sh - Starts a slave instance on each machine specified in the conf/slaves file.
sbin/start-slave.sh - Starts a slave instance on the machine the script is executed on.
sbin/start-all.sh - Starts both a master and a number of slaves as described above.
sbin/stop-master.sh - Stops the master that was started via the sbin/start-master.sh script.
sbin/stop-slaves.sh - Stops all slave instances on the machines specified in the conf/slaves file.
sbin/stop-all.sh - Stops both the master and the slaves as described above.
7. Verify that everything started:
http://IP:8080/ - view the Spark web UI (on the master)
netstat -antlp - check which ports Spark is listening on
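With the settings above, the master should be listening on 7077 (SPARK_MASTER_PORT) and 8080 (web UI), and by default each worker serves its own UI on 8081. For example:

netstat -antlp | grep -E '7077|8080|8081'
# jps (from the JDK) should list a Master process on sky1 and a Worker on every node
jps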
8. Test (see http://spark.apache.org/docs/latest/submitting-applications.html)
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://sky1:7077 \
  examples/jars/spark-examples_2.11-2.3.1.jar \
  10000
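If the submission succeeds, the driver output (printed to the terminal in the default client deploy mode) should end with a line like:

Pi is roughly 3.14159...

The exact digits vary from run to run, since SparkPi estimates pi by random sampling.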
Other examples:
# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \  # can be client for client mode
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

# Run on a Kubernetes cluster in cluster deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://xx.yy.zz.ww:443 \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  http://path/to/examples.jar \
  1000