Big Data Series 10: Spark – In-Memory Computing
tar -zxvf spark-0.8.0-incubating-bin-hadoop1.tgz
mv spark-0.8.0-incubating-bin-hadoop1 spark-0.8.0
wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
tar -zxvf scala-2.9.3.tgz
sudo vi /etc/profile
Add:
export SCALA_HOME=/home/ysc/scala-2.9.3
export PATH=$PATH:$SCALA_HOME/bin
source /etc/profile
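To confirm that the new PATH entry took effect, you can ask the shell where it finds scala. This is only a sketch: the helper name on_path is made up for illustration, and it assumes nothing beyond a POSIX shell.

```shell
# on_path is a hypothetical helper: it succeeds if the command is found via PATH.
on_path() {
    command -v "$1" >/dev/null 2>&1
}

if on_path scala; then
    echo "scala found at: $(command -v scala)"
else
    echo "scala not found; check SCALA_HOME and PATH in /etc/profile"
fi
```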
cd spark-0.8.0 (Spark's scripts share names with Hadoop's, so Spark's bin directory is not added to PATH)
cp conf/spark-env.sh.template conf/spark-env.sh
vi conf/slaves
Change localhost to host001
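If you prefer not to open an editor, the same change to conf/slaves can probably be scripted. The sketch below uses a hypothetical helper, replace_host, and demonstrates it on a scratch file rather than on the real conf/slaves; the scratch file's contents are an assumption about what the file looks like.

```shell
# replace_host is a hypothetical helper: in FILE, replace any line that is
# exactly OLD with NEW. Other lines (comments, longer host names) are kept.
replace_host() {
    file=$1; old=$2; new=$3
    tmp=$(mktemp) || return 1
    # Write to a temp file first instead of relying on "sed -i",
    # whose syntax differs between GNU and BSD sed.
    sed "s/^${old}\$/${new}/" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Demo on a scratch file: one comment line plus "localhost" (assumed layout).
demo=$(mktemp)
printf '# worker hosts, one per line\nlocalhost\n' > "$demo"
replace_host "$demo" localhost host001
cat "$demo"
```

From the spark-0.8.0 directory, the call would be replace_host conf/slaves localhost host001.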
vi conf/spark-env.sh
Add:
JAVA_HOME=/home/ysc/jdk1.7.0_40
SCALA_HOME=/home/ysc/scala-2.9.3
SPARK_WORKER_INSTANCES=2
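The three settings above can also be appended from a script, which makes the setup repeatable. This is a minimal sketch: add_env is a hypothetical helper, and the demo writes to a scratch file so that it can be run anywhere without touching a real conf/spark-env.sh.

```shell
# add_env is a hypothetical helper: append KEY=VALUE to FILE unless a line
# starting with KEY= is already present, so repeated runs add no duplicates.
add_env() {
    file=$1; key=$2; value=$3
    grep -q "^${key}=" "$file" 2>/dev/null || printf '%s=%s\n' "$key" "$value" >> "$file"
}

# Demo on a scratch file standing in for conf/spark-env.sh.
env_demo=$(mktemp)
add_env "$env_demo" JAVA_HOME /home/ysc/jdk1.7.0_40
add_env "$env_demo" SCALA_HOME /home/ysc/scala-2.9.3
add_env "$env_demo" SPARK_WORKER_INSTANCES 2
add_env "$env_demo" SPARK_WORKER_INSTANCES 2   # second call changes nothing
cat "$env_demo"
```

Pointing the helper at conf/spark-env.sh instead of the scratch file should give the same three lines as the manual edit.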
Start the services:
bin/start-all.sh
Web UI:
Spark Master: http://host001:8080/
Spark Worker: http://host001:8081/
Run the examples:
On the cluster:
./run-example org.apache.spark.examples.JavaSparkPi spark://host001:7077
./run-example org.apache.spark.examples.JavaWordCount spark://host001:7077 README.md
Local mode:
./run-example org.apache.spark.examples.JavaSparkPi local[4] (4 is the number of threads)
./run-example org.apache.spark.examples.JavaWordCount local[4] README.md
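To sanity-check the JavaWordCount output, you can compute rough word counts for the same file with ordinary shell tools. This is an approximation, not the example's own logic: it assumes words are separated by single spaces and drops empty tokens, which may differ from how the Spark example treats them. word_count is a hypothetical helper name.

```shell
# word_count is a hypothetical helper: print "word: count" for each
# space-separated word in FILE.
word_count() {
    tr ' ' '\n' < "$1" | grep -v '^$' | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Demo on a small scratch file instead of README.md.
wc_demo=$(mktemp)
printf 'spark runs fast\nspark runs in memory\n' > "$wc_demo"
word_count "$wc_demo"
```

Running word_count README.md in the spark-0.8.0 directory gives a list you can compare against a few entries from the Spark output.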
Stop the services:
bin/stop-all.sh