Spark(6) Upgrade to Version 1.0.2 Again with YARN

Download the prebuilt version for Hadoop 2
>wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop2.tgz

Set up Hadoop 2 and get it running on my VMs, following this post:
http://sillycat.iteye.com/blog/2090186

Prepare the test file on HDFS
>hdfs dfs -mkdir /user/sillycat
>hdfs dfs -put /opt/spark/log.txt /user/sillycat/

Start the Spark shell against the standalone master, then read the file:
>MASTER=spark://ubuntu-master1:7077 bin/spark-shell
scala> val file = sc.textFile("hdfs://ubuntu-master1:9000/user/sillycat/log.txt")
scala> file.first()

Error Message:
Server IPC version 9 cannot communicate with client version 4

Solution:
Version mismatch: I was using the Spark build for Hadoop 1 (spark-hadoop1) to connect to Hadoop 2.4.1. Switching to the hadoop2 build downloaded above fixed it.

It works in the shell.
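The message means the HDFS server speaks the Hadoop 2.x RPC protocol (IPC version 9) while the Hadoop client classes bundled with Spark are Hadoop 1.x (IPC version 4). A minimal pre-flight check could catch this before starting the shell. This is only a sketch, assuming the prebuilt tarball follows the spark-<version>-bin-hadoop<N>.tgz naming seen in the download URL; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the Hadoop major version a prebuilt Spark
# tarball was built for (parsed from its file name) with the cluster's version.
# Assumes the naming convention spark-<ver>-bin-hadoop<N>.tgz.

# Print the Hadoop major version encoded in a tarball name, e.g. "2".
# Returns 1 if the name does not follow the convention.
spark_build_hadoop_major() {
  local name=${1##*/}                 # strip any directory part
  name=${name%.tgz}                   # drop the extension
  local suffix=${name##*-bin-hadoop}  # text after "-bin-hadoop"
  if [ "$suffix" = "$name" ] || [ -z "$suffix" ]; then
    return 1
  fi
  printf '%s\n' "${suffix%%.*}"       # keep only the major part
}

# Print "ok" when the major versions match; otherwise print a mismatch
# message and return 1 (or 2 if the tarball name cannot be parsed).
check_spark_hadoop_match() {
  local tarball=$1 cluster_version=$2
  local built_for cluster_major
  if ! built_for=$(spark_build_hadoop_major "$tarball"); then
    echo "unknown build: $tarball"
    return 2
  fi
  cluster_major=${cluster_version%%.*}
  if [ "$built_for" = "$cluster_major" ]; then
    echo "ok"
  else
    echo "mismatch: Spark built for hadoop$built_for, cluster runs Hadoop $cluster_version"
    return 1
  fi
}

# Usage on a cluster node (the first line of `hadoop version` is "Hadoop X.Y.Z"):
#   check_spark_hadoop_match spark-1.0.2-bin-hadoop2.tgz "$(hadoop version | awk 'NR==1 {print $2}')"
```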

Next, set up YARN: start HDFS, YARN, and the MapReduce job history server.
>sbin/start-dfs.sh
>sbin/start-yarn.sh
>sbin/mr-jobhistory-daemon.sh start historyserver
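The order matters: HDFS comes up first, then YARN, then the history server. A small wrapper can run the three scripts in that order and stop at the first failure. This is a sketch; the function name is hypothetical, and the default HADOOP_HOME of /opt/hadoop is an assumption based on the HADOOP_CONF_DIR path used later in this post.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: start HDFS, then YARN, then the MapReduce job history
# server, returning 1 as soon as any script fails.
# Hadoop home comes from $1, else $HADOOP_HOME, else /opt/hadoop (assumed).
start_hadoop_stack() {
  local home=${1:-${HADOOP_HOME:-/opt/hadoop}}
  "$home/sbin/start-dfs.sh" || return 1
  "$home/sbin/start-yarn.sh" || return 1
  "$home/sbin/mr-jobhistory-daemon.sh" start historyserver || return 1
}

# Usage:
#   start_hadoop_stack /opt/hadoop
```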

YARN is running now. We can check the cluster status from these web UIs (HDFS NameNode, YARN ResourceManager, and MapReduce JobHistory):
http://ubuntu-master1:50070/dfshealth.html#tab-overview
http://ubuntu-master1:8088/cluster/nodes
http://ubuntu-master1:19888/jobhistory
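To check all three UIs at once after a restart, something like the following could help. It is a sketch with hypothetical function names, using the Hadoop 2.x default ports shown above (50070, 8088, 19888); the check function assumes curl is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<name> <url>" for each web UI on the given host.
hadoop_ui_urls() {
  local host=${1:-ubuntu-master1}
  printf 'namenode http://%s:50070/dfshealth.html#tab-overview\n' "$host"
  printf 'resourcemanager http://%s:8088/cluster/nodes\n' "$host"
  printf 'jobhistory http://%s:19888/jobhistory\n' "$host"
}

# Print the HTTP status code of each UI (curl reports 000 if unreachable).
check_hadoop_uis() {
  local name url code
  hadoop_ui_urls "$1" | while read -r name url; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
    echo "$name $code"
  done
}

# Usage:
#   check_hadoop_uis ubuntu-master1
```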

Running Spark Shell on YARN
Change the Spark configuration file (conf/spark-env.sh) so that it points to the Hadoop configuration:
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

>MASTER=yarn-client bin/spark-shell
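Spark on YARN reads the cluster's client-side configuration from the directory in HADOOP_CONF_DIR, so it can help to check that directory before starting the shell. A hypothetical check, assuming core-site.xml and yarn-site.xml are the files that matter for this setup:

```shell
#!/usr/bin/env bash
# Hypothetical check: the directory (argument, else $HADOOP_CONF_DIR) must exist
# and contain core-site.xml and yarn-site.xml. Returns 1 if anything is missing.
check_hadoop_conf_dir() {
  local dir=${1:-$HADOOP_CONF_DIR}
  local missing=0 f
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "HADOOP_CONF_DIR is not set or not a directory: '$dir'"
    return 1
  fi
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
#   check_hadoop_conf_dir && MASTER=yarn-client bin/spark-shell
```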

Submit a job as follows:
>bin/spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master yarn /Users/carl/work/current/simplesparkapp/target/sparkwordcount-0.0.1-SNAPSHOT.jar hdfs://ubuntu-master1:9000/user/sillycat/log.txt 2

Spark works well on YARN. My own assembly jar runs too:
>bin/spark-submit --class com.sillycat.spark.app.ClusterComplexJob --master yarn /Users/carl/work/sillycat/sillycat-spark/target/scala-2.10/sillycat-spark-assembly-1.0.jar book1

The sample project is sillycat-spark.

The standalone cluster works as well:
>bin/spark-submit --class com.sillycat.spark.app.FindWordJob --master spark://ubuntu-master1:7077 /Users/carl/work/sillycat/sillycat-spark/target/scala-2.10/sillycat-spark-assembly-1.0.jar book1
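Between the YARN and standalone submissions above, only --master changes. A hypothetical wrapper (submit_job is not part of Spark) can keep the class and jar in one place and optionally print the command instead of running it. SPARK_HOME defaulting to the current directory and the DRY_RUN variable are assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around bin/spark-submit.
# Arguments: master, main class, jar, then any application arguments.
# With DRY_RUN set, print the command line instead of executing it.
submit_job() {
  local master=$1 class=$2 jar=$3
  shift 3
  local cmd=("${SPARK_HOME:-.}/bin/spark-submit" --class "$class" --master "$master" "$jar" "$@")
  if [ -n "$DRY_RUN" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (same jar, two cluster managers):
#   submit_job yarn com.sillycat.spark.app.ClusterComplexJob target/scala-2.10/sillycat-spark-assembly-1.0.jar book1
#   submit_job spark://ubuntu-master1:7077 com.sillycat.spark.app.FindWordJob target/scala-2.10/sillycat-spark-assembly-1.0.jar book1
```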

The commands to start the master and a worker (slave):
>sbin/start-master.sh 
>bin/spark-class org.apache.spark.deploy.worker.Worker spark://ubuntu-master1:7077

Configuration on master1
>cat conf/spark-env.sh
#!/usr/bin/env bash

export SPARK_LOCAL_IP=ubuntu-master1

#export SPARK_EXECUTOR_MEMORY=1G

export SPARK_MASTER_IP=ubuntu-master1

export SPARK_WORKER_MEMORY=1024M

Configuration on slave1
>cat conf/spark-env.sh
#!/usr/bin/env bash

export SPARK_LOCAL_IP=ubuntu-slave1

#export SPARK_EXECUTOR_MEMORY=1G

export SPARK_MASTER_IP=ubuntu-master1

export SPARK_WORKER_MEMORY=1024M
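The two files differ only in SPARK_LOCAL_IP, so one function could write either of them. A sketch with a hypothetical function name that reproduces the layout shown above:

```shell
#!/usr/bin/env bash
# Hypothetical generator for conf/spark-env.sh on one node.
# Arguments: this node's host name, master host name,
#            worker memory (default 1024M), output file (default conf/spark-env.sh).
write_spark_env() {
  local local_ip=$1 master_ip=$2 worker_mem=${3:-1024M} out=${4:-conf/spark-env.sh}
  cat > "$out" <<EOF
#!/usr/bin/env bash

export SPARK_LOCAL_IP=$local_ip

#export SPARK_EXECUTOR_MEMORY=1G

export SPARK_MASTER_IP=$master_ip

export SPARK_WORKER_MEMORY=$worker_mem
EOF
}

# Usage:
#   write_spark_env ubuntu-master1 ubuntu-master1 1024M conf/spark-env.sh   # on master1
#   write_spark_env ubuntu-slave1 ubuntu-master1 1024M conf/spark-env.sh    # on slave1
```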

Tips
Running a Spark job in local mode:
>bin/spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master local /Users/carl/work/current/simplesparkapp/target/sparkwordcount-0.0.1-SNAPSHOT.jar /opt/spark/README.md 2

Error Message:
java.lang.OutOfMemoryError: Java heap space

Solution:
Increase the JVM memory setting in bin/spark-class:
>vi bin/spark-class
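To find the line to change before opening the editor, a grep over the script can help. This sketch assumes the memory defaults in your copy of spark-class are held in variables whose names contain MEM (for example something like DEFAULT_MEM); the function name is hypothetical. Alternatively, spark-submit accepts --driver-memory (e.g. --driver-memory 1g), which avoids editing the script. In local mode the job runs inside the driver JVM, so the driver heap is the one that matters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show memory-related lines (with line numbers) in
# bin/spark-class, or in the script given as the first argument.
show_spark_class_mem() {
  local script=${1:-bin/spark-class}
  grep -n 'MEM' "$script"
}

# Usage:
#   show_spark_class_mem bin/spark-class
```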


References:
http://spark.apache.org/docs/latest/running-on-yarn.html
https://github.com/snowplow/spark-example-project

http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/
http://parambirs.wordpress.com/2014/05/20/running-spark-1-0-0-snapshot-on-hadoopyarn-2-4-0/
http://parambirs.wordpress.com/2014/05/20/install-hadoopyarn-2-4-0-on-ubuntu-virtualbox/
http://parambirs.wordpress.com/2014/05/20/building-and-running-spark-1-0-0-snapshot-on-ubuntu/
