Spark(6)Upgrade to 1.0.2 Version again with YARN

沧海一滴水

2014-08-13

Download the prebuilt version
>wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop2.tgz

Set up Hadoop 2 and get it running on my VMs
http://sillycat.iteye.com/blog/2090186

Prepare the file on HDFS
>hdfs dfs -mkdir /user/sillycat
>hdfs dfs -put /opt/spark/log.txt /user/sillycat/

Log in to the Spark shell on the standalone master and read the file.
>MASTER=spark://ubuntu-master1:7077 bin/spark-shell
>val file = sc.textFile("hdfs://ubuntu-master1:9000/user/sillycat/log.txt")
>file.first()

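Error Message:
Server IPC version 9 cannot communicate with client version 4

Solution:
This is a version mismatch: I was using the Spark build for Hadoop 1 to connect to Hadoop 2.4.1. Use the hadoop2 prebuilt package downloaded above instead.

With the hadoop2 build, it works in the shell.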
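
The mismatch can be caught before launching the shell by comparing the Hadoop profile in the Spark package name with the cluster's major version. This is only a minimal sketch, assuming the package follows the spark-<version>-bin-<profile> naming of the download above; both function names are hypothetical helpers and not part of Spark or Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Spark or Hadoop.

# Print the Hadoop profile encoded in a prebuilt Spark package name,
# e.g. spark-1.0.2-bin-hadoop2.tgz -> hadoop2
spark_hadoop_profile() {
  local name="${1##*/}"      # strip any directory part
  name="${name%.tgz}"        # strip the archive suffix
  case "$name" in
    *-bin-*) printf '%s\n' "${name##*-bin-}" ;;
    *) return 1 ;;           # name does not follow the assumed pattern
  esac
}

# Succeed only if the profile matches the cluster's Hadoop major version.
# $1: package name, $2: Hadoop version such as 2.4.1
profile_matches_cluster() {
  local profile major
  profile="$(spark_hadoop_profile "$1")" || return 1
  major="${2%%.*}"           # 2.4.1 -> 2
  [ "$profile" = "hadoop$major" ]
}
```

For example, `profile_matches_cluster spark-1.0.2-bin-hadoop2.tgz 2.4.1` succeeds, while a package built for the hadoop1 profile fails the check against a 2.4.1 cluster.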

Next, start HDFS, YARN, and the job history server.
>sbin/start-dfs.sh
>sbin/start-yarn.sh
>sbin/mr-jobhistory-daemon.sh start historyserver

YARN is running now. The cluster status is available at these URLs
http://ubuntu-master1:50070/dfshealth.html#tab-overview
http://ubuntu-master1:8088/cluster/nodes
http://ubuntu-master1:19888/jobhistory

Running Spark Shell on YARN
Set the Hadoop configuration directory in the Spark configuration file so that Spark can find the YARN cluster
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

>MASTER=yarn-client bin/spark-shell
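
A quick pre-flight check can confirm that the configured directory really holds the Hadoop client configuration before the shell is started. The sketch below is a hypothetical helper, not part of Spark; the choice of core-site.xml and yarn-site.xml as the files to look for is my assumption about what a YARN client needs.

```shell
# Hypothetical pre-flight check, not part of Spark.
# Succeeds if the directory exists and contains the client-side config
# files assumed here to be required (core-site.xml and yarn-site.xml).
check_hadoop_conf_dir() {
  local dir="$1" f
  [ -n "$dir" ] && [ -d "$dir" ] || return 1
  for f in core-site.xml yarn-site.xml; do
    [ -f "$dir/$f" ] || { echo "missing $dir/$f" >&2; return 1; }
  done
}
```

It could be chained in front of the launch, for example `check_hadoop_conf_dir /opt/hadoop/etc/hadoop && MASTER=yarn-client bin/spark-shell`.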

Submit the job as follows
>bin/spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master yarn /Users/carl/work/current/simplesparkapp/target/sparkwordcount-0.0.1-SNAPSHOT.jar hdfs://ubuntu-master1:9000/user/sillycat/log.txt 2

Spark works great on YARN. A job from my own sample project runs the same way
>bin/spark-submit --class com.sillycat.spark.app.ClusterComplexJob --master yarn /Users/carl/work/sillycat/sillycat-spark/target/scala-2.10/sillycat-spark-assembly-1.0.jar book1

The sample project is in sillycat-spark

The standalone cluster works as well
>bin/spark-submit --class com.sillycat.spark.app.FindWordJob --master spark://ubuntu-master1:7077 /Users/carl/work/sillycat/sillycat-spark/target/scala-2.10/sillycat-spark-assembly-1.0.jar book1

The commands to start the master and the worker (slave)
>sbin/start-master.sh
>bin/spark-class org.apache.spark.deploy.worker.Worker spark://ubuntu-master1:7077

Configuration on master1
>cat conf/spark-env.sh
#!/usr/bin/env bash

export SPARK_LOCAL_IP=ubuntu-master1

#export SPARK_EXECUTOR_MEMORY=1G

export SPARK_MASTER_IP=ubuntu-master1

export SPARK_WORKER_MEMORY=1024M

Configuration on slave1
>cat conf/spark-env.sh
#!/usr/bin/env bash

export SPARK_LOCAL_IP=ubuntu-slave1

#export SPARK_EXECUTOR_MEMORY=1G

export SPARK_MASTER_IP=ubuntu-master1

export SPARK_WORKER_MEMORY=1024M
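
The two files differ only in SPARK_LOCAL_IP, so they can be generated from one template. A minimal sketch, assuming the same three variables as above; `render_spark_env` is a hypothetical helper name, and the commented-out SPARK_EXECUTOR_MEMORY line is left out.

```shell
# Hypothetical helper: print a spark-env.sh for one node.
# $1: this node's hostname, $2: master hostname,
# $3: worker memory (defaults to 1024M as in the files above)
render_spark_env() {
  local local_host="$1" master_host="$2" worker_mem="${3:-1024M}"
  cat <<EOF
#!/usr/bin/env bash

export SPARK_LOCAL_IP=$local_host
export SPARK_MASTER_IP=$master_host
export SPARK_WORKER_MEMORY=$worker_mem
EOF
}
```

For example, `render_spark_env ubuntu-slave1 ubuntu-master1 > conf/spark-env.sh` would write the slave1 configuration.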

Tips
Running a Spark job in local mode
>bin/spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master local /Users/carl/work/current/simplesparkapp/target/sparkwordcount-0.0.1-SNAPSHOT.jar /opt/spark/README.md 2

Error Message:
java.lang.OutOfMemoryError: Java heap space

Solution:
Increase the JVM memory setting in this script
>vi bin/spark-class
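
An alternative that avoids editing bin/spark-class is spark-submit's --driver-memory option. This is a sketch under an assumption I have not verified against this job: that the heap being exhausted is the driver JVM, which in local mode also runs the tasks. The wrapper name and the 2g value in the usage example are hypothetical; the function only prints the command so it can be inspected before running.

```shell
# Hypothetical wrapper: print (not run) a local-mode spark-submit command
# with an explicit driver heap size.
# $1: driver memory such as 2g; remaining arguments are passed through.
submit_local_cmd() {
  local mem="$1"
  shift
  echo bin/spark-submit --master local --driver-memory "$mem" "$@"
}
```

For example, `submit_local_cmd 2g --class com.cloudera.sparkwordcount.SparkWordCount app.jar /opt/spark/README.md 2` prints the full command line, where app.jar stands in for the jar path used above.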

References:
http://spark.apache.org/docs/latest/running-on-yarn.html
https://github.com/snowplow/spark-example-project

http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/
http://parambirs.wordpress.com/2014/05/20/running-spark-1-0-0-snapshot-on-hadoopyarn-2-4-0/
http://parambirs.wordpress.com/2014/05/20/install-hadoopyarn-2-4-0-on-ubuntu-virtualbox/
http://parambirs.wordpress.com/2014/05/20/building-and-running-spark-1-0-0-snapshot-on-ubuntu/
