hadoop(4) Cluster Setup on Hadoop 2.4.1

1. Virtual Machine

Delete Ubuntu User
>userdel -r carl

List the logged-in users
>users

Add User
>useradd carl

Change the computer name
>sudo vi /etc/hostname

Upgrade the Client Machines
>apt-get update
>apt-get dist-upgrade

2. Install Environments on Clients
Install Java
>sudo add-apt-repository ppa:webupd8team/java
>sudo apt-get update
>sudo apt-get install oracle-java6-installer

Check java environment
>java -version
java version "1.6.0_45"

Install SSH and generate a key pair
>sudo apt-get install ssh
>cd ~/.ssh/
>ssh-keygen -t rsa

This generates two files there:
>ls -l
total 12
-rw------- 1 carl carl 1675 Jun 30 22:45 id_rsa
-rw-r--r-- 1 carl carl  399 Jun 30 22:45 id_rsa.pub

>cat id_rsa.pub >> authorized_keys

Restart the SSH server after adding the key to authorized_keys
>sudo service ssh restart

Disable the Firewall
>sudo ufw disable
Firewall stopped and disabled on system startup

Prepare protoc
>sudo apt-get install protobuf-compiler
>protoc --version
libprotoc 2.5.0

Download the Source
>wget http://apache.arvixe.com/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
Unzip the file and prepare to build with Maven:

>mvn package -Pdist -DskipTests -Dtar

The build produces hadoop-2.4.1.tar.gz (under hadoop-dist/target); unzip it and put it in the working directory /opt/hadoop. Add to the environment:
export HADOOP_PREFIX=/opt/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
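A fuller profile snippet (e.g. in ~/.bashrc on each node) might look like the following sketch; the PATH line is my addition so the hadoop binaries and the sbin start scripts resolve without full paths:

```shell
# Hadoop environment (paths from this setup)
export HADOOP_PREFIX=/opt/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
# Put the hadoop/hdfs/yarn binaries and the sbin start scripts on the PATH
export PATH="$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin"
```

JAVA_HOME is usually also set in etc/hadoop/hadoop-env.sh, since the daemon start scripts do not always inherit the login environment.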

Tested with Standalone Operation; everything works fine.

Pseudo Distributed Mode
Almost the same; follow this blog: http://sillycat.iteye.com/blog/2084169

Run MapReduce Job on YARN
Follow the same blog again: http://sillycat.iteye.com/blog/2084169

3. Set up the Hadoop Cluster
Change the name of the machine
>sudo vi /etc/hostname

change ubuntu140401 to ubuntu-master
change ubuntu140402 to ubuntu-client1

Add entries like these to /etc/hosts so that each server can resolve the others:
127.0.0.1       ubuntu-client1

10.190.191.242  ubuntu-master
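Taken together, /etc/hosts on every node would look roughly like this (the client IPs are hypothetical placeholders; only the master's IP appears above). Note that mapping a node's own hostname to 127.0.0.1 can make Hadoop daemons bind to the loopback interface, so listing the real LAN IPs on every line is safer:

```
10.190.191.242  ubuntu-master
10.190.191.243  ubuntu-client1
10.190.191.244  ubuntu-client2
10.190.191.245  ubuntu-client3
```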

Configuration File on Client1 - core-site.xml
>cat /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ubuntu-master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop/temp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>

Configuration file on client1 - hdfs-site.xml
>cat etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>ubuntu-master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
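The local paths referenced by hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir must exist on every node before the first start. A minimal sketch, using the paths from the configs above (the HADOOP_PREFIX override is just a convenience I added):

```shell
# Create the local directories referenced in core-site.xml and hdfs-site.xml
HADOOP_PREFIX="${HADOOP_PREFIX:-/opt/hadoop}"
mkdir -p "$HADOOP_PREFIX/temp" "$HADOOP_PREFIX/dfs/name" "$HADOOP_PREFIX/dfs/data"
```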

Configuration file on client1 - mapred-site.xml
>cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
>cat etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>ubuntu-master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ubuntu-master:19888</value>
  </property>
</configuration>

Configuration file on client1 - yarn-site.xml
>cat etc/hadoop/yarn-site.xml 
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ubuntu-master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ubuntu-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ubuntu-master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ubuntu-master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ubuntu-master:8088</value>
  </property>
</configuration>

Clone two more clients. Then I have three slave machines (ubuntu-client1, ubuntu-client2, ubuntu-client3) and one master machine (ubuntu-master).
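The start scripts on ubuntu-master read the worker list from etc/hadoop/slaves (not shown in the original notes); for this layout it would contain:

```
ubuntu-client1
ubuntu-client2
ubuntu-client3
```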

Prepare the SSH connection
>cd ~/.ssh/
>cp id_rsa.pub ~/download/

scp this file to all the clients, then append it to authorized_keys on each client:
>cat id_rsa.pub >> ./authorized_keys

Then ubuntu-master can access all the clients over SSH without a password.

Start the HDFS cluster on ubuntu-master (on the very first start, format the NameNode beforehand with bin/hdfs namenode -format)
>sbin/start-dfs.sh
Visit this URL http://ubuntu-master:50070/dfshealth.html#tab-overview

Start the YARN cluster on ubuntu-master
>sbin/start-yarn.sh

Visit the resource manager
http://ubuntu-master:8088/cluster

Node Manager on all the clients
http://ubuntu-client1:8042/node

Start the JobHistory Server
>sbin/mr-jobhistory-daemon.sh start historyserver

http://ubuntu-master:19888/jobhistory

Verify with WordCount
Create directories on HDFS
>hadoop fs -mkdir -p /data/worldcount
>hadoop fs -mkdir -p /output/

Put the XML files there
>hadoop fs -put /opt/hadoop/etc/hadoop/*.xml /data/worldcount/
>hadoop fs -ls /data/worldcount
14/07/09 13:25:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 7 items
-rw-r--r--   3 carl supergroup       3589 2014-07-09 13:24 /data/worldcount/capacity-scheduler.xml
-rw-r--r--   3 carl supergroup       1250 2014-07-09 13:24 /data/worldcount/core-site.xml
-rw-r--r--   3 carl supergroup       9257 2014-07-09 13:24 /data/worldcount/hadoop-policy.xml
-rw-r--r--   3 carl supergroup       1286 2014-07-09 13:24 /data/worldcount/hdfs-site.xml
-rw-r--r--   3 carl supergroup        620 2014-07-09 13:24 /data/worldcount/httpfs-site.xml
-rw-r--r--   3 carl supergroup       1063 2014-07-09 13:24 /data/worldcount/mapred-site.xml
-rw-r--r--   3 carl supergroup       1456 2014-07-09 13:24 /data/worldcount/yarn-site.xml

>hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /data/worldcount /output/worldcount

Successfully got the results
>hadoop fs -cat /output/worldcount/*
"*" 17 "AS 7 "License”); 7 "alice,bob 17 (ASF) 1 (root 1


References:
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
http://hadoop.apache.org/docs/r2.4.0/hadoop-yarn/hadoop-yarn-site/YARN.html

http://blog.sina.com.cn/s/blog_5c5d5cdf0101dvgq.html
http://www.haogongju.net/art/2707216
http://blog.csdn.net/hadoop_/article/details/24196193
http://blog.csdn.net/hadoop_/article/details/17716945

http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

http://blog.huangchaosuper.cn/work/tech/2014/04/24/hadoop-install.html
