hadoop(4) Cluster Setup on hadoop 2.4.1
1. Virtual Machine
Delete Ubuntu User
>userdel -r carl
List the currently logged-in users
>users
Add User
>useradd carl
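Note that a bare useradd does not create a home directory or set a login shell on Ubuntu; if the user needs those, one option (flags assumed, not from the original setup) is:
>sudo useradd -m -s /bin/bash carl
>sudo passwd carl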
Change the computer name
>sudo vi /etc/hostname
Upgrade the Client Machines
>apt-get update
>apt-get dist-upgrade
2. Install Environments on Clients
Install Java
>sudo add-apt-repository ppa:webupd8team/java
>sudo apt-get update
>sudo apt-get install oracle-java6-installer
Check the Java environment
>java -version
java version "1.6.0_45"
Install the SSH
>sudo apt-get install ssh
>cd ~/.ssh/
>ssh-keygen -t rsa
It will generate 2 files there (use an empty passphrase so logins will not prompt for one):
>ls -l
total 12
-rw------- 1 carl carl 1675 Jun 30 22:45 id_rsa
-rw-r--r-- 1 carl carl 399 Jun 30 22:45 id_rsa.pub
>cat id_rsa.pub >> authorized_keys
Restart the SSH server after adding the key to authorized_keys
>sudo service ssh restart
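To confirm the key works, an ssh to localhost should now log in without a password prompt:
>ssh localhost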
Disable the Firewall
>sudo ufw disable
Firewall stopped and disabled on system startup
Prepare protoc
>sudo apt-get install protobuf-compiler
>protoc --version
libprotoc 2.5.0
Download the Source
>wget http://apache.arvixe.com/hadoop/common/hadoop-2.4.1/hadoop-2.4.1-src.tar.gz
Unzip the file and prepare to build with Maven
>mvn package -Pdist -DskipTests -Dtar
The build produces hadoop-2.4.1.tar.gz under hadoop-dist/target; unzip it, put it in the working directory, and add these to the environment:
export HADOOP_PREFIX=/opt/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-oracle
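A minimal sketch of that unpack step, assuming the /opt/hadoop location from the exports above:
>sudo tar xzf hadoop-dist/target/hadoop-2.4.1.tar.gz -C /opt/
>sudo ln -s /opt/hadoop-2.4.1 /opt/hadoop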
Testing with Standalone Operation: everything works fine.
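The standalone smoke test can follow the official docs: run the bundled grep example against some local files (from HADOOP_PREFIX):
>mkdir input
>cp etc/hadoop/*.xml input
>bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
>cat output/*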
Pseudo Distributed Mode
Almost the same; follow this blog: http://sillycat.iteye.com/blog/2084169
Run MapReduce Job on YARN
Follow the same blog again: http://sillycat.iteye.com/blog/2084169
3. Set up the Hadoop Cluster
Change the name of the machine
>sudo vi /etc/hostname
change ubuntu140401 to ubuntu-master
change ubuntu140402 to ubuntu-client1
Add entries like these to /etc/hosts on every machine, so that the servers can resolve each other:
127.0.0.1 ubuntu-client1
10.190.191.242 ubuntu-master
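One caveat: mapping a machine's own hostname to 127.0.0.1, as with ubuntu-client1 above, can make the Hadoop daemons bind to loopback so the rest of the cluster cannot reach them; it is safer to map every hostname to its LAN address. A full /etc/hosts would look something like this (the client address is a placeholder, only the master address is known here):
10.190.191.242 ubuntu-master
10.190.191.x ubuntu-client1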
Configuration file on client1 - core-site.xml
>cat /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ubuntu-master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/temp</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
Configuration file on client1 - hdfs-site.xml
>cat etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>ubuntu-master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Configuration file on client1 - mapred-site.xml
>cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
>cat etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>ubuntu-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>ubuntu-master:19888</value>
</property>
</configuration>
Configuration file on client1 - yarn-site.xml
>cat etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>ubuntu-master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>ubuntu-master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ubuntu-master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>ubuntu-master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>ubuntu-master:8088</value>
</property>
</configuration>
Clone 2 more clients. Then I have 3 slave machines, ubuntu-client1, ubuntu-client2, and ubuntu-client3, plus 1 master machine, ubuntu-master.
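The master also needs the slave list. Assuming the standard Hadoop 2.x layout, put the client hostnames into etc/hadoop/slaves on ubuntu-master; this is the file start-dfs.sh and start-yarn.sh read to find the DataNode/NodeManager machines:
>cat /opt/hadoop/etc/hadoop/slaves
ubuntu-client1
ubuntu-client2
ubuntu-client3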
Prepare the SSH connection
>cd ~/.ssh/
>cp id_rsa.pub ~/download/
scp this file to all the clients, then on each client append it to the authorized keys:
>cat id_rsa.pub >> ./authorized_keys
Then ubuntu-master can log in to all the clients without a password.
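For example, for one client (paths assumed):
>scp ~/download/id_rsa.pub carl@ubuntu-client1:~/
then on ubuntu-client1:
>cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
Verify from the master that no password prompt appears:
>ssh ubuntu-client1 hostname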
Start the HDFS cluster on ubuntu-master
>sbin/start-dfs.sh
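Note: on the very first start, the NameNode must be formatted before start-dfs.sh is run (a standard Hadoop step that initializes dfs.namenode.name.dir):
>bin/hdfs namenode -format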
Visit this URL http://ubuntu-master:50070/dfshealth.html#tab-overview
Start the YARN cluster on ubuntu-master
>sbin/start-yarn.sh
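On any node, jps gives a quick sanity check of what is running; if the cluster came up, the master should show roughly NameNode, SecondaryNameNode, and ResourceManager, and each client DataNode and NodeManager:
>jps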
Visit the resource manager
http://ubuntu-master:8088/cluster
Node Manager UI on each client
http://ubuntu-client1:8042/node
Start the JobHistory Server
>sbin/mr-jobhistory-daemon.sh start historyserver
http://ubuntu-master:19888/jobhistory
Verify with WordCount
Create directories on HDFS
>hadoop fs -mkdir -p /data/worldcount
>hadoop fs -mkdir -p /output/
Put the xml files there
>hadoop fs -put /opt/hadoop/etc/hadoop/*.xml /data/worldcount/
>hadoop fs -ls /data/worldcount
14/07/09 13:25:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 7 items
-rw-r--r-- 3 carl supergroup 3589 2014-07-09 13:24 /data/worldcount/capacity-scheduler.xml
-rw-r--r-- 3 carl supergroup 1250 2014-07-09 13:24 /data/worldcount/core-site.xml
-rw-r--r-- 3 carl supergroup 9257 2014-07-09 13:24 /data/worldcount/hadoop-policy.xml
-rw-r--r-- 3 carl supergroup 1286 2014-07-09 13:24 /data/worldcount/hdfs-site.xml
-rw-r--r-- 3 carl supergroup 620 2014-07-09 13:24 /data/worldcount/httpfs-site.xml
-rw-r--r-- 3 carl supergroup 1063 2014-07-09 13:24 /data/worldcount/mapred-site.xml
-rw-r--r-- 3 carl supergroup 1456 2014-07-09 13:24 /data/worldcount/yarn-site.xml
>hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /data/worldcount /output/worldcount
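One thing to watch: MapReduce refuses to write into an existing output directory, so remove the old one before rerunning the job:
>hadoop fs -rm -r /output/worldcount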
Successfully get the results:
>hadoop fs -cat /output/worldcount/*
"*" 17 "AS 7 "License”); 7 "alice,bob 17 (ASF) 1 (root 1