Standalone Hadoop Deployment and Distributed Cluster Deployment on Ubuntu
Manual installation of CDH4.5
http://wenku.baidu.com/view/6544c87f2e3f5727a5e962a3.html
hadoop-1.1.2.tar.gz has also been tested successfully.
Important
Installation guide: http://wenku.baidu.com/view/685f71b165ce050876321329.html
When choosing the network connection type for the VM, select bridged mode.
Set the root password
Open a terminal (Ctrl+Alt+T).
Change the root password: sudo passwd root
Enter the password.
Log in as the root user: su root
Ubuntu 8.10 does not install the SSH service by default; it has to be installed manually first.
sudo apt-get install ssh
or: sudo apt-get install openssh-server    # install openssh-server
Check the IP address with ifconfig.
Connect remotely with SecureCRT.
ubuntu     10.2.128.46
ubuntu1    10.2.128.20
ubuntu2    10.2.128.120
Install vim:
sudo apt-get install vim
1. Install the JDK
1.1 Download the JDK from the official site
The file downloaded here is jdk-6u23-linux-i586.bin.
Download page: http://www.oracle.com/technetwork/java/javase/downloads/index.html
Look for JDK 6.
Place it under /home/qiaowang.
sudo sh jdk-6u23-linux-i586.bin
cp -rf jdk1.6.0_33/ /usr/lib/
sudo gedit /etc/environment
export JAVA_HOME=/usr/lib/jdk1.6.0_33
export JRE_HOME=/usr/lib/jdk1.6.0_33/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
vim /etc/profile
export JAVA_HOME=/usr/lib/jdk1.6.0_33
export JRE_HOME=/usr/lib/jdk1.6.0_33/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$JAVA_HOME/bin
Add these lines just before the umask line.
source /etc/profile
reboot
root@qiaowang-virtual-machine:/etc# java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03)
Java HotSpot(TM) Client VM (build 20.8-b03, mixed mode)
The JDK setup must be performed on every namenode and datanode.
2. Add a group and a user for running and accessing Hadoop.
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
Show the groups a user belongs to:
id <username>
Show the users in a group:
groups <username>
List all users:
cat /etc/shadow
Delete a user:
As root: userdel -r newuser
As a normal user: sudo userdel -r newuser
Log the user out first, then delete.
3. Generate SSH keys and configure passwordless SSH
su - hadoop                # switch to the hadoop user
ssh-keygen -t rsa -P ""    # generate an ssh key pair
cd .ssh/
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys    # allow ssh access with this key
cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
After the setup, test it with ssh localhost.
If the relevant line in the sshd configuration is commented out, remove the leading # so that the system recognizes public keys in authorized_keys.
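A small sketch of tightening the key permissions and verifying passwordless login; the chmod values are standard sshd expectations rather than something from the original notes:

# run as the hadoop user
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost    # should log in without asking for a password
exit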
4. Download a Hadoop release from:
http://hadoop.apache.org/common/releases.html#Download
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-0.20.2/
Latest version:
hadoop-2.0.0-cdh4.5.0.tar.gz
Already copied to /opt.
tar -zxvf hadoop-0.20.2.tar.gz
tar -zxvf hadoop-2.0.0-cdh4.5.0.tar.gz
5. Change the hostname (currently qiaowang-virtual-machine)
root@qiaowang-virtual-machine:/opt# hostname
qiaowang-virtual-machine
If the machine's hostname is not what we want, change it to the planned name. (On RHEL-style systems this means editing the HOSTNAME value in /etc/sysconfig/network; on Ubuntu, edit /etc/hostname as shown below.)
vim /etc/hostname
Master.Hadoop
Then apply it:
hostname Master.Hadoop
root@Master:~# hostname
Master.Hadoop
vim /etc/hosts
127.0.1.1    Master.Hadoop
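A quick sanity check that the new name is applied and resolves (assumes the edits above are in place):

hostname                   # should print Master.Hadoop
ping -c 1 Master.Hadoop    # should resolve via the /etc/hosts entry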
For the configuration that follows, the reference is:
http://wenku.baidu.com/view/6544c87f2e3f5727a5e962a3.html
1. core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://m1hadoop.xingmeng.com:8020</value>
<final>true</final>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tempdata</value>
</property>
2. yarn-site.xml
<property>
<name>yarn.resourcemanager.address</name>
<value>m1hadoop.xingmeng.com:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>m1hadoop.xingmeng.com:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>m1hadoop.xingmeng.com:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>m1hadoop.xingmeng.com:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>m1hadoop.xingmeng.com:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
3. mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>file:/home/hadoop/mapred_system</value>
<final>true</final>
</property>
<property>
<name>mapred.local.dir</name>
<value>file:/home/hadoop/mapred_local</value>
<final>true</final>
</property>
4. hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
./hdfs namenode -format
The start/stop scripts are under /home/hadoop/cdh/sbin/:
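A sketch of bringing the daemons up from that sbin directory after the format; the script names are the stock Hadoop 2 / CDH4 ones, so adjust if your layout differs:

cd /home/hadoop/cdh/sbin
./start-dfs.sh
./start-yarn.sh
jps    # expect NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager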
/home/hadoop/cdh/bin/hadoop fs -ls /user/hadoop
If the IP address has changed, update /etc/hosts first.
If the datanode does not come up:
delete the files under /home/hadoop/data/.
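A hedged sketch of that cleanup, assuming dfs.datanode.data.dir is /home/hadoop/data as configured above:

/home/hadoop/cdh/sbin/stop-dfs.sh
rm -rf /home/hadoop/data/*
/home/hadoop/cdh/sbin/start-dfs.sh
jps    # the DataNode process should now be listed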
--------------------------------------------------
4. Disable IPv6
Edit conf/hadoop-env.sh under the Hadoop root directory (if you have not downloaded Hadoop yet, download and extract it first):
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
If it prints 0, IPv6 is still enabled; a value of 1 means it has been disabled successfully. To disable it, use the following sysctl settings:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
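One way to persist and apply those settings; a sketch that assumes the lines go into /etc/sysctl.conf and that you have root:

sudo sh -c 'cat >> /etc/sysctl.conf <<EOF
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF'
sudo sysctl -p
cat /proc/sys/net/ipv6/conf/all/disable_ipv6    # should now print 1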
5. Change the owner of the Hadoop directory to hadoop
chown -R hadoop:hadoop /opt/hadoop-0.20.2/
mv hadoop-0.20.2 hadoop
6. Install Hadoop
Here is how to configure and start it.
Basic steps:
a. Configure the JDK
b. Configure core-site.xml
c. Configure mapred-site.xml
d. Configure hdfs-site.xml
Create a directory for the data store:
mkdir /opt/hadoop-datastore
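The data directory should be owned by the hadoop user; a small sketch (the user and group come from step 2 above):

sudo chown -R hadoop:hadoop /opt/hadoop-datastore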
Open conf/core-site.xml and configure it as follows:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-datastore/</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
mapred-site.xml is as follows:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
hdfs-site.xml is as follows:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
vim hadoop-env.sh
export JAVA_HOME=/usr/lib/jdk1.6.0_33
OK, configuration is complete.
Format HDFS:
/opt/hadoop/bin/hadoop namenode -format
Output:
root@Master:/opt/hadoop# /opt/hadoop/bin/hadoop namenode -format
12/07/13 14:27:29 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master.Hadoop/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /opt/hadoop-datastore/dfs/name ? (Y or N) y
Format aborted in /opt/hadoop-datastore/dfs/name
12/07/13 14:27:35 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/127.0.1.1
************************************************************/
(Note that answering a lowercase "y" aborts the re-format, as shown above; answer an uppercase "Y" if you actually want to re-format.)
Start HDFS and MapReduce
Switch to the hadoop user.
/opt/hadoop/bin/start-all.sh
Output:
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Master.Hadoop.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 3e:55:d8:be:47:46:21:95:29:9b:9e:c5:fb:02:f4:d2.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Master.Hadoop.out
root@localhost's password:
localhost: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-root-secondarynamenode-Master.Hadoop.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Master.Hadoop.out
root@localhost's password:
localhost: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Master.Hadoop.out
9. The script to stop the services is:
/opt/hadoop/bin/stop-all.sh
10. After a successful start, check with jps:
2914 NameNode
3197 JobTracker
3896 Jps
3024 DataNode
3126 SecondaryNameNode
3304 TaskTracker
5. Run wordcount.java
There are several jar files in the Hadoop directory; hadoop-examples-0.20.203.0.jar is the one we need, as it contains wordcount. Create the test files with the following commands.
(1) First create two input files, file01 and file02, on the local disk:
$ echo "Hello World Bye World" > file01
$ echo "Hello Hadoop Goodbye Hadoop" > file02
./hadoop fs -ls /
(2) Create an input directory in HDFS: ./hadoop fs -mkdir input
To delete it: ./hadoop dfs -rmr input
(3) Copy file01 and file02 into HDFS:
./hadoop fs -copyFromLocal /home/qiaowang/file0* input
./hadoop fs -ls /user/root/input
Found 2 items
-rw-r--r--   1 root supergroup         22 2012-07-13 15:07 /user/root/input/file01
-rw-r--r--   1 root supergroup         28 2012-07-13 15:07 /user/root/input/file02
root@Master:/opt/hadoop/bin# ./hadoop fs -cat /user/root/input/file01
Hello World Bye World
(4) Run wordcount:
$ hadoop jar hadoop-0.20.2-examples.jar wordcount input output
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-0.20.2-examples.jar
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
        at java.util.jar.JarFile.<init>(JarFile.java:135)
        at java.util.jar.JarFile.<init>(JarFile.java:72)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
Solution:
Mind the jar path:
./hadoop jar /opt/hadoop/hadoop-0.20.2-examples.jar wordcount input output
Output:
12/07/13 15:20:22 INFO input.FileInputFormat: Total input paths to process : 2
12/07/13 15:20:22 INFO mapred.JobClient: Running job: job_201207131429_0001
12/07/13 15:20:23 INFO mapred.JobClient:  map 0% reduce 0%
12/07/13 15:20:32 INFO mapred.JobClient:  map 100% reduce 0%
12/07/13 15:20:44 INFO mapred.JobClient:  map 100% reduce 100%
12/07/13 15:20:46 INFO mapred.JobClient: Job complete: job_201207131429_0001
12/07/13 15:20:46 INFO mapred.JobClient: Counters: 17
12/07/13 15:20:46 INFO mapred.JobClient:   Job Counters
12/07/13 15:20:46 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/13 15:20:46 INFO mapred.JobClient:     Launched map tasks=2
12/07/13 15:20:46 INFO mapred.JobClient:     Data-local map tasks=2
12/07/13 15:20:46 INFO mapred.JobClient:   FileSystemCounters
12/07/13 15:20:46 INFO mapred.JobClient:     FILE_BYTES_READ=79
12/07/13 15:20:46 INFO mapred.JobClient:     HDFS_BYTES_READ=50
12/07/13 15:20:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=228
12/07/13 15:20:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
12/07/13 15:20:46 INFO mapred.JobClient:   Map-Reduce Framework
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce input groups=5
12/07/13 15:20:46 INFO mapred.JobClient:     Combine output records=6
12/07/13 15:20:46 INFO mapred.JobClient:     Map input records=2
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce shuffle bytes=45
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce output records=5
12/07/13 15:20:46 INFO mapred.JobClient:     Spilled Records=12
12/07/13 15:20:46 INFO mapred.JobClient:     Map output bytes=82
12/07/13 15:20:46 INFO mapred.JobClient:     Combine input records=8
12/07/13 15:20:46 INFO mapred.JobClient:     Map output records=8
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce input records=6
(5) When it finishes, check the result:
root@Master:/opt/hadoop/bin# ./hadoop fs -cat /user/root/output/part-r-00000
Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2
root@Master:/opt/hadoop/bin# jps
3049 TaskTracker
2582 DataNode
2849 JobTracker
10386 Jps
2361 NameNode
2785 SecondaryNameNode
OK, the above completes the standalone Hadoop setup on Ubuntu.
--------------------------------------------------------
Next we set up a cluster (3 Ubuntu servers).
References:
http://www.linuxidc.com/Linux/2011-04/35162.htm
http://www.2cto.com/os/201202/118992.html
1. Three machines, each with the JDK installed and the hadoop user added:
ubuntu     10.2.128.46     master
ubuntu1    10.2.128.20     slave1
ubuntu2    10.2.128.120    slave2
Edit /etc/hosts on all three machines as follows:
127.0.0.1       localhost
10.2.128.46     Master.Hadoop
10.2.128.20     Slave1.Hadoop
10.2.128.120    Slave2.Hadoop
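A sketch of pushing the same hosts file to the slaves instead of editing each one by hand; it assumes root SSH access to the slaves (otherwise just edit them manually):

scp /etc/hosts Slave1.Hadoop:/etc/hosts
scp /etc/hosts Slave2.Hadoop:/etc/hosts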
All of the following operations are performed as the hadoop user.
2. Generate SSH keys and configure passwordless SSH
su - hadoop                # switch to the hadoop user
ssh-keygen -t rsa -P ""    # generate an ssh key pair
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys    # allow ssh access with this key
On the namenode (Master):
hadoop@Master:~/.ssh$ scp authorized_keys Slave1.Hadoop:/home/hadoop/.ssh/
hadoop@Master:~/.ssh$ scp authorized_keys Slave2.Hadoop:/home/hadoop/.ssh/
Test with ssh Slave1.Hadoop and ssh Slave2.Hadoop (the first time you need to type yes).
If no password is asked for, the configuration succeeded; if a password is still requested, re-check the configuration above.
hadoop@Master:~/.ssh$ ssh Slave1.Hadoop
Welcome to Ubuntu precise (development branch)
hadoop@Master:~/.ssh$ ssh Slave2.Hadoop
Welcome to Ubuntu precise (development branch)
2. Copy hadoop-0.20.2.tar.gz to the /home/qiaowang/install_Hadoop directory.
Possible approaches:
1) Installing a Hadoop cluster usually means extracting the software onto every machine in the cluster, with the same installation path everywhere. If HADOOP_HOME denotes the installation root,
then all machines in the cluster typically share the same HADOOP_HOME.
2) If every machine in the cluster has an identical environment, you can configure one machine and then copy the whole configured folder (hadoop-0.20.203) to the same location on the other machines.
3) You can scp the Hadoop directory from the Master to the same directory on every Slave, and adjust each Slave's hadoop-env.sh according to its JAVA_HOME.
3. Configuration
4) For convenience with the hadoop command, start-all.sh and so on, add the following to /etc/profile on the Master:
export JAVA_HOME=/usr/lib/jdk1.6.0_33
export JRE_HOME=/usr/lib/jdk1.6.0_33/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export HADOOP_HOME=/opt/hadoop
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
After editing, run source /etc/profile to make the changes take effect.
Configure the files under conf:
vim hadoop-env.sh
export JAVA_HOME=/usr/lib/jdk1.6.0_33
vim core-site.xml
----------------------------------
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop-datastore/</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://Master.Hadoop:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
-----------------------------------------
vim hdfs-site.xml
------------------------------------------
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
-------------------------------------
vim mapred-site.xml
------------------------------------
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>Master.Hadoop:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
-------------------------------------
vim masters
Master.Hadoop
root@Master:/opt/hadoop/conf# vim slaves
Slave1.Hadoop
Slave2.Hadoop
Using approach 3), copy the Hadoop directory from the Master to every Slave.
Switch to the root user:
su root
Run: scp -r hadoop Slave1.Hadoop:/opt/
On Slave1.Hadoop:
su root
chown -R hadoop:hadoop /opt/hadoop/
Create the data directory:
mkdir /opt/hadoop-datastore/
chown -R hadoop:hadoop /opt/hadoop-datastore/
Do the same for the other Slaves (a sketch for Slave2 follows).
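A sketch mirroring the Slave1 steps for Slave2.Hadoop, run from /opt on the Master as root; the remote mkdir/chown assume root SSH access to the slave:

scp -r hadoop Slave2.Hadoop:/opt/
ssh Slave2.Hadoop 'mkdir -p /opt/hadoop-datastore && chown -R hadoop:hadoop /opt/hadoop /opt/hadoop-datastore'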
Format Hadoop on the namenode:
root@Master:/opt/hadoop/bin# hadoop namenode -format
Output:
12/07/23 18:54:36 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master.Hadoop/10.2.128.46
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /opt/hadoop-datastore/dfs/name ? (Y or N) y
Format aborted in /opt/hadoop-datastore/dfs/name
12/07/23 18:54:45 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/10.2.128.46
************************************************************/
Start Hadoop:
./start-all.sh
root@Master:/opt# chown -R hadoop:hadoop /opt/hadoop/
root@Master:/opt# chown -R hadoop:hadoop /opt/hadoop-datastore/
root@Master:/opt# su hadoop
hadoop@Master:/opt$ cd hadoop/bin/
hadoop@Master:/opt/hadoop/bin$ ./start-all.sh
Problem encountered:
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-namenode-Master.Hadoop.out
Slave1.Hadoop: datanode running as process 7309. Stop it first.
Slave2.Hadoop: datanode running as process 4920. Stop it first.
Master.Hadoop: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-jobtracker-Master.Hadoop.out
Slave1.Hadoop: tasktracker running as process 7477. Stop it first.
Slave2.Hadoop: tasktracker running as process 5088. Stop it first.
Advice found online:
This usually happens when the namenode is formatted again after the cluster has already been started.
If this is just for testing and learning, it can be fixed as follows:
1. First kill the 26755, 21863 and 26654 processes. If kill 26755 does not work, use kill -KILL 26755.
2. Manually delete the contents of the dfs.data.dir directory configured in conf/hdfs-site.xml.
3. Run $HADOOP_HOME/bin/hadoop namenode -format
4. Start the cluster: $HADOOP_HOME/bin/start-all.sh
Consequence:
All content in HDFS will be lost.
Solution taken: re-formatted the namenode.
su hadoop
hadoop@Master:/opt/hadoop/bin$ ./hadoop namenode -format
12/07/24 10:43:29 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master.Hadoop/10.2.128.46
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /opt/hadoop-datastore/dfs/name ? (Y or N) y
Format aborted in /opt/hadoop-datastore/dfs/name
12/07/24 10:43:32 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/10.2.128.46
************************************************************/
hadoop@Master:/opt/hadoop/bin$ ./start-all.sh
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-namenode-Master.Hadoop.out
Slave1.Hadoop: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave1.Hadoop.out
Slave2.Hadoop: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave2.Hadoop.out
Master.Hadoop: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-jobtracker-Master.Hadoop.out
Slave2.Hadoop: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave2.Hadoop.out
Slave1.Hadoop: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave1.Hadoop.out
hadoop@Master:/opt/hadoop/bin$
--------------------------------------------------------------------------------------------------------------------------
The following is the verification part:
hadoop@Master:/opt/hadoop/bin$ ./hadoop dfsadmin -report
Safe mode is ON
Configured Capacity: 41137831936 (38.31 GB)
Present Capacity: 31127531520 (28.99 GB)
DFS Remaining: 31127482368 (28.99 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 10.2.128.120:50010
Decommission Status : Normal
Configured Capacity: 20568915968 (19.16 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 4913000448 (4.58 GB)
DFS Remaining: 15655890944 (14.58 GB)
DFS Used%: 0%
DFS Remaining%: 76.11%
Last contact: Tue Jul 24 10:50:43 CST 2012
Name: 10.2.128.20:50010
Decommission Status : Normal
Configured Capacity: 20568915968 (19.16 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 5097299968 (4.75 GB)
DFS Remaining: 15471591424 (14.41 GB)
DFS Used%: 0%
DFS Remaining%: 75.22%
Last contact: Tue Jul 24 10:50:41 CST 2012
Web UI: http://10.2.128.46:50070/
View job information:
http://10.2.128.46:50030/jobtracker.jsp
To check whether the daemons are running, use the jps command (a ps-style utility for JVM processes). It lists the running daemons and their process identifiers.
hadoop@Master:/opt/hadoop/conf$ jps
2823 Jps
2508 JobTracker
2221 NameNode
2455 SecondaryNameNode
netstat -nat
tcp        0      0 10.2.128.46:54311       0.0.0.0:*               LISTEN
tcp        0      0 10.2.128.46:54310       10.2.128.46:44150       ESTABLISHED
tcp      267      0 10.2.128.46:54311       10.2.128.120:48958      ESTABLISHED
tcp        0      0 10.2.128.46:54310       10.2.128.20:41230       ESTABLISHED
./hadoop dfs -ls /
hadoop@Master:/opt/hadoop/bin$ ./hadoop dfs -ls /
Found 2 items
drwxr-xr-x   - root supergroup          0 2012-07-13 15:20 /opt
drwxr-xr-x   - root supergroup          0 2012-07-13 15:20 /user
hadoop@Master:/opt/hadoop$ bin/hadoop fs -mkdir input
Problem encountered:
mkdir: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/hadoop/input. Name node is in safe mode.
So what is Hadoop's safe mode?
When the distributed file system starts up, it begins in safe mode. While the file system is in safe mode, its contents cannot be modified or deleted, until safe mode ends.
Safe mode mainly exists so that, at startup, the system can check the validity of the data blocks on each DataNode and, according to policy, copy or delete blocks as needed.
Safe mode can also be entered with a command at runtime. In practice, trying to modify or delete files right after startup will also trigger the safe-mode error; usually you only need to wait a moment.
Now that this is clear, can we take Hadoop out of safe mode directly instead of waiting?
Yes. Just run the following in the Hadoop directory:
hadoop@Master:/opt/hadoop/bin$ ./hadoop dfsadmin -safemode leave
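If you only want to inspect the state rather than force it off, the same dfsadmin tool has a get subcommand (a small aside, not from the original notes):

./hadoop dfsadmin -safemode get    # reports whether safe mode is ON or OFF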
hadoop@Master:/opt/hadoop/bin$ cd ..
hadoop@Master:/opt/hadoop$ bin/hadoop fs -mkdir input
hadoop@Master:/opt/hadoop$ bin/hadoop fs -put conf/core-site.xml input
hadoop@Master:/opt/hadoop$ bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'
6. Additional notes
Q: What does bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+' mean?
A: bin/hadoop jar (run a jar with hadoop) hadoop-0.20.2-examples.jar (the name of the jar) grep (the class to run; what follows are its arguments) input output 'dfs[a-z.]+'
In other words, it runs the grep example from the Hadoop examples jar, with input directory input and output directory output on HDFS.
Q: What is grep?
A: A map/reduce program that counts the matches of a regex in the input.
Check the result:
hadoop@Master:/opt/hadoop$ bin/hadoop fs -ls /user/hadoop/output
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2012-07-24 11:29 /user/hadoop/output/_logs
-rw-r--r--   3 hadoop supergroup          0 2012-07-24 11:30 /user/hadoop/output/part-00000
hadoop@Master:/opt/hadoop$ bin/hadoop fs -rmr /user/hadoop/outputtest
Deleted hdfs://Master.Hadoop:54310/user/hadoop/outputtest
hadoop@Master:/opt/hadoop$ bin/hadoop fs -rmr /user/hadoop/output
Deleted hdfs://Master.Hadoop:54310/user/hadoop/output
Try another example instead:
hadoop@Master:/opt/hadoop$ bin/hadoop jar /opt/hadoop/hadoop-0.20.2-examples.jar wordcount input output
hadoop@Master:/opt/hadoop$ bin/hadoop fs -ls /user/hadoop/output
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2012-07-24 11:43 /user/hadoop/output/_logs
-rw-r--r--   3 hadoop supergroup        772 2012-07-24 11:43 /user/hadoop/output/part-r-00000
hadoop@Master:/opt/hadoop$ bin/hadoop fs -cat /user/hadoop/output/part-r-00000
(fs.SCHEME.impl)        1
-->     1
<!--    1
</configuration>        1
</property>     2
<?xml   1
<?xml-stylesheet        1
Test successful!
Error encountered after a restart:
INFO ipc.Client: Retrying connect to server: master/192.168.0.45:54310. Already tried 0 time(s).
./hadoop dfsadmin -report
cd /opt/hadoop-datastore/
/opt/hadoop/bin/stop-all.sh
rm -rf *
/opt/hadoop/bin/hadoop namenode -format
If any debug settings are in place, remove them.
/opt/hadoop/bin/start-all.sh
./hadoop dfsadmin -report
-------------------------------------------------------------------
Hadoop MapReduce Java demo
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.1.2</version>
</dependency>
package cn.focus.dc.hadoop;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

/**
 * @author qiaowang
 */
public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
Approach 1
On Linux:
Create a wordcount_classes directory.
hadoop@Master:~/wordcount_classes$ ls
cn  WordCount.java
hadoop@Master:~/wordcount_classes$ pwd
/home/hadoop/wordcount_classes
/usr/lib/jdk1.6.0_33/bin/javac -classpath /opt/hadoop/hadoop-core-1.1.2.jar -d /home/hadoop/wordcount_classes/ /home/hadoop/wordcount_classes/WordCount.java
After compilation:
hadoop@Master:~/wordcount_classes/cn/focus/dc/hadoop$ pwd
/home/hadoop/wordcount_classes/cn/focus/dc/hadoop
hadoop@Master:~/wordcount_classes/cn/focus/dc/hadoop$ ls
WordCount.class  WordCount$Map.class  WordCount$Reduce.class
Build the jar:
hadoop@Master:~$ /usr/lib/jdk1.6.0_33/bin/jar -cvf /home/hadoop/wordcount.jar -C wordcount_classes/ .
added manifest
adding: cn/ (in = 0) (out = 0) (stored 0%)
adding: cn/focus/ (in = 0) (out = 0) (stored 0%)
adding: cn/focus/dc/ (in = 0) (out = 0) (stored 0%)
adding: cn/focus/dc/hadoop/ (in = 0) (out = 0) (stored 0%)
adding: cn/focus/dc/hadoop/WordCount.class (in = 1573) (out = 756) (deflated 51%)
adding: cn/focus/dc/hadoop/WordCount$Map.class (in = 1956) (out = 804) (deflated 58%)
adding: cn/focus/dc/hadoop/WordCount$Reduce.class (in = 1629) (out = 652) (deflated 59%)
adding: WordCount.java (in = 2080) (out = 688) (deflated 66%)
hadoop@Master:~$ ls
file01  file02  hadoop-1.1.2.tar.gz  wordcount_classes  wordcount.jar
Run it:
/opt/hadoop/bin/hadoop jar /home/hadoop/wordcount.jar cn.focus.dc.hadoop.WordCount /user/hadoop/input /user/hadoop/output
Check the result:
hadoop@Master:~$ /opt/hadoop/bin/hadoop fs -cat /user/hadoop/output/part-00000
Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2
Approach 2:
In the project directory on Windows, package directly with Maven (including the dependencies):
mvn -U clean dependency:copy-dependencies compile package
The jar ends up under target and the dependency jars under the dependency directory.
Copy them to the Linux machine.
The directory structure looks like this:
hadoop@Master:~/hadoop_stat/dependency$ ls
hadoop-core-1.1.2.jar
hadoop@Master:~/hadoop_stat$ ls
dependency  hadoop-stat-1.0.0-SNAPSHOT.jar
Run it:
/opt/hadoop/bin/hadoop jar /home/hadoop/hadoop_stat/hadoop-stat-1.0.0-SNAPSHOT.jar cn.focus.dc.hadoop.WordCount /user/hadoop/input /user/hadoop/output
hadoop@Master:~/hadoop_stat$ /opt/hadoop/bin/hadoop fs -cat /user/hadoop/output/part-00000
Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2