Hadoop Distributed Installation and Configuration

Base Environment
Three Linux machines are required. This guide uses three VMware virtual machines running Linux AS5, with IP addresses assigned through VMware's NAT networking, as follows:

Hostname | IP | Role |
Hadoop00 | 192.168.91.10 | Master: NameNode, SecondaryNameNode, JobTracker |
Hadoop01 | 192.168.91.11 | Slave: DataNode, TaskTracker |
Hadoop02 | 192.168.91.12 | Slave: DataNode, TaskTracker |
Configure the IP address and host names on all three machines by adding the following entries to /etc/hosts:
192.168.91.10 hadoop00
192.168.91.11 hadoop01
192.168.91.12 hadoop02

User Preparation
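Create a dedicated user and group for running Hadoop. Here both the user and the group are named hadoop. Create them on each of the three machines (as root):
groupadd hadoop
useradd -g hadoop -G hadoop hadoop

Configure Key-Based Passwordless Login

Hadoop requires the NameNode to log in to and access every DataNode without a password, so key-based passwordless SSH login must be set up for the operating system user that runs Hadoop.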
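Only the NameNode (hadoop00, 192.168.91.10) needs passwordless login to the DataNodes. Generate a public/private key pair on the NameNode, then send the public key to each DataNode.
On the NameNode:
[hadoop@hadoop00 ~]$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa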
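Press Enter at any prompt. When the command finishes, two files exist in ~/.ssh/: id_dsa and id_dsa.pub. They form a pair, the private key and the public key, much like a key and its lock. Next, append id_dsa.pub to the authorized keys (the authorized_keys file does not exist yet, so this creates it):
[hadoop@hadoop00 ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys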
Note: the permissions on .ssh and authorized_keys must be tightened, otherwise login may fail:
[hadoop@hadoop00 ~]$ chmod 700 ~/.ssh
[hadoop@hadoop00 ~]$ chmod 600 ~/.ssh/authorized_keys
Test passwordless login to the local machine:
[hadoop@hadoop00 ~]$ ssh localhost

Copy the public key id_dsa.pub to each DataNode:
[hadoop@hadoop00 ~]$ scp ~/.ssh/id_dsa.pub hadoop@hadoop01:/home/hadoop/
[hadoop@hadoop00 ~]$ scp ~/.ssh/id_dsa.pub hadoop@hadoop02:/home/hadoop/
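Log in to each DataNode in turn and append the public key id_dsa.pub to that node's authorized_keys (shown here for hadoop01; repeat on hadoop02):
[hadoop@hadoop01 ~]$ mkdir .ssh
[hadoop@hadoop01 ~]$ chmod 700 .ssh
[hadoop@hadoop01 ~]$ cat id_dsa.pub >> .ssh/authorized_keys
[hadoop@hadoop01 ~]$ chmod 600 .ssh/authorized_keys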
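
The per-node steps above can be wrapped in a small helper so that running them a second time does not append the same key twice. This is only a minimal sketch for a POSIX shell; the function name install_pubkey and its arguments are hypothetical and not part of Hadoop or OpenSSH.

```shell
# install_pubkey PUBKEY_FILE [SSH_DIR]
# Appends the key in PUBKEY_FILE to SSH_DIR/authorized_keys (default: ~/.ssh)
# unless that exact line is already present, then tightens the permissions.
# (Hypothetical helper, written for illustration.)
install_pubkey() {
    pubkey_file=$1
    ssh_dir=${2:-$HOME/.ssh}
    auth_file=$ssh_dir/authorized_keys

    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" || return 1

    # -x: whole-line match, -F: fixed strings, -f: read patterns from a file
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}
```

On a DataNode this might be called as install_pubkey ~/id_dsa.pub after the scp step.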
Test passwordless access from the NameNode to a DataNode:
[hadoop@hadoop00 ~]$ ssh hadoop01
Last login: Thu Sep 22 07:57:07 2011 from hadoop00

Installation and Environment Variables
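Download and install hadoop-0.21.0:
http://mirror.bjtu.edu.cn/apache/hadoop/common/hadoop-0.21.0/hadoop-0.21.0.tar.gz
JDK version to download: jdk-6u24-linux-i586.bin

After downloading, place the Hadoop archive in the hadoop user's home directory and unpack it there: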
[hadoop@hadoop00 ~]$ cd /home/hadoop
[hadoop@hadoop00 ~]$ tar -xzvf hadoop-0.21.0.tar.gz
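Once configuration is complete, this directory is copied to each DataNode as is.

Installing the JDK itself is not covered here. Configure the environment variables as follows (the same on the master and on every slave):
vi ~/.bash_profile
Append to the end of the file: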
# java env
export JAVA_HOME=/usr/java/jdk1.6.0_24
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
# hadoop env
export HADOOP_HOME=/home/hadoop/hadoop-0.21.0
export PATH=$HADOOP_HOME/bin:$PATH

Configure Hadoop
Configuring Hadoop on the NameNode
1. Configure the Hadoop environment shell file: hadoop-0.21.0/conf/hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.6.0_24
2. Configure: hadoop-0.21.0/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoopdata</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop00:9000</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>excludes</value>
  </property>
</configuration>
3. Configure: hadoop-0.21.0/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoopname</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoopdata</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
4. Configure: hadoop-0.21.0/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop00:9001</value>
  </property>
</configuration>
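
Before copying the configured tree to the other nodes, it can help to read back the values that matter from the three site files. The following is a rough sketch rather than a real XML parser: it assumes each name/value pair sits inside its own property element and that values contain no "<" characters. get_prop is a hypothetical helper, not a Hadoop command.

```shell
# get_prop FILE NAME
# Prints the <value> of the property called NAME in a Hadoop site file.
# (Hypothetical helper; plain text processing, not an XML parser.)
get_prop() {
    file=$1
    name=$2
    # Escape dots so "fs.default.name" is matched literally.
    pattern=$(printf '%s' "$name" | sed 's/\./\\./g')

    # Flatten the file to one line, put each property on its own line,
    # then print the value that follows the requested name.
    tr '\n' ' ' < "$file" |
        sed 's#</property>#&\
#g' |
        sed -n "s#.*<name>[[:space:]]*$pattern[[:space:]]*</name>[[:space:]]*<value>\([^<]*\)</value>.*#\1#p"
}

# Example, run from the hadoop user's home directory:
#   get_prop hadoop-0.21.0/conf/core-site.xml fs.default.name
#   get_prop hadoop-0.21.0/conf/mapred-site.xml mapred.job.tracker
```

The helper prints nothing when the property is absent, so an empty result is a hint that a file was not edited as intended.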
Copy the configured Hadoop directory from the NameNode to the same directory on each DataNode:
[hadoop@hadoop00 ~]$ zip -r hadoop-0.21.0.zip hadoop-0.21.0
[hadoop@hadoop00 ~]$ scp hadoop-0.21.0.zip hadoop@hadoop01:/home/hadoop
[hadoop@hadoop00 ~]$ scp hadoop-0.21.0.zip hadoop@hadoop02:/home/hadoop
Log in to each of the two DataNodes and unzip hadoop-0.21.0.zip:
[hadoop@hadoop01 ~]$ unzip hadoop-0.21.0.zip
[hadoop@hadoop02 ~]$ unzip hadoop-0.21.0.zip

Starting and Stopping Hadoop
Hadoop is started by running commands directly on the NameNode, which takes care of connecting to all DataNodes and starting and stopping them.
1. Start
[hadoop@hadoop00 ~]$ ~/hadoop-0.21.0/bin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-mapred.sh
starting namenode, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-namenode-hadoop00.out
192.168.91.11: starting datanode, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-datanode-hadoop01.out
192.168.91.12: starting datanode, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-datanode-hadoop02.out
192.168.91.10: starting secondarynamenode, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-secondarynamenode-hadoop00.out
starting jobtracker, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-jobtracker-hadoop00.out
192.168.91.12: starting tasktracker, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-tasktracker-hadoop02.out
192.168.91.11: starting tasktracker, logging to /home/hadoop/hadoop-0.21.0/bin/../logs/hadoop-hadoop-tasktracker-hadoop01.out
2. Stop
[hadoop@hadoop00 ~]$ ~/hadoop-0.21.0/bin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-mapred.sh
stopping namenode
192.168.91.12: stopping datanode
192.168.91.11: stopping datanode
192.168.91.10: stopping secondarynamenode
stopping jobtracker
192.168.91.11: stopping tasktracker
192.168.91.12: stopping tasktracker

Initial HDFS Setup
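1. Format the HDFS file system (do this once on the NameNode, before the cluster is started for the first time; formatting erases any existing HDFS metadata)
[hadoop@hadoop00 ~]$ hadoop namenode -format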
2. Inspect HDFS
[hadoop@hadoop00 ~]$ hadoop fs -ls /
11/09/24 07:49:55 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/09/24 07:49:56 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2011-09-22 08:05 /home
drwxr-xr-x   - hadoop supergroup          0 2011-09-22 11:29 /jobtracker
drwxr-xr-x   - hadoop supergroup          0 2011-09-22 11:23 /user
3. Inspect Hadoop through the web interface
Cluster status: http://192.168.91.10:50070/dfshealth.jsp
Job status: http://192.168.91.10:50030/jobtracker.jsp
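
Besides the web pages, the jps tool shipped with the JDK lists the running Java processes, which gives a quick way to confirm that the expected daemons are up on each node. A small sketch follows; missing_daemons is a hypothetical helper, and the class names used in the examples (NameNode, SecondaryNameNode, JobTracker, DataNode, TaskTracker) are what jps is assumed to print for this Hadoop version.

```shell
# missing_daemons NAME...
# Reads jps output ("<pid> <MainClass>" per line) on stdin and prints every
# NAME that does not appear as a main class. Prints nothing if all are found.
# (Hypothetical helper, written for illustration.)
missing_daemons() {
    running=$(awk '{ print $2 }')
    for name in "$@"; do
        # -x keeps "NameNode" from matching "SecondaryNameNode"
        printf '%s\n' "$running" | grep -qxF "$name" || printf '%s\n' "$name"
    done
}

# On the master (hadoop00):
#   jps | missing_daemons NameNode SecondaryNameNode JobTracker
# On each slave (hadoop01, hadoop02):
#   jps | missing_daemons DataNode TaskTracker
```

If a daemon is reported missing, the matching .out file under hadoop-0.21.0/logs named in the start-up output is the first place to look.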
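
Running the Hadoop wordcount Example
Wordcount is a simple program that counts how many times each word occurs in the input files and writes the result to a specified directory. It is one of the official examples and ships in the hadoop-0.21.0 installation directory as hadoop-mapred-examples-0.21.0.jar.
Create the program's input directory and file and upload them to HDFS; the job creates the output directory itself when it runs.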
[hadoop@hadoop00 ~]$ mkdir input
[hadoop@hadoop00 ~]$ echo "a a a a a b b b c c c c c c c c c 1 1 1" > input/file
[hadoop@hadoop00 ~]$ hadoop fs -mkdir /wordcount
[hadoop@hadoop00 ~]$ hadoop fs -put input /wordcount
Run the job:
[hadoop@hadoop00 ~]$ hadoop jar hadoop-0.21.0/hadoop-mapred-examples-0.21.0.jar wordcount /wordcount/input /wordcount/output
11/09/24 08:11:25 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/09/24 08:11:26 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
11/09/24 08:11:26 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/09/24 08:11:26 INFO input.FileInputFormat: Total input paths to process : 2
11/09/24 08:11:26 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11/09/24 08:11:26 INFO mapreduce.JobSubmitter: number of splits:2
11/09/24 08:11:27 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
11/09/24 08:11:27 INFO mapreduce.Job: Running job: job_201109240745_0002
11/09/24 08:11:28 INFO mapreduce.Job: map 0% reduce 0%
11/09/24 08:11:44 INFO mapreduce.Job: map 50% reduce 0%
11/09/24 08:11:50 INFO mapreduce.Job: map 100% reduce 0%
11/09/24 08:11:57 INFO mapreduce.Job: map 100% reduce 100%
11/09/24 08:11:59 INFO mapreduce.Job: Job complete: job_201109240745_0002
11/09/24 08:11:59 INFO mapreduce.Job: Counters: 34
……
[hadoop@hadoop00 ~]$ hadoop fs -cat /wordcount/output/part-r-00000
11/09/24 08:18:09 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/09/24 08:18:09 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
1	3
a	5
b	3
c	9
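
For an input this small, the counts can be cross-checked locally without the cluster. The sketch below uses only standard Unix tools and assumes that words are separated by whitespace, as in the example input; local_wordcount is a hypothetical name, and this pipeline is not how the Hadoop job itself is implemented.

```shell
# local_wordcount
# Prints "<word><TAB><count>" for each distinct whitespace-separated word
# read from stdin, sorted by word in byte order.
# (Hypothetical helper, written for illustration.)
local_wordcount() {
    tr -s '[:space:]' '\n' |      # one word per line
        grep -v '^$' |            # drop empty lines from leading whitespace
        LC_ALL=C sort |           # group identical words together
        uniq -c |                 # count each group
        awk '{ print $2 "\t" $1 }'
}

# Example:
#   echo "a a a a a b b b c c c c c c c c c 1 1 1" | local_wordcount
```

With the example line as input this yields 3 for "1", 5 for "a", 3 for "b" and 9 for "c", which agrees with the job output above.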