How To Install Apache Hadoop Pseudo Distributed Mode on a Single Node

http://www.thegeekstuff.com/2012/02/hadoop-pseudo-distributed-installation/


by Ramesh Natarajan on February 8, 2012

Apache Hadoop pseudo-distributed mode installation helps you simulate a multi-node installation on a single node. Instead of installing hadoop on different servers, you can simulate it on a single server.

Before you continue, make sure you understand the hadoop fundamentals and have tested the standalone hadoop installation.

If you've already completed the first three steps below as part of the standalone hadoop installation, jump to step 4.

1. Create a Hadoop User

You can download and install hadoop as root, but it is recommended to install it as a separate user. So, log in as root and create a user called hadoop.

# adduser hadoop

# passwd hadoop

2. Download Hadoop Common

Download Apache Hadoop Common and move it to the server where you want to install it, or use wget to download it directly to the server.

# su - hadoop

$ wget http://mirror.nyi.net/apache//hadoop/common/stable/hadoop-0.20.203.0rc1.tar.gz

Make sure Java 1.6 is installed on your system.

$ java -version

java version "1.6.0_20"

OpenJDK Runtime Environment (IcedTea6 1.9.7) (rhel-1.39.1.9.7.el6-x86_64)

OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)

3. Unpack under hadoop User

As hadoop user, unpack this package.

$ tar xvfz hadoop-0.20.203.0rc1.tar.gz

This will create the "hadoop-0.20.204.0" directory.

$ ls -l hadoop-0.20.204.0

total 6780

drwxr-xr-x. 2 hadoop hadoop 4096 Oct 12 08:50 bin

-rw-rw-r--. 1 hadoop hadoop 110797 Aug 25 16:28 build.xml

drwxr-xr-x. 4 hadoop hadoop 4096 Aug 25 16:38 c++

-rw-rw-r--. 1 hadoop hadoop 419532 Aug 25 16:28 CHANGES.txt

drwxr-xr-x. 2 hadoop hadoop 4096 Nov 2 05:29 conf

drwxr-xr-x. 14 hadoop hadoop 4096 Aug 25 16:28 contrib

drwxr-xr-x. 7 hadoop hadoop 4096 Oct 12 08:49 docs

drwxr-xr-x. 3 hadoop hadoop 4096 Aug 25 16:29 etc

Modify the hadoop-0.20.204.0/conf/hadoop-env.sh file and make sure the JAVA_HOME environment variable points to the correct location of the java installed on your system.

$ grep JAVA ~/hadoop-0.20.204.0/conf/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.6.0_27

After this step, hadoop will be installed under the /home/hadoop/hadoop-0.20.204.0 directory.
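Before moving on, it is worth confirming that the JAVA_HOME value in hadoop-env.sh actually points at a JDK. A minimal sketch of such a check (the jdk1.6.0_27 path is the one used in this article; adjust it for your system):

```python
import os

def java_home_ok(java_home):
    """Return True if java_home contains an executable bin/java,
    i.e. it plausibly points at a real JDK install."""
    java = os.path.join(java_home, "bin", "java")
    return os.path.isfile(java) and os.access(java, os.X_OK)

# The path used in this article; the check tells you immediately
# if hadoop-env.sh points at a stale or mistyped JDK location.
ok = java_home_ok("/usr/java/jdk1.6.0_27")
```
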

4. Modify Hadoop Configuration Files

Add the <configuration> section shown below to the core-site.xml file. This indicates the HDFS default location and port.

$ cat ~/hadoop-0.20.204.0/conf/core-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:9000</value>

</property>

</configuration>

Add the <configuration> section shown below to the hdfs-site.xml file.

$ cat ~/hadoop-0.20.204.0/conf/hdfs-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

</configuration>

Add the <configuration> section shown below to the mapred-site.xml file. This indicates that the jobtracker uses 9001 as the port.

$ cat ~/hadoop-0.20.204.0/conf/mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>localhost:9001</value>

</property>

</configuration>

5. Set up passwordless ssh to localhost

In a typical Hadoop production environment you'll be setting up this passwordless ssh access between the different servers. Since we are simulating a distributed environment on a single server, we need to set up passwordless ssh access to the localhost itself.

Use ssh-keygen to generate the private and public key pair.

$ ssh-keygen

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

02:5a:19:ab:1e:g2:1a:11:bb:22:30:6d:12:38:a9:b1 hadoop@hadoop

The key's randomart image is:

[randomart image not reproduced]

Add the public key to the authorized_keys file. Just use the ssh-copy-id command, which takes care of this step automatically and assigns the appropriate permissions to these files.

$ ssh-copy-id -i ~/.ssh/id_rsa.pub localhost

hadoop@localhost's password:

Now try logging into the machine, with "ssh 'localhost'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

Test the passwordless login to the localhost as shown below.

$ ssh localhost

Last login: Sat Jan 14 23:01:59 2012 from localhost

For more details on this, read 3 Steps to Perform SSH Login Without Password Using ssh-keygen & ssh-copy-id.
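For the curious, what ssh-copy-id did above boils down to appending the public key to ~/.ssh/authorized_keys on the target and tightening permissions so sshd will accept the file. A simplified sketch of that logic (it omits the duplicate-key check the real tool performs):

```python
import os

def install_public_key(pubkey_line, ssh_dir):
    """Append one public key line to authorized_keys inside ssh_dir,
    with the permissions sshd requires (700 dir, 600 file)."""
    if not os.path.isdir(ssh_dir):
        os.makedirs(ssh_dir)
    os.chmod(ssh_dir, 0o700)
    auth_file = os.path.join(ssh_dir, "authorized_keys")
    with open(auth_file, "a") as f:
        f.write(pubkey_line.rstrip("\n") + "\n")
    os.chmod(auth_file, 0o600)
    return auth_file
```
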

6. Format Hadoop NameNode

Format the namenode using the hadoop command as shown below. You'll see the message "Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted" if this command works properly.

$ cd ~/hadoop-0.20.204.0

$ bin/hadoop namenode -format

12/01/14 23:02:27 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG: host = hadoop/127.0.0.1

STARTUP_MSG: args = [-format]

STARTUP_MSG: version = 0.20.204.0

STARTUP_MSG: build = git://hrt8n35.cc1.ygridcore.net/ on branch branch-0.20-security-204 -r 65e258bf0813ac2b15bb4c954660eaf9e8fba141; compiled by 'hortonow' on Thu Aug 25 23:35:31 UTC 2011

************************************************************/

12/01/14 23:02:27 INFO util.GSet: VM type = 64-bit

12/01/14 23:02:27 INFO util.GSet: 2% max memory = 17.77875 MB

12/01/14 23:02:27 INFO util.GSet: capacity = 2^21 = 2097152 entries

12/01/14 23:02:27 INFO util.GSet: recommended=2097152, actual=2097152

12/01/14 23:02:27 INFO namenode.FSNamesystem: fsOwner=hadoop

12/01/14 23:02:27 INFO namenode.FSNamesystem: supergroup=supergroup

12/01/14 23:02:27 INFO namenode.FSNamesystem: isPermissionEnabled=true

12/01/14 23:02:27 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100

12/01/14 23:02:27 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)

12/01/14 23:02:27 INFO namenode.NameNode: Caching file names occuring more than 10 times

12/01/14 23:02:27 INFO common.Storage: Image file of size 112 saved in 0 seconds.

12/01/14 23:02:27 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.

12/01/14 23:02:27 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at hadoop/127.0.0.1

************************************************************/

7. Start All Hadoop Related Services


Use the ~/hadoop-0.20.204.0/bin/start-all.sh script to start all hadoop related services. This will start the namenode, datanode, secondary namenode, jobtracker, tasktracker, etc.

$ bin/start-all.sh

starting namenode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-namenode-hadoop.out

localhost: starting datanode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-datanode-hadoop.out

localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-secondarynamenode-hadoop.out

starting jobtracker, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-jobtracker-hadoop.out

localhost: starting tasktracker, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../
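You can also confirm that all five daemons came up with the jps tool that ships with the JDK; a healthy pseudo-distributed node runs NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. A sketch that scans jps-style output for missing daemons (the sample output and PIDs below are illustrative, not captured from a real run):

```python
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode",
            "JobTracker", "TaskTracker"}

def missing_daemons(jps_output):
    """Given text in the `<pid> <name>` format printed by jps,
    return the set of expected Hadoop daemons not listed."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            running.add(parts[1])
    return EXPECTED - running

# Illustrative output of `jps` on a healthy node (made-up PIDs):
sample = "2081 NameNode\n2185 DataNode\n2297 SecondaryNameNode\n" \
         "2386 JobTracker\n2491 TaskTracker\n2550 Jps\n"
```

missing_daemons(sample) returns an empty set here; if, say, the DataNode had died, it would appear in the result.
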

8. Browse NameNode and JobTracker Web GUI

Once all the Hadoop processes are started, you can view the health and status of the HDFS from a web interface. Use http://{your-hadoop-server-ip}:50070/dfshealth.jsp

For example, if you've installed hadoop on a server with ip-address 192.168.1.10, then use http://192.168.1.10:50070/dfshealth.jsp to view the NameNode GUI.

This will display the following information:

Basic NameNode information:

■ This shows when the Namenode was started, the hadoop version number, and whether any upgrades are currently in progress.

■ It also has a "Browse the filesystem" link, which lets you browse the contents of the HDFS filesystem from the browser.

■ Click on "Namenode Logs" to view the logs.

Cluster Summary displays the following information:

■ Total number of files and directories managed by the HDFS

■ Any warning message (for example: missing blocks)

■ Total HDFS filesystem size

■ Both HDFS percentage used and size used

■ Total number of nodes in this distributed system

NameNode storage information: This displays the storage directory of the HDFS filesystem, the filesystem type, and the state (Active or not).

To access the JobTracker web interface, use http://{your-hadoop-server-ip}:50030. (Port 50090, which also shows up in the netstat output below, belongs to the SecondaryNameNode, not the JobTracker.)

For example, if you've installed hadoop on a server with ip-address 192.168.1.10, then use http://192.168.1.10:50030/ to view the JobTracker GUI.

As shown by the netstat command below, the NameNode (50070) and SecondaryNameNode (50090) web ports are in use.

$ netstat -a | grep 500

tcp 0 0 *:50090 *:* LISTEN

tcp 0 0 *:50070 *:* LISTEN

tcp 0 0 hadoop.thegeekstuff.com:50090 ::ffff:192.168.1.98:55923 ESTABL
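Instead of eyeballing netstat, you can probe the web ports programmatically with a plain TCP connect. A minimal sketch (50070 and 50090 are the default ports discussed above; 50030 is the JobTracker's default):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
        conn.close()
        return True
    except OSError:
        return False

# e.g. port_open("localhost", 50070)  # NameNode web UI
#      port_open("localhost", 50090)  # SecondaryNameNode web UI
#      port_open("localhost", 50030)  # JobTracker web UI
```
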

9. Test Sample Hadoop Program

This example program ships with hadoop, and it is shown in the hadoop documentation as a simple example to verify the setup.

For testing purposes, add some sample data files to the input directory. Let us just copy all the xml files from the conf directory to the input directory, so these xml files will be treated as the data files for the example program. In the standalone version, you used the standard cp command to copy them to the input directory.

However, in a distributed Hadoop setup you'll use the -put option of the hadoop command to add files to the HDFS filesystem. Keep in mind that you are not adding the files to a Linux filesystem; you are adding the input files to the Hadoop Distributed File System. So you use the hadoop command to do this.

$ cd ~/hadoop-0.20.204.0

$ bin/hadoop fs -put conf input

Execute the sample hadoop test program. This is a simple hadoop program that simulates a grep: it searches for the regex pattern 'dfs[a-z.]+' in all the input/*.xml files stored in HDFS, and stores the results in the output directory, which will also be in HDFS.
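What the grep example computes can be sketched without Hadoop at all: the map phase emits every match of the regular expression, and the reduce phase sums the counts per matched string. A plain-Python equivalent of that aggregation (for illustration only; the input strings below are made up):

```python
import re
from collections import Counter

def grep_counts(texts, pattern):
    """Count every match of `pattern` across the input texts --
    the same count-per-matched-string result the Hadoop grep
    example produces as a MapReduce job."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(pattern, text))
    return counts

# Made-up input lines standing in for the conf/*.xml contents:
result = grep_counts(["run dfsadmin -report",
                      "dfs.replication and dfs.permissions"],
                     r"dfs[a-z.]+")
# result: dfsadmin -> 1, dfs.replication -> 1, dfs.permissions -> 1
```
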

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

When everything is set up properly, the above sample hadoop test program displays the following messages on the screen while it is executing.

$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

12/01/14 23:45:02 INFO mapred.FileInputFormat: Total input paths to process : 18

12/01/14 23:45:02 INFO mapred.JobClient: Running job: job_201111020543_0001

12/01/14 23:45:03 INFO mapred.JobClient: map 0% reduce 0%

12/01/14 23:45:18 INFO mapred.JobClient: map 11% reduce 0%

12/01/14 23:45:24 INFO mapred.JobClient: map 22% reduce 0%

12/01/14 23:45:27 INFO mapred.JobClient: map 22% reduce 3%

12/01/14 23:45:30 INFO mapred.JobClient: map 33% reduce 3%

12/01/14 23:45:36 INFO mapred.JobClient: map 44% reduce 7%

12/01/14 23:45:42 INFO mapred.JobClient: map 55% reduce 14%

12/01/14 23:45:48 INFO mapred.JobClient: map 66% reduce 14%

12/01/14 23:45:51 INFO mapred.JobClient: map 66% reduce 18%

12/01/14 23:45:54 INFO mapred.JobClient: map 77% reduce 18%

12/01/14 23:45:57 INFO mapred.JobClient: map 77% reduce 22%

12/01/14 23:46:00 INFO mapred.JobClient: map 88% reduce 22%

12/01/14 23:46:06 INFO mapred.JobClient: map 100% reduce 25%

12/01/14 23:46:15 INFO mapred.JobClient: map 100% reduce 100%

12/01/14 23:46:20 INFO mapred.JobClient: Job complete: job_201111020543_0001

...

The above command will create the output directory (in HDFS) with the results as shown below. To view this output directory, use the -get option of the hadoop command as shown below.

$ bin/hadoop fs -get output output

$ ls -l output

total 4

-rwxrwxrwx. 1 root root 11 Aug 23 08:39 part-00000

-rwxrwxrwx. 1 root root 0 Aug 23 08:39 _SUCCESS

$ cat output/*

1 dfsadmin

10. Troubleshooting Hadoop Issues

Issue 1: "Temporary failure in name resolution"

While executing the sample hadoop program, you might get the following error message.

12/01/14 07:34:57 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-root/mapred/staging/root-1040516815/.staging/job_local_0001

java.net.UnknownHostException: hadoop: hadoop: Temporary failure in name resolution

at java.net.InetAddress.getLocalHost(InetAddress.java:1438)

at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:815)

at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)

at java.security.AccessController.doPrivileged(Native Method)

Solution 1: Add the following entry to the /etc/hosts file, containing the ip-address, FQDN (fully qualified domain name), and hostname.

192.168.1.10 hadoop.thegeekstuff.com hadoop

Issue 2: "localhost: Error: JAVA_HOME is not set"

While executing hadoop start-all.sh, you might get this error as shown below.

$ bin/start-all.sh

starting namenode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-namenode-hadoop.out

localhost: starting datanode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-datanode-hadoop.out

localhost: Error: JAVA_HOME is not set.

localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-secondarynamenode-hadoop.out

localhost: Error: JAVA_HOME is not set.

starting jobtracker, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-jobtracker-hadoop.out

localhost: starting tasktracker, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-tasktracker-hadoop.out

localhost: Error: JAVA_HOME is not set.

Solution 2: Make sure JAVA_HOME is set up properly in conf/hadoop-env.sh as shown below.

$ grep JAVA_HOME conf/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.6.0_27

Issue 3: Error while executing "bin/hadoop fs -put conf input"

You might get one of the following error messages (including put: org.apache.hadoop.security.AccessControlException: Permission denied) while executing the hadoop fs -put command as shown below.

$ bin/hadoop fs -put conf input

12/01/14 23:21:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).

12/01/14 23:21:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).

12/01/14 23:21:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).

Bad connection to FS. command aborted. exception: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused

$ bin/hadoop fs -put conf input

put: org.apache.hadoop.security.AccessControlException: Permission denied: user=hadoop, access=WRITE, inode="":root:supergroup:rwxr-xr-x

$ ls -l input

Solution 3: Make sure the /etc/hosts file is set up properly. Also, if your HDFS filesystem is not created properly, you might have issues during "hadoop fs -put". Format your HDFS using "bin/hadoop namenode -format" and confirm that it displays the "successfully formatted" message.

Issue 4: While executing start-all.sh (or start-dfs.sh), you might get this error message: "localhost: Unrecognized option: -jvm localhost: Could not create the Java virtual machine."

Solution 4: This might happen if you've installed hadoop as root and are trying to start the processes. This is a known bug that is fixed according to the bug report. But if you hit this bug, try installing hadoop as a non-root account (just as we've explained in this article), which should fix the issue.
