How To Install Apache Hadoop Pseudo Distributed Mode on a Single Node
http://www.thegeekstuff.com/2012/02/hadoop-pseudo-distributed-installation/
by Ramesh Natarajan on February 8, 2012

Apache Hadoop pseudo-distributed mode installation helps you simulate a multi-node installation on a single node. Instead of installing hadoop on different servers, you can simulate it on a single server.
Before you continue, make sure you understand the hadoop fundamentals and have tested the standalone hadoop installation.
If you've already completed the first three steps below as part of the standalone hadoop installation, jump to step 4.
1. Create a Hadoop User
You can download and install hadoop as root, but it is recommended to install it as a separate user. So, log in as root and create a user called hadoop.
# adduser hadoop
# passwd hadoop
2. Download Hadoop Common
Download Apache Hadoop Common and move it to the server where you want to install it, or use wget to download it directly to your server as shown below.
# su - hadoop
$ wget http://mirror.nyi.net/apache//hadoop/common/stable/hadoop-0.20.203.0rc1.tar.gz
Make sure Java 1.6 is installed on your system.
$ java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.7) (rhel-1.39.1.9.7.el6-x86_64)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
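If Java is not installed, on a RHEL/CentOS style system (which is what this article uses) you can typically install OpenJDK 1.6 with yum. The package names below are the usual ones, but they may vary by distribution:
# yum install java-1.6.0-openjdk java-1.6.0-openjdk-devel
Run java -version again afterwards to confirm the installation.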
3. Unpack under hadoop User
As the hadoop user, unpack this package.
$ tar xvfz hadoop-0.20.203.0rc1.tar.gz
This will create the "hadoop-0.20.204.0" directory.
$ ls -l hadoop-0.20.204.0
total 6780
drwxr-xr-x.  2 hadoop hadoop   4096 Oct 12 08:50 bin
-rw-rw-r--.  1 hadoop hadoop 110797 Aug 25 16:28 build.xml
drwxr-xr-x.  4 hadoop hadoop   4096 Aug 25 16:38 c++
-rw-rw-r--.  1 hadoop hadoop 419532 Aug 25 16:28 CHANGES.txt
drwxr-xr-x.  2 hadoop hadoop   4096 Nov  2 05:29 conf
drwxr-xr-x. 14 hadoop hadoop   4096 Aug 25 16:28 contrib
drwxr-xr-x.  7 hadoop hadoop   4096 Oct 12 08:49 docs
drwxr-xr-x.  3 hadoop hadoop   4096 Aug 25 16:29 etc
Modify the hadoop-0.20.204.0/conf/hadoop-env.sh file and make sure the JAVA_HOME environment variable points to the correct location of the java installed on your system.
$ grep JAVA ~/hadoop-0.20.204.0/conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_27
After this step, hadoop will be installed under the /home/hadoop/hadoop-0.20.204.0 directory.
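If you are not sure where your JDK is installed, one way to locate it (assuming java is on your PATH and GNU readlink is available) is to resolve the symlink chain of the java binary:
$ readlink -f $(which java)
Strip the trailing /jre/bin/java (or /bin/java) from the result to get the directory to use as JAVA_HOME.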
4. Modify Hadoop Configuration Files
Add the <configuration> section shown below to the core-site.xml file. This specifies the HDFS default location and port.
$ cat ~/hadoop-0.20.204.0/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Add the <configuration> section shown below to the hdfs-site.xml file. This sets the HDFS replication factor to 1, which is appropriate for a single-node setup, and disables HDFS permission checking.
$ cat ~/hadoop-0.20.204.0/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
Add the <configuration> section shown below to the mapred-site.xml file. This indicates that the job tracker uses port 9001.
$ cat ~/hadoop-0.20.204.0/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
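Before moving on, an optional sanity check (assuming the xmllint utility from libxml2 is installed) is to confirm that all three files are still well-formed XML, since a stray character here leads to confusing daemon startup failures later:
$ xmllint --noout ~/hadoop-0.20.204.0/conf/core-site.xml ~/hadoop-0.20.204.0/conf/hdfs-site.xml ~/hadoop-0.20.204.0/conf/mapred-site.xml
No output means the files parsed cleanly.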
5. Set up passwordless ssh to localhost
In a typical Hadoop production environment you'll set up this passwordless ssh access between the different servers. Since we are simulating a distributed environment on a single server, we need to set up passwordless ssh access to the localhost itself.
Use ssh-keygen to generate the private and public key pair.
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
02:5a:19:ab:1e:g2:1a:11:bb:22:30:6d:12:38:a9:b1 hadoop@hadoop
The key's randomart image is:
+--[ RSA 2048]----+
|oo|
|o+..|
|++oo|
|o.o=.|
|.+=S|
|.o.o+.|
|...o.|
|.E..|
|...|
+-----------------+
Add the public key to the authorized_keys file. Just use the ssh-copy-id command, which takes care of this step automatically and assigns the appropriate permissions to these files.
$ ssh-copy-id -i ~/.ssh/id_rsa.pub localhost
hadoop@localhost's password:
Now try logging into the machine, with "ssh 'localhost'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
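If ssh-copy-id is not available on your system, the manual equivalent (a sketch; adjust the paths if your key is stored elsewhere) is to append the public key yourself and tighten the permissions, since sshd ignores authorized_keys files that are group- or world-writable:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys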
Test the passwordless login to the localhost as shown below.
$ ssh localhost
Last login: Sat Jan 14 23:01:59 2012 from localhost
For more details on this, read 3 Steps to Perform SSH Login Without Password Using ssh-keygen & ssh-copy-id.
6. Format Hadoop NameNode
Format the namenode using the hadoop command as shown below. You'll see the message "Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted" if this command works properly.
$ cd ~/hadoop-0.20.204.0
$ bin/hadoop namenode -format
12/01/14 23:02:27 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.204.0
STARTUP_MSG:   build = git://hrt8n35.cc1.ygridcore.net/ on branch branch-0.20-security-204 -r 65e258bf0813ac2b15bb4c954660eaf9e8fba141; compiled by 'hortonow' on Thu Aug 25 23:35:31 UTC 2011
************************************************************/
12/01/14 23:02:27 INFO util.GSet: VM type       = 64-bit
12/01/14 23:02:27 INFO util.GSet: 2% max memory = 17.77875 MB
12/01/14 23:02:27 INFO util.GSet: capacity      = 2^21 = 2097152 entries
12/01/14 23:02:27 INFO util.GSet: recommended=2097152, actual=2097152
12/01/14 23:02:27 INFO namenode.FSNamesystem: fsOwner=hadoop
12/01/14 23:02:27 INFO namenode.FSNamesystem: supergroup=supergroup
12/01/14 23:02:27 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/01/14 23:02:27 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/01/14 23:02:27 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/01/14 23:02:27 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/01/14 23:02:27 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/01/14 23:02:27 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/01/14 23:02:27 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/127.0.0.1
************************************************************/
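Note that the storage directory above lives under /tmp, which many systems clean out on reboot, so this default is only suitable for experimentation. If you want the HDFS metadata and data to survive a reboot, one option (a sketch; the path below is just an example) is to point hadoop.tmp.dir at a persistent location in core-site.xml before formatting:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-data</value>
</property>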
7. Start All Hadoop Related Services
Use the ~/hadoop-0.20.204.0/bin/start-all.sh script to start all hadoop related services. This will start the namenode, datanode, secondary namenode, jobtracker, and tasktracker.
$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-namenode-hadoop.out
localhost: starting datanode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-datanode-hadoop.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-secondarynamenode-hadoop.out
starting jobtracker, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-jobtracker-hadoop.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../
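You can confirm that all five daemons actually came up using the jps utility that ships with the JDK, which lists the running Java processes:
$ jps
The output should include NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (plus Jps itself). If one of them is missing, check the corresponding log file under the logs directory shown above.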
8. Browse NameNode and JobTracker Web GUI
Once all the Hadoop processes are started, you can view the health and status of HDFS from a web interface. Use http://{your-hadoop-server-ip}:50070/dfshealth.jsp
For example, if you've installed hadoop on a server with ip-address 192.168.1.10, use http://192.168.1.10:50070/dfshealth.jsp to view the NameNode GUI.
This will display the following information:
Basic NameNode information:
■ This shows when the Namenode was started, the hadoop version number, and whether any upgrades are currently in progress.
■ It also has a "Browse the filesystem" link, which lets you browse the contents of the HDFS filesystem from the browser.
■ Click on "Namenode Logs" to view the logs.
Cluster Summary displays the following information:
■ Total number of files and directories managed by HDFS
■ Any warning messages (for example: missing blocks)
■ Total HDFS filesystem size
■ Both HDFS percentage-used and size-used
■ Total number of nodes in this distributed system
NameNode storage information: This displays the storage directory of the HDFS filesystem, the filesystem type, and the state (Active or not).
To access the JobTracker web interface, use http://{your-hadoop-server-ip}:50030 (note: the JobTracker listens on port 50030; port 50090 is the secondary namenode web interface).
For example, if you've installed hadoop on a server with ip-address 192.168.1.10, use http://192.168.1.10:50030/ to view the JobTracker GUI.
As shown by the netstat command below, you can see the Hadoop web UI ports in use; 50070 is the NameNode interface and 50090 is the secondary namenode interface.
$ netstat -a | grep 500
tcp        0      0 *:50090      *:*        LISTEN
tcp        0      0 *:50070      *:*        LISTEN
tcp        0      0 hadoop.thegeekstuff.com:50090    ::ffff:192.168.1.98:55923    ESTABL
9. Test Sample Hadoop Program
This example program is provided as part of hadoop, and it is shown in the hadoop documentation as a simple example to verify that the setup works.
For testing purposes, add some sample data files to the input directory. Let us just copy all the xml files from the conf directory to the input directory, so these xml files will serve as the data files for the example program. In the standalone version, you used the standard cp command to copy them to the input directory.
However, in a distributed Hadoop setup you'll use the -put option of the hadoop command to add files to the HDFS filesystem. Keep in mind that you are not adding the files to a Linux filesystem; you are adding the input files to the Hadoop Distributed File System. So, you use the hadoop command to do this.
$ cd ~/hadoop-0.20.204.0
$ bin/hadoop fs -put conf input
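Before running the job, you can verify that the files actually landed in HDFS by listing the input directory (note that this lists the HDFS directory, not a local one):
$ bin/hadoop fs -ls input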
Execute the sample hadoop test program. This is a simple hadoop program that simulates a grep. It searches for the regex pattern "dfs[a-z.]+" in all the input/*.xml files stored in HDFS and writes the results to the output directory, which is also stored in HDFS.
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
When everything is set up properly, the above sample hadoop test program displays the following messages on the screen while it is executing:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
12/01/14 23:45:02 INFO mapred.FileInputFormat: Total input paths to process : 18
12/01/14 23:45:02 INFO mapred.JobClient: Running job: job_201111020543_0001
12/01/14 23:45:03 INFO mapred.JobClient:  map 0% reduce 0%
12/01/14 23:45:18 INFO mapred.JobClient:  map 11% reduce 0%
12/01/14 23:45:24 INFO mapred.JobClient:  map 22% reduce 0%
12/01/14 23:45:27 INFO mapred.JobClient:  map 22% reduce 3%
12/01/14 23:45:30 INFO mapred.JobClient:  map 33% reduce 3%
12/01/14 23:45:36 INFO mapred.JobClient:  map 44% reduce 7%
12/01/14 23:45:42 INFO mapred.JobClient:  map 55% reduce 14%
12/01/14 23:45:48 INFO mapred.JobClient:  map 66% reduce 14%
12/01/14 23:45:51 INFO mapred.JobClient:  map 66% reduce 18%
12/01/14 23:45:54 INFO mapred.JobClient:  map 77% reduce 18%
12/01/14 23:45:57 INFO mapred.JobClient:  map 77% reduce 22%
12/01/14 23:46:00 INFO mapred.JobClient:  map 88% reduce 22%
12/01/14 23:46:06 INFO mapred.JobClient:  map 100% reduce 25%
12/01/14 23:46:15 INFO mapred.JobClient:  map 100% reduce 100%
12/01/14 23:46:20 INFO mapred.JobClient: Job complete: job_201111020543_0001
...
The above command creates the output directory (in HDFS) with the results, as shown below. To view this output directory, copy it to the local filesystem using the -get option of the hadoop command as shown below.
$ bin/hadoop fs -get output output
$ ls -l output
total 4
-rwxrwxrwx. 1 root root 11 Aug 23 08:39 part-00000
-rwxrwxrwx. 1 root root  0 Aug 23 08:39 _SUCCESS
$ cat output/*
1       dfsadmin
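Instead of copying the results back to the local filesystem with -get, you can also view them directly in HDFS using the -cat option:
$ bin/hadoop fs -cat 'output/part-*'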
10. Troubleshooting Hadoop Issues
Issue 1: "Temporary failure in name resolution"
While executing the sample hadoop program, you might get the following error message.
12/01/14 07:34:57 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-root/mapred/staging/root-1040516815/.staging/job_local_0001
java.net.UnknownHostException: hadoop: hadoop: Temporary failure in name resolution
    at java.net.InetAddress.getLocalHost(InetAddress.java:1438)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:815)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
    at java.security.AccessController.doPrivileged(Native Method)
Solution 1: Add an entry to the /etc/hosts file that contains the ip-address, the fully qualified domain name (FQDN), and the hostname.
192.168.1.10 hadoop.thegeekstuff.com hadoop
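After updating /etc/hosts, you can confirm that the hostname now resolves (assuming your hostname is hadoop, as in this example):
$ getent hosts hadoop
This should print the ip-address and names you just added.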
Issue 2: "localhost: Error: JAVA_HOME is not set"
While executing hadoop start-all.sh, you might get this error as shown below.
$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-namenode-hadoop.out
localhost: starting datanode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-datanode-hadoop.out
localhost: Error: JAVA_HOME is not set.
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-secondarynamenode-hadoop.out
localhost: Error: JAVA_HOME is not set.
starting jobtracker, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-jobtracker-hadoop.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop-0.20.204.0/libexec/../logs/hadoop-hadoop-tasktracker-hadoop.out
localhost: Error: JAVA_HOME is not set.
Solution 2: Make sure JAVA_HOME is set properly in conf/hadoop-env.sh as shown below.
$ grep JAVA_HOME conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_27
Issue 3: Error while executing "bin/hadoop fs -put conf input"
You might get one of the following error messages (including "put: org.apache.hadoop.security.AccessControlException: Permission denied") while executing the hadoop fs -put command as shown below.
$ bin/hadoop fs -put conf input
12/01/14 23:21:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
12/01/14 23:21:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
12/01/14 23:21:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
Bad connection to FS. command aborted. exception: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
$ bin/hadoop fs -put conf input
put: org.apache.hadoop.security.AccessControlException: Permission denied: user=hadoop, access=WRITE, inode="":root:supergroup:rwxr-xr-x
$ ls -l input
Solution 3: Make sure the /etc/hosts file is set up properly. Also, if your HDFS filesystem was not created properly, you might have issues during "hadoop fs -put". Format your HDFS using "bin/hadoop namenode -format" and confirm that it displays the "successfully formatted" message.
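For the "Connection refused" variant, the usual cause is simply that the NameNode is not running. A quick way to check (a sketch) is to look for the NameNode process with jps and, if it is up, ask HDFS for a status report:
$ jps | grep NameNode
$ bin/hadoop dfsadmin -report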
Issue 4: While executing start-all.sh (or start-dfs.sh), you might get the error message "localhost: Unrecognized option: -jvm" followed by "localhost: Could not create the Java virtual machine."
Solution 4: This might happen if you've installed hadoop as root and are trying to start the processes. This is a known bug that has since been fixed according to the corresponding bug report. If you hit this bug, try installing hadoop as a non-root account (as explained in this article), which should fix the issue.