hadoop(1) Quick Guide to Hadoop on Ubuntu
The Apache Hadoop software library scales up from single servers to thousands of machines, each offering local computation and storage.
Subprojects:
Hadoop Common
Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.
Other Hadoop-related projects:
Avro: A data serialization system.
Cassandra: A scalable multi-master database with no single points of failure.
Chukwa: A data collection system for managing large distributed systems.
HBase: A scalable, distributed database that supports structured data storage for large tables.
Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
Mahout: A scalable machine learning and data mining library.
Pig: A high-level data-flow language and execution framework for parallel computation.
ZooKeeper: A high-performance coordination service for distributed applications.
1. Single Node Setup
Quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
On Windows 7 we would first need to install Cygwin (http://www.cygwin.com/); Windows is only for development. I use my Ubuntu virtual machine instead.
Download the Hadoop release from http://mirror.cc.columbia.edu/pub/software/apache//hadoop/common/stable/. The file names are hadoop-0.23.0-src.tar.gz and hadoop-0.23.0.tar.gz.
I decided to build it from the source file.
Install Protocol Buffer on Ubuntu.
Download the file from this URL: http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
> wget http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
> tar zxvf protobuf-2.4.1.tar.gz
> cd protobuf-2.4.1
> sudo ./configure --prefix=/usr/local
> sudo make
> sudo make install
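To confirm the install worked, protoc should now be on the PATH and report its version (if it complains about a missing libprotobuf shared library, running sudo ldconfig usually fixes that):
> protoc --version
libprotoc 2.4.1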
Install Hadoop Common
> svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.23.0/
> cd release-0.23.0
> mvn package -Pdist,native,docs,src -DskipTests -Dtar
Error message:
org.apache.maven.reactor.MavenExecutionException: Failed to validate POM for project org.apache.hadoop:hadoop-project at /home/carl/download/release-0.23.0/hadoop-project/pom.xml
at org.apache.maven.DefaultMaven.getProjects(DefaultMaven.java:404)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:272)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)
Solution:
Try installing Maven 3 instead.
> sudo apt-get remove maven2
> sudo apt-get autoremove maven2
> sudo apt-get install python-software-properties
> sudo add-apt-repository "deb http://build.discursive.com/apt/ lucid main"
> sudo apt-get update
> sudo apt-get install maven
Add this directory to the PATH entry in /etc/environment:
/usr/local/apache-maven-3.0.3/bin
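As a sketch, the PATH line in /etc/environment would then look something like this (the existing entries here are just the Ubuntu defaults and may differ on your machine):
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/apache-maven-3.0.3/bin"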
> . /etc/environment
It works now.
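To double-check which Maven is now picked up, something like the following should report Maven 3.0.x and the JDK it found:
> mvn -version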
> mvn package -Pdist,native,docs,src -DskipTests -Dtar
It fails again, and when I check the BUILDING.txt file I get this list of requirements:
* Unix System
* JDK 1.6
* Maven 3.0
* Forrest 0.8 (if generating docs)
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.4.1+ (for MapReduce)
* Autotools (if compiling native code)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
Install Forrest on Ubuntu
http://forrest.apache.org/
> wget http://mirrors.200p-sf.sonic.net/apache//forrest/apache-forrest-0.9.tar.gz
> tar zxvf apache-forrest-0.9.tar.gz
> sudo mv apache-forrest-0.9 /usr/local/
> sudo vi /etc/environment
Add the path /usr/local/apache-forrest-0.9/bin to PATH.
> . /etc/environment
Install Autotools on Ubuntu
> sudo apt-get install build-essential g++ automake autoconf gnu-standards autoconf-doc libtool gettext autoconf-archive
Build Hadoop again:
> mvn package -Pdist -DskipTests=true -Dtar
Build success. I can get the file /home/carl/download/release-0.23.0/hadoop-dist/target/hadoop-0.23.0-SNAPSHOT.tar.gz.
Make sure ssh and rsync are on my system.
> sudo apt-get install ssh
> sudo apt-get install rsync
Unpack the Hadoop distribution.
> tar zxvf hadoop-0.23.0-SNAPSHOT.tar.gz
> sudo mv hadoop-0.23.0-SNAPSHOT /usr/local/
> cd /usr/local/
> sudo mv hadoop-0.23.0-SNAPSHOT hadoop-0.23.0
> cd hadoop-0.23.0/conf/
> vi hadoop-env.sh
Modify the JAVA_HOME line to the following statement:
JAVA_HOME=/usr/lib/jvm/java-6-sun
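If you are not sure which JVM directory to point JAVA_HOME at (java-6-sun assumes the Sun JDK package is installed; an OpenJDK install lands under a different name), listing the JVM directory shows what is available:
> ls /usr/lib/jvm/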
Check the hadoop command:
> bin/hadoop version
Hadoop 0.23.0-SNAPSHOT
Subversion http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.23.0/hadoop-common-project/hadoop-common -r 1196973
Compiled by carl on Wed Nov 30 02:32:31 EST 2011
From source with checksum 4e42b2d96c899a98a8ab8c7cc23f27ae
There are 3 modes:
Local (Standalone) Mode
Pseudo-Distributed Mode
Fully-Distributed Mode
Standalone Operation
> mkdir input
> cp conf/*.xml input
> vi input/1.xml
YARN test for fun
> bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar grep input output 'YARN[a-zA-Z.]+'
> cat output/*
1 YARN test for fun
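The same examples jar bundles other small jobs; for instance, wordcount should also run locally in standalone mode (output2 here is just an example directory name, and it must not exist beforehand):
> bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar wordcount input output2
> cat output2/*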
Pseudo-Distributed Operation
Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
Configuration
conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
Set up passphraseless ssh
> ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
> cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
> ssh localhost
Then I can connect to localhost over ssh without a password.
Execution
Format a new distributed filesystem:
> bin/hadoop namenode -format
Start the Hadoop daemons:
> bin/start-all.sh
The logs go to ${HADOOP_HOME}/logs, e.g. /usr/local/hadoop-0.23.0/logs/yarn-carl-nodemanager-ubuntus.out, and the error messages are as follows:
No HADOOP_CONF_DIR set.
Please specify it either in yarn-env.sh or in the environment.
Solution:
> sudo vi yarn-env.sh
> sudo vi /etc/environment
> sudo vi hadoop-env.sh
Add these lines:
HADOOP_CONF_DIR=/usr/local/hadoop-0.23.0/conf
HADOOP_COMMON_HOME=/usr/local/hadoop-0.23.0/share/hadoop/common
HADOOP_HDFS_HOME=/usr/local/hadoop-0.23.0/share/hadoop/hdfs
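In the two shell scripts (yarn-env.sh and hadoop-env.sh) I would expect these variables to be exported so child processes see them; a sketch of what the added lines could look like there, using the same paths as above:
export HADOOP_CONF_DIR=/usr/local/hadoop-0.23.0/conf
export HADOOP_COMMON_HOME=/usr/local/hadoop-0.23.0/share/hadoop/common
export HADOOP_HDFS_HOME=/usr/local/hadoop-0.23.0/share/hadoop/hdfs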
> bin/start-all.sh
http://192.168.56.101:9999/node
http://192.168.56.101:8088/cluster
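Once the daemons are up, a minimal sketch of running the same grep example against HDFS, loosely following the stable single-node setup guide in the references (directory names are just examples, and some scripts and paths changed in 0.23, which is part of what I still need to sort out below):
> bin/hadoop fs -mkdir input
> bin/hadoop fs -put conf/*.xml input
> bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar grep input output 'YARN[a-zA-Z.]+'
> bin/hadoop fs -cat output/*
> bin/stop-all.sh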
Change the configuration files, and comment out the contents of all the other xml files in the conf directory.
> vi conf/yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
I found some differences with the latest version, 0.23.0, so I need to make some changes according to another guide.
References:
http://hadoop.apache.org/common/docs/r0.19.2/cn/quickstart.html
http://hadoop.apache.org/
http://hadoop.apache.org/common/
http://hadoop.apache.org/common/docs/stable/single_node_setup.html
http://www.blogjava.net/shenh062326/archive/2011/11/10/yuling_hadoop_0-23_compile.html
http://www.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/SingleCluster.html
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html