Quick Guide to Hadoop on Ubuntu

The Apache Hadoop software library scales up from single servers to thousands of machines, each offering local computation and storage.

Subprojects:

Hadoop Common

Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.

Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.

Other Hadoop-related projects:

Avro: A data serialization system.

Cassandra: A scalable multi-master database with no single points of failure.

Chukwa: A data collection system for managing large distributed systems.

HBase: A scalable, distributed database that supports structured data storage for large tables.

Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.

Mahout: A scalable machine learning and data mining library.

Pig: A high-level data-flow language and execution framework for parallel computation.

ZooKeeper: A high-performance coordination service for distributed applications.

1. Single Node Setup

Quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

On Windows 7 we would need to install Cygwin (http://www.cygwin.com/) first, and Windows is suitable only for development. I use my Ubuntu virtual machine instead.

Download the Hadoop release from http://mirror.cc.columbia.edu/pub/software/apache//hadoop/common/stable/. The file names are

hadoop-0.23.0-src.tar.gz and hadoop-0.23.0.tar.gz.

I decided to build it from source.

Install Protocol Buffers on Ubuntu

Download the file from http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz

>wget http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz

>tar zxvf protobuf-2.4.1.tar.gz

>cd protobuf-2.4.1

>./configure --prefix=/usr/local

>make

>sudo make install
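
As a quick sanity check (a sketch, assuming the install above put `protoc` under /usr/local/bin, which is on the PATH), verify that the Protocol Buffers compiler is visible:

```shell
# Check that the Protocol Buffers compiler is visible after installation.
if command -v protoc >/dev/null 2>&1; then
  protoc --version    # should report libprotoc 2.4.1 for this build
else
  echo "protoc not found; check that /usr/local/bin is on PATH"
fi
```

It can also help to run sudo ldconfig afterwards so the freshly installed libprotobuf shared library is picked up by the linker cache.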

Install Hadoop Common

>svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.23.0/

>cd release-0.23.0

>mvn package -Pdist,native,docs,src -DskipTests -Dtar

Error message:

org.apache.maven.reactor.MavenExecutionException: Failed to validate POM for project org.apache.hadoop:hadoop-project at /home/carl/download/release-0.23.0/hadoop-project/pom.xml

at org.apache.maven.DefaultMaven.getProjects(DefaultMaven.java:404)

at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:272)

at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)

Solution:

Try to install Maven 3 instead.

>sudo apt-get remove maven2

>sudo apt-get autoremove maven2

>sudo apt-get install python-software-properties

>sudo add-apt-repository "deb http://build.discursive.com/apt/ lucid main"

>sudo apt-get update

>sudo apt-get install maven

Add this to the PATH entry in /etc/environment:

/usr/local/apache-maven-3.0.3/bin

>. /etc/environment
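
Editing /etc/environment by hand is easy to get wrong. As an alternative, here is a small sketch (the Maven path matches the one above and is an assumption about your install location) of safely prepending a directory to PATH for the current shell session:

```shell
# Prepend the Maven bin directory to PATH, but only if it is not
# already present (path is an assumption; adjust to your install).
MAVEN_BIN=/usr/local/apache-maven-3.0.3/bin
case ":$PATH:" in
  *":$MAVEN_BIN:"*) ;;              # already on PATH, nothing to do
  *) PATH="$MAVEN_BIN:$PATH" ;;     # otherwise put it first
esac
export PATH
```

Changes made this way last only for the current shell, which is handy while experimenting; /etc/environment makes them permanent.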

It works now.

>mvn package -Pdist,native,docs,src -DskipTests -Dtar

The build fails, and checking the BUILDING.txt file I get these requirements:

* Unix System

* JDK 1.6

* Maven 3.0

* Forrest 0.8 (if generating docs)

* Findbugs 1.3.9 (if running findbugs)

* Protocol Buffer 2.4.1+ (for MapReduce)

* Autotools (if compiling native code)

* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
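
A quick way to see which of those prerequisites are already on the PATH (a sketch; the binary names below are the usual ones, and tools like Forrest or Findbugs may live outside PATH):

```shell
# Report which build prerequisites are visible on PATH.
for tool in java mvn protoc autoconf svn; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```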

Install Forrest on Ubuntu

http://forrest.apache.org/

>wget http://mirrors.200p-sf.sonic.net/apache//forrest/apache-forrest-0.9.tar.gz

>tar zxvf apache-forrest-0.9.tar.gz

>sudo mv apache-forrest-0.9 /usr/local/

>sudo vi /etc/environment

Add /usr/local/apache-forrest-0.9/bin to PATH.

>. /etc/environment

Install Autotools on Ubuntu

>sudo apt-get install build-essential g++ automake autoconf gnu-standards autoconf-doc libtool gettext autoconf-archive

Build Hadoop again:

>mvn package -Pdist -DskipTests=true -Dtar

Build success. I can get the file /home/carl/download/release-0.23.0/hadoop-dist/target/hadoop-0.23.0-SNAPSHOT.tar.gz.

Make sure ssh and rsync are installed on the system.

>sudo apt-get install ssh

>sudo apt-get install rsync

Unpack the Hadoop distribution.

>tar zxvf hadoop-0.23.0-SNAPSHOT.tar.gz

>sudo mv hadoop-0.23.0-SNAPSHOT /usr/local/

>cd /usr/local/

>sudo mv hadoop-0.23.0-SNAPSHOT hadoop-0.23.0

>cd hadoop-0.23.0/conf/

>vi hadoop-env.sh

Modify the JAVA_HOME line to the following statement:

JAVA_HOME=/usr/lib/jvm/java-6-sun
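
To confirm that value actually points at a JDK (the java-6-sun path is Ubuntu's old Sun JDK location and may differ on your machine), a quick check:

```shell
# Verify that JAVA_HOME contains a runnable java binary.
JAVA_HOME=${JAVA_HOME:-/usr/lib/jvm/java-6-sun}
if [ -x "$JAVA_HOME/bin/java" ]; then
  echo "JAVA_HOME looks good: $JAVA_HOME"
else
  echo "no java binary under $JAVA_HOME - fix hadoop-env.sh"
fi
```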

Check the hadoop command:

>bin/hadoop version

Hadoop 0.23.0-SNAPSHOT

Subversion http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.23.0/hadoop-common-project/hadoop-common -r 1196973

Compiled by carl on Wed Nov 30 02:32:31 EST 2011

From source with checksum 4e42b2d96c899a98a8ab8c7cc23f27ae

There are 3 modes:

Local (Standalone) Mode

Pseudo-Distributed Mode

Fully-Distributed Mode

Standalone Operation

>mkdir input

>cp conf/*.xml input

>vi input/1.xml

YARN test for fun

>bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar grep input output 'YARN[a-zA-Z.]+'

>cat output/*

1 YARN test for fun

Pseudo-Distributed Operation

Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

Configuration

conf/core-site.xml:

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:9000</value>

</property>

</configuration>

conf/hdfs-site.xml:

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

conf/mapred-site.xml:

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>localhost:9001</value>

</property>

</configuration>
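
The three files above can also be written from the shell, which is handy when scripting the whole setup. A sketch for core-site.xml (CONF_DIR is a hypothetical variable; point it at your real conf directory):

```shell
# Generate core-site.xml with the single fs.default.name property.
CONF_DIR=${CONF_DIR:-/tmp/hadoop-conf}
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
echo "wrote $CONF_DIR/core-site.xml"
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the heredoc, so the XML is written verbatim; hdfs-site.xml and mapred-site.xml can be generated the same way.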

Set up passphraseless ssh

>ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

>cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

>ssh localhost

Then I can connect to localhost over ssh without a password.
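
A non-interactive way to confirm this works (a sketch): with BatchMode, ssh fails instead of prompting for a password, so success means key-based login is really set up.

```shell
# Succeeds only if passphraseless key authentication works.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
  echo "passphraseless ssh to localhost: OK"
else
  echo "ssh still wants a password (or sshd is not running)"
fi
```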

Execution

Format a new distributed filesystem:

>bin/hadoop namenode -format

Start the hadoop daemons:

>bin/start-all.sh

The logs go to ${HADOOP_HOME}/logs, e.g. /usr/local/hadoop-0.23.0/logs/yarn-carl-nodemanager-ubuntus.out. The error messages are as follows:

No HADOOP_CONF_DIR set.

Please specify it either in yarn-env.sh or in the environment.

Solution:

>sudo vi yarn-env.sh

>sudo vi /etc/environment

>sudo vi hadoop-env.sh

Add these lines:

HADOOP_CONF_DIR=/usr/local/hadoop-0.23.0/conf

HADOOP_COMMON_HOME=/usr/local/hadoop-0.23.0/share/hadoop/common

HADOOP_HDFS_HOME=/usr/local/hadoop-0.23.0/share/hadoop/hdfs
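
While experimenting, the same variables can simply be exported in the current shell before running start-all.sh, which avoids editing three files (paths assume the /usr/local/hadoop-0.23.0 install location used above):

```shell
# Export the Hadoop locations for this shell session only.
export HADOOP_CONF_DIR=/usr/local/hadoop-0.23.0/conf
export HADOOP_COMMON_HOME=/usr/local/hadoop-0.23.0/share/hadoop/common
export HADOOP_HDFS_HOME=/usr/local/hadoop-0.23.0/share/hadoop/hdfs
env | grep '^HADOOP_' | sort
```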

>bin/start-all.sh

http://192.168.56.101:9999/node

http://192.168.56.101:8088/cluster
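
To check whether those two web UIs are actually up from the command line (a sketch; 192.168.56.101 is my VM's address, substitute your own, and the ports come from the URLs above):

```shell
# Print the HTTP status for each web UI, or note that it is unreachable.
for url in http://192.168.56.101:9999/node http://192.168.56.101:8088/cluster; do
  if code=$(curl -s --max-time 5 -o /dev/null -w '%{http_code}' "$url"); then
    echo "$url -> HTTP $code"
  else
    echo "$url -> unreachable"
  fi
done
```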

Change the configuration files: comment out all the other xml files in the conf directory.

>vi conf/yarn-site.xml

<?xml version="1.0"?>

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:9000</value>

</property>

<property>

<name>mapred.job.tracker</name>

<value>localhost:9001</value>

</property>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

I found some things are different in the latest version 0.23.0, so I need to make some changes according to another guide.

References:

http://hadoop.apache.org/common/docs/r0.19.2/cn/quickstart.html

http://hadoop.apache.org/

http://hadoop.apache.org/common/

http://hadoop.apache.org/common/docs/stable/single_node_setup.html

http://www.blogjava.net/shenh062326/archive/2011/11/10/yuling_hadoop_0-23_compile.html

http://www.cloudera.com/blog/2009/08/hadoop-default-ports-quick-reference/

http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/SingleCluster.html

http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html
