Prediction(4) Logistic Regression - Local Cluster Set Up

1. Try to Set Up Hadoop

Download the right version

> wget http://apache.spinellicreations.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Place it in the right place and soft link the file, for example:
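A minimal sketch, assuming the ~/tool plus /opt symlink layout that the rest of this post uses:

> tar -xzvf hadoop-2.7.1.tar.gz

> mv hadoop-2.7.1 /home/carl/tool/hadoop-2.7.1

> sudo ln -s /home/carl/tool/hadoop-2.7.1 /opt/hadoop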

> hadoop version

Hadoop 2.7.1

Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a

Compiled by jenkins on 2015-06-29T06:04Z

Compiled with protoc 2.5.0

From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a

Set up the Cluster

> mkdir /opt/hadoop/temp

Configure core-site.xml

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://ubuntu-master:9000</value>

</property>

<property>

<name>io.file.buffer.size</name>

<value>131072</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>file:/opt/hadoop/temp</value>

</property>

<property>

<name>hadoop.proxyuser.hadoop.hosts</name>

<value>*</value>

</property>

<property>

<name>hadoop.proxyuser.hadoop.groups</name>

<value>*</value>

</property>

</configuration>

> mkdir /opt/hadoop/dfs

> mkdir /opt/hadoop/dfs/name

> mkdir /opt/hadoop/dfs/data

Configure hdfs-site.xml

<configuration>

<property>

<name>dfs.namenode.secondary.http-address</name>

<value>ubuntu-master:9001</value>

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/opt/hadoop/dfs/name</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/opt/hadoop/dfs/data</value>

</property>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

<name>dfs.webhdfs.enabled</name>

<value>true</value>

</property>

</configuration>

> mv mapred-site.xml.template mapred-site.xml

Configure mapred-site.xml

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<property>

<name>mapreduce.jobhistory.address</name>

<value>ubuntu-master:10020</value>

</property>

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>ubuntu-master:19888</value>

</property>

</configuration>
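The job history ports configured above only answer once the history server is running; if needed, the bundled Hadoop 2.x script should start it (an optional step, not part of the original walkthrough):

> sbin/mr-jobhistory-daemon.sh start historyserver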

Configure the yarn-site.xml

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<property>

<name>yarn.resourcemanager.address</name>

<value>ubuntu-master:8032</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>ubuntu-master:8030</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>ubuntu-master:8031</value>

</property>

<property>

<name>yarn.resourcemanager.admin.address</name>

<value>ubuntu-master:8033</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>ubuntu-master:8088</value>

</property>

</configuration>

Configure slaves (the etc/hadoop/slaves file)

ubuntu-dev1

ubuntu-dev2

ubuntu-dev3

Prepare the 3 slave machines if needed.

> mkdir ~/.ssh

> vi ~/.ssh/authorized_keys

Copy the keys there; the content is from cat ~/.ssh/id_rsa.pub.

scp all the files to all the slave machines, for example:
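A sketch of that step, assuming the user carl, the slave hostnames from the slaves file above, and that the hadoop tree already sits under /opt/hadoop on each slave:

> scp -r /opt/hadoop/etc/hadoop carl@ubuntu-dev1:/opt/hadoop/etc/

> scp -r /opt/hadoop/etc/hadoop carl@ubuntu-dev2:/opt/hadoop/etc/

> scp -r /opt/hadoop/etc/hadoop carl@ubuntu-dev3:/opt/hadoop/etc/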

The same command will start hadoop.

Start Hadoop hdfs and yarn

cd /opt/hadoop

sbin/start-dfs.sh

sbin/start-yarn.sh
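To confirm the daemons came up, run jps on the master; it should list NameNode, SecondaryNameNode, and ResourceManager, while DataNode and NodeManager appear on the slaves:

> jps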

Visit the pages

http://ubuntu-master:50070/dfshealth.html#tab-overview

http://ubuntu-master:8088/cluster

Error Message:

> sbin/start-dfs.sh

Starting namenodes on [ubuntu-master]

ubuntu-master: Error: JAVA_HOME is not set and could not be found.

ubuntu-dev1: Error: JAVA_HOME is not set and could not be found.

ubuntu-dev2: Error: JAVA_HOME is not set and could not be found.

Solution:

> vi hadoop-env.sh

export JAVA_HOME="/usr/lib/jvm/java-8-oracle"

Error Message:

2015-09-30 19:39:49,482 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/hadoop/dfs/name/in_use.lock acquired by nodename 3017@ubuntu-master

2015-09-30 19:39:49,487 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage

java.io.IOException: NameNode is not formatted.

at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:225)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)

at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)

Solution:

hdfs namenode -format
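Format once on the master (this initializes, and on reruns wipes, dfs.namenode.name.dir), then start HDFS again with sbin/start-dfs.sh.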

Cool, all things are up and running for the yarn cluster.

2. Try to Set Up Spark 1.5.0

Fetch the latest Spark

> wget http://apache.mirrors.ionfish.org/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz

Unzip and place that in the right working directory, for example:
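A sketch following the same layout as before (SPARK_HOME is set to /opt/spark further down, so the symlink name matters):

> tar -xzvf spark-1.5.0-bin-hadoop2.6.tgz

> mv spark-1.5.0-bin-hadoop2.6 /home/carl/tool/spark-1.5.0

> sudo ln -s /home/carl/tool/spark-1.5.0 /opt/spark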

3. Try to Set Up Zeppelin

Fetch the source code first.

> git clone https://github.com/apache/incubator-zeppelin.git

> npm install -g grunt-cli

> grunt --version

grunt-cli v0.1.13

> mvn clean package -Pspark-1.5 -Dspark.version=1.5.0 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests

Exception:

[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.23:grunt (grunt build) on project zeppelin-web: Failed to run task: 'grunt --no-color' failed. (error code 3) -> [Help 1]

INFO [launcher]: Trying to start PhantomJS again (1/2).

ERROR [launcher]: Cannot start PhantomJS

INFO [launcher]: Trying to start PhantomJS again (2/2).

ERROR [launcher]: Cannot start PhantomJS

ERROR [launcher]: PhantomJS failed 2 times (cannot start). Giving up.

Warning: Task "karma:unit" failed. Use --force to continue.

Solution:

> cd /home/carl/install/incubator-zeppelin/zeppelin-web

> mvn clean install

I get more detailed exceptions. They show that PhantomJS is not installed.

Install PhantomJS

Build your own PhantomJS from source

http://phantomjs.org/build.html

Or find an older version from here

https://code.google.com/p/phantomjs/downloads/list

Download the right version

> wget https://phantomjs.googlecode.com/files/phantomjs-1.9.2-linux-x86_64.tar.bz2

> bzip2 -d phantomjs-1.9.2-linux-x86_64.tar.bz2

> tar -xvf phantomjs-1.9.2-linux-x86_64.tar

Move it to the proper directory, add it to the PATH, and verify the installation, for example:
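A sketch of those steps, assuming the same ~/tool plus /opt pattern used for Spark (the exact directories are a matter of taste):

> mv phantomjs-1.9.2-linux-x86_64 /home/carl/tool/phantomjs-1.9.2

> sudo ln -s /home/carl/tool/phantomjs-1.9.2 /opt/phantomjs

Then add it to the PATH, e.g. in ~/.bashrc:

export PATH=/opt/phantomjs/bin:$PATH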

Error Exception:

phantomjs --version

phantomjs: error while loading shared libraries: libfontconfig.so.1: cannot open shared object file: No such file or directory

Solution:

> sudo apt-get install libfontconfig

It works.

> phantomjs --version

1.9.2

Build Success.

4. Configure Spark and Zeppelin

Set Up Zeppelin

> cp zeppelin-env.sh.template zeppelin-env.sh

> cp zeppelin-site.xml.template zeppelin-site.xml

> vi zeppelin-env.sh

export MASTER="yarn-client"

export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"

export SPARK_HOME="/opt/spark"

. ${SPARK_HOME}/conf/spark-env.sh

export ZEPPELIN_CLASSPATH="${SPARK_CLASSPATH}"

Set Up Spark

> cp spark-env.sh.template spark-env.sh

> vi spark-env.sh

export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop"

export SPARK_WORKER_MEMORY=768m

export SPARK_JAVA_OPTS="-Dbuild.env=lmm.sparkvm"

export USER=carl

Rebuild and set up Zeppelin.

> mvn clean package -Pspark-1.5 -Dspark.version=1.5.0 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests -Pbuild-distr

The final gz file will be here:

/home/carl/install/incubator-zeppelin-0.6.0/zeppelin-distribution/target

> mv zeppelin-0.6.0-incubating-SNAPSHOT /home/carl/tool/zeppelin-0.6.0
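Presumably the tool directory is linked under /opt as well (my assumption, following the pattern above), so the next command has a target:

> sudo ln -s /home/carl/tool/zeppelin-0.6.0 /opt/zeppelin-0.6.0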

> sudo ln -s /opt/zeppelin-0.6.0 /opt/zeppelin

Start the Server

> bin/zeppelin-daemon.sh start

Visit the Zeppelin UI

http://ubuntu-master:8080/#/
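As a quick sanity check, a new notebook paragraph running sc.version should return 1.5.0, assuming the spark interpreter is bound and starts cleanly in yarn-client mode.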

Exception:

Found both spark.driver.extraJavaOptions and SPARK_JAVA_OPTS. Use only the former.

Solution:

Drop SPARK_JAVA_OPTS from spark-env.sh and carry the settings in ZEPPELIN_JAVA_OPTS and SPARK_DAEMON_JAVA_OPTS instead.

Zeppelin Configuration

export ZEPPELIN_JAVA_OPTS="-Dspark.akka.frameSize=100 -Dspark.jars=/home/hadoop/spark-seed-assembly-0.0.1.jar"

Spark Configuration

export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70"

export SPARK_LOCAL_DIRS=/opt/spark

export SPARK_LOG_DIR=/var/log/apps

export SPARK_CLASSPATH="/opt/spark/conf:/home/hadoop/conf:/opt/spark/classpath/emr/*:/opt/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar"

References:

http://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression

zeppelin

https://github.com/apache/incubator-zeppelin

hadoop
