Prediction(4)Logistic Regression - Local Cluster Set Up
1. Try to Set Up Hadoop
Download the right version
> wget http://apache.spinellicreations.com/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Place it in the right place and softlink the file
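"Place and softlink" can be sketched as below. The /opt prefix and the versionless hadoop symlink are assumptions (they match the paths used later in this post); PREFIX is a stand-in so the sketch runs anywhere without root.

```shell
# Sketch: unpack the tarball under a prefix, then point a versionless
# symlink at it so configs can refer to /opt/hadoop regardless of version.
# On the real machine run with PREFIX=/opt; it falls back to a scratch dir here.
PREFIX="${PREFIX:-$(mktemp -d)}"
mkdir -p "$PREFIX/hadoop-2.7.1"   # stand-in for: tar xzf hadoop-2.7.1.tar.gz -C "$PREFIX"
ln -s "$PREFIX/hadoop-2.7.1" "$PREFIX/hadoop"
ls -ld "$PREFIX/hadoop"
```

The symlink is what lets you upgrade later by re-pointing /opt/hadoop instead of editing every config.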
> hadoop version
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
Set up the Cluster
> mkdir /opt/hadoop/temp
Configure core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ubuntu-master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop/temp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
> mkdir /opt/hadoop/dfs
> mkdir /opt/hadoop/dfs/name
> mkdir /opt/hadoop/dfs/data
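The same directories can be created in one step with mkdir -p, which also creates any missing parents. HADOOP_DIR is a stand-in for /opt/hadoop so the sketch runs without root.

```shell
# One-step version of the three mkdir calls above.
# Set HADOOP_DIR=/opt/hadoop on the real machine; it falls back to a
# scratch directory here so the sketch is safe to run anywhere.
HADOOP_DIR="${HADOOP_DIR:-$(mktemp -d)}"
mkdir -p "$HADOOP_DIR/temp" "$HADOOP_DIR/dfs/name" "$HADOOP_DIR/dfs/data"
ls "$HADOOP_DIR/dfs"
```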
Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>ubuntu-master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
> mv mapred-site.xml.template mapred-site.xml
Configure mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>ubuntu-master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ubuntu-master:19888</value>
  </property>
</configuration>
Configure the yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ubuntu-master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ubuntu-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ubuntu-master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ubuntu-master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ubuntu-master:8088</value>
  </property>
</configuration>
Configure slaves
ubuntu-dev1
ubuntu-dev2
ubuntu-dev3
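As a small sketch, the slaves file can be generated from the host list (etc/hadoop/slaves is the stock location in Hadoop 2.x; the file is written to the current directory here).

```shell
# Write one hostname per line; copy the result to /opt/hadoop/etc/hadoop/slaves.
printf '%s\n' ubuntu-dev1 ubuntu-dev2 ubuntu-dev3 > slaves
cat slaves
```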
Prepare the 3 slave machines if needed.
> mkdir ~/.ssh
> vi ~/.ssh/authorized_keys
Copy the public key there; the content comes from cat ~/.ssh/id_rsa.pub on the master.
Then scp all the configuration files to all the slave machines.
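The key copying and file distribution can be scripted. Here is a minimal dry-run sketch: the leading echo just prints each command, so delete it to actually execute. Hostnames match the slaves file above; the /opt/hadoop path is the layout assumed throughout this post.

```shell
# Dry run: print the ssh-copy-id and scp commands for each slave.
# Remove the leading "echo" once the output looks right.
for host in ubuntu-dev1 ubuntu-dev2 ubuntu-dev3; do
  echo ssh-copy-id "$host"
  echo scp -r /opt/hadoop/etc/hadoop "$host:/opt/hadoop/etc/"
done
```

ssh-copy-id appends the master's public key to each slave's ~/.ssh/authorized_keys, which is what the manual vi step above does by hand.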
The commands below will start Hadoop HDFS and YARN.
> cd /opt/hadoop
> sbin/start-dfs.sh
> sbin/start-yarn.sh
Visit the pages:
http://ubuntu-master:50070/dfshealth.html#tab-overview
http://ubuntu-master:8088/cluster
Error Message:
> sbin/start-dfs.sh
Starting namenodes on [ubuntu-master]
ubuntu-master: Error: JAVA_HOME is not set and could not be found.
ubuntu-dev1: Error: JAVA_HOME is not set and could not be found.
ubuntu-dev2: Error: JAVA_HOME is not set and could not be found.
Solution:
> vi hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
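If the JDK lives somewhere else, JAVA_HOME can be derived from the java binary instead of hard-coded. This sketch shows the string trimming on a sample path; the path itself is just an example, not necessarily your machine's layout.

```shell
# On a real machine, resolve the binary first with:
#   java_bin=$(readlink -f "$(command -v java)")
# A sample path is used here so the sketch runs anywhere.
java_bin="/usr/lib/jvm/java-8-oracle/jre/bin/java"
JAVA_HOME="${java_bin%/jre/bin/java}"   # strip the trailing /jre/bin/java
echo "$JAVA_HOME"
```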
Error Message:
2015-09-30 19:39:49,482 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/hadoop/dfs/name/in_use.lock acquired by nodename 3017@ubuntu-master
2015-09-30 19:39:49,487 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:225)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
Solution:
> hdfs namenode -format
Note that formatting wipes any existing HDFS metadata, so only run it on a fresh NameNode.
Cool, everything is up and running for the YARN cluster.
2. Try to Set Up Spark 1.5.0
Fetch the latest Spark
> wget http://apache.mirrors.ionfish.org/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz
Unzip it and place it in the right working directory.
3. Try to Set Up Zeppelin
Fetch the source code first.
> git clone https://github.com/apache/incubator-zeppelin.git
> npm install -g grunt-cli
> grunt --version
grunt-cli v0.1.13
> mvn clean package -Pspark-1.5 -Dspark.version=1.5.0 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests
Exception:
[ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:0.0.23:grunt (grunt build) on project zeppelin-web: Failed to run task: 'grunt --no-color' failed. (error code 3) -> [Help 1]
INFO [launcher]: Trying to start PhantomJS again (1/2).
ERROR [launcher]: Cannot start PhantomJS
INFO [launcher]: Trying to start PhantomJS again (2/2).
ERROR [launcher]: Cannot start PhantomJS
ERROR [launcher]: PhantomJS failed 2 times (cannot start). Giving up.
Warning: Task "karma:unit" failed. Use --force to continue.
Solution:
> cd /home/carl/install/incubator-zeppelin/zeppelin-web
> mvn clean install
This gives more detailed exceptions, which show that PhantomJS is not installed.
Install PhantomJS
Build your own PhantomJS from source:
http://phantomjs.org/build.html
Or find an older version here:
https://code.google.com/p/phantomjs/downloads/list
Download the right version
> wget https://phantomjs.googlecode.com/files/phantomjs-1.9.2-linux-x86_64.tar.bz2
> bzip2 -d phantomjs-1.9.2-linux-x86_64.tar.bz2
> tar -xvf phantomjs-1.9.2-linux-x86_64.tar
Move it to the proper directory, add it to the PATH, and verify the installation.
Error Message:
> phantomjs --version
phantomjs: error while loading shared libraries: libfontconfig.so.1: cannot open shared object file: No such file or directory
Solution:
> sudo apt-get install libfontconfig
It works now:
> phantomjs --version
1.9.2
Build Success.
4. Configure Spark and Zeppelin
Set Up Zeppelin
> cp zeppelin-env.sh.template zeppelin-env.sh
> cp zeppelin-site.xml.template zeppelin-site.xml
> vi zeppelin-env.sh
export MASTER="yarn-client"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"
export SPARK_HOME="/opt/spark"
. ${SPARK_HOME}/conf/spark-env.sh
export ZEPPELIN_CLASSPATH="${SPARK_CLASSPATH}"
Set Up Spark
> cp spark-env.sh.template spark-env.sh
> vi spark-env.sh
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop"
export SPARK_WORKER_MEMORY=768m
export SPARK_JAVA_OPTS="-Dbuild.env=lmm.sparkvm"
export USER=carl
Rebuild and set up Zeppelin.
> mvn clean package -Pspark-1.5 -Dspark.version=1.5.0 -Dhadoop.version=2.7.0 -Phadoop-2.6 -Pyarn -DskipTests -Pbuild-distr
The final gz file will be here:
/home/carl/install/incubator-zeppelin-0.6.0/zeppelin-distribution/target
> mv zeppelin-0.6.0-incubating-SNAPSHOT /home/carl/tool/zeppelin-0.6.0
> sudo ln -s /opt/zeppelin-0.6.0 /opt/zeppelin
Start the Server
> bin/zeppelin-daemon.sh start
Visit Zeppelin
http://ubuntu-master:8080/#/
Exception:
Found both spark.driver.extraJavaOptions and SPARK_JAVA_OPTS. Use only the former.
Solution:
Zeppelin Configuration
export ZEPPELIN_JAVA_OPTS="-Dspark.akka.frameSize=100 -Dspark.jars=/home/hadoop/spark-seed-assembly-0.0.1.jar"
Spark Configuration
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70"
export SPARK_LOCAL_DIRS=/opt/spark
export SPARK_LOG_DIR=/var/log/apps
export SPARK_CLASSPATH="/opt/spark/conf:/home/hadoop/conf:/opt/spark/classpath/emr/*:/opt/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar"
References:
http://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression
zeppelin
https://github.com/apache/incubator-zeppelin
hadoop