Spark/Hadoop/Zeppelin Upgrade (1)
1 Install JDK 1.8 Manually
>wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u77-b03/jdk-8u77-linux-x64.tar.gz"
Unzip and place it in the right place. Add bin to the PATH.
>java -version
java version "1.8.0_77"
2 MAVEN Installation
>wget http://apache.arvixe.com/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
Unzip and place it in the right place, add bin to the PATH.
>mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T10:41:47-06:00)
Maven home: /opt/maven
Java version: 1.8.0_77, vendor: Oracle Corporation
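The "unzip and add bin to PATH" steps for the JDK and Maven can be sketched as one profile fragment; the /opt/jdk and /opt/maven install locations are assumptions matching the version output above:

```shell
# /etc/profile.d/devtools.sh -- assumed install locations: /opt/jdk and /opt/maven
export JAVA_HOME="/opt/jdk"
export M2_HOME="/opt/maven"
export PATH="$JAVA_HOME/bin:$M2_HOME/bin:$PATH"
```

After sourcing it (or logging in again), `java -version` and `mvn --version` should resolve from these directories.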
3 Protoc Installation
>git clone https://github.com/google/protobuf.git
>sudo apt-get install unzip
>sudo apt-get install autoconf
>sudo apt-get install build-essential libtool
Run configure, make, and make install, then add protoc to the PATH.
>protoc --version
libprotoc 3.0.0
Error Exception:
'libprotoc 3.0.0', expected version is '2.5.0'
Solution:
Switch to 2.5.0:
>git checkout tags/v2.5.0
>./autogen.sh
>./configure --prefix=/home/carl/tool/protobuf-2.5.0
>make
>make install
Point the PATH at the 2.5.0 bin directory, then verify:
>protoc --version
libprotoc 2.5.0
4 HADOOP Installation
>wget http://mirrors.ibiblio.org/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2-src.tar.gz
>mvn package -Pdist,native -DskipTests -Dtar
Error Message:
Cannot run program "cmake"
Solution:
>sudo apt-get install cmake
Error Message:
An Ant BuildException has occured: exec returned: 1
Solution:
Try to get more detail:
>mvn package -Pdist,native -DskipTests -Dtar -e
>mvn package -Pdist,native -DskipTests -Dtar -X
>sudo apt-get install zlib1g-dev
>sudo apt-get install libssl-dev
But it is still not working.
So, switch to using the binary instead.
>wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Configure JAVA_HOME and the PATH:
export JAVA_HOME="/opt/jdk"
PATH="/opt/hadoop/bin:$PATH"
Format the NameNode with the command:
>hdfs namenode -format
Set up SSH on ubuntu-master, ubuntu-dev1, ubuntu-dev2
>ssh-keygen -t rsa
>cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
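The keygen/append above only enables passwordless login to the same machine; for the master to reach the workers, the public key also has to be appended on each worker. A sketch (the carl user and the scratch key path are assumptions; ssh-copy-id does the remote append):

```shell
# Generate a passphrase-less key (here under /tmp so the sketch is repeatable),
# then push the public half into each worker's ~/.ssh/authorized_keys.
rm -f /tmp/cluster_rsa /tmp/cluster_rsa.pub
ssh-keygen -t rsa -N "" -q -f /tmp/cluster_rsa
for host in ubuntu-dev1 ubuntu-dev2; do
  ssh-copy-id -i /tmp/cluster_rsa.pub "carl@$host" || echo "could not reach $host"
done
```

Afterwards, `ssh ubuntu-dev1` from the master should not prompt for a password.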
Find the configuration file /opt/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME="/opt/jdk"
Follow the documentation and make the configurations.
Command to start DFS:
>sbin/start-dfs.sh
Error Message:
java.io.IOException: Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddressesForCluster(DFSUtil.java:875)
at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1125)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:428)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2370)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2257)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2304)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2481)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2505)
Solution:
Configure the same XML files on the master as well.
Change the slaves file to point to ubuntu-dev1 and ubuntu-dev2.
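The rpc-address error goes away once fs.defaultFS is set consistently on every node; a minimal core-site.xml sketch, using the ubuntu-master:9000 address that appears later in these notes:

```xml
<!-- /opt/hadoop/etc/hadoop/core-site.xml -- same on master and workers -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ubuntu-master:9000</value>
  </property>
</configuration>
```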
Error Message:
2016-03-28 13:31:14,371 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/carl/tool/hadoop-2.7.2/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
Solution:
Make sure we have the DFS directories:
>mkdir -p /opt/hadoop/dfs/data
>mkdir -p /opt/hadoop/dfs/name
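The NameNode and DataNode find those directories through hdfs-site.xml; a sketch with the paths matching the mkdir commands above:

```xml
<!-- /opt/hadoop/etc/hadoop/hdfs-site.xml -- paths match the directories created above -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop/dfs/data</value>
  </property>
</configuration>
```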
Check if the DFS is running:
>jps
2038 SecondaryNameNode
1816 NameNode
2169 Jps
Visit the console page:
http://ubuntu-master:50070/dfshealth.html#tab-overview
Error Message:
2016-03-28 14:20:16,180 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ubuntu-master/192.168.56.104:9000
2016-03-28 14:20:22,183 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ubuntu-master/192.168.56.104:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>telnet ubuntu-master 9000
Trying 192.168.56.104...
telnet: Unable to connect to remote host: Connection refused
Solution:
I can telnet to that port on ubuntu-master itself, but not from ubuntu-dev1 or ubuntu-dev2, so I guess it is a firewall problem.
>sudo ufw disable
Firewall stopped and disabled on system startup
Then I also deleted the IPv6-related entries in /etc/hosts.
>cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 ubuntu-dev2.ec2.internal
192.168.56.104 ubuntu-master
192.168.56.105 ubuntu-dev1
192.168.56.106 ubuntu-dev2
192.168.56.107 ubuntu-build
Start the YARN cluster:
>sbin/start-yarn.sh
http://ubuntu-master:8088/cluster
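The start-yarn.sh step assumes yarn-site.xml tells every node where the ResourceManager lives; a minimal sketch (standard Hadoop 2.x property names; the hostname follows these notes, and the shuffle aux-service is what MapReduce jobs need):

```xml
<!-- /opt/hadoop/etc/hadoop/yarn-site.xml -- minimal sketch, same on master and workers -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ubuntu-master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```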
5 Spark Installation
Download the latest Spark version:
>wget http://mirror.nexcess.net/apache/spark/spark-1.6.1/spark-1.6.1-bin-without-hadoop.tgz
Unzip and place it in the right place.
http://spark.apache.org/docs/latest/running-on-yarn.html
>cat conf/spark-env.sh
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
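Since this is the "without hadoop" build, spark-env.sh also has to point Spark at the Hadoop jars, not just the config directory; a sketch per Spark's "Hadoop Free Build" documentation (the /opt/hadoop path follows these notes):

```shell
# conf/spark-env.sh -- for a hadoop-free Spark build, pull in the Hadoop jars
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)
```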
We can also build Spark ourselves:
http://spark.apache.org/docs/latest/building-spark.html
>build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.2 -Phive -DskipTests clean package
[WARNING] The requested profile "hadoop-2.7" could not be activated because it does not exist.
That means we need to use the hadoop-2.6 profile, i.e. Hadoop 2.6.4.
6 Install NodeJS
>wget https://nodejs.org/dist/v4.4.0/node-v4.4.0.tar.gz
>sudo ln -s /home/carl/tool/node-v4.4.0 /opt/node-v4.4.0
7 Zeppelin Installation
Check git version:
>git --version
git version 1.9.1
Check Java version:
>java -version
java version "1.8.0_77"
Check NodeJS version:
>node --version && npm --version
v4.4.0
2.14.20
Install dependencies:
>sudo apt-get install libfontconfig
Check MAVEN:
>mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T10:41:47-06:00)
Add MAVEN parameters:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"
>mvn clean package -DskipTests -Pspark-1.6 -Dspark.version=1.6.1 -Phadoop-2.6 -Dhadoop.version=2.6.
References:
zeppelin
https://github.com/apache/incubator-zeppelin/blob/master/README.md