Data Solution 2019(3) Run Zeppelin in Single Docker
Exception when Starting HDFS in Docker
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Solution:
Adding these to the ENV solves the problem:
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
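In this setup I export them in start.sh (shown later). An alternative sketch, assuming Hadoop is installed under /tool/hadoop as in this image, is to append them to hadoop-env.sh once so every Hadoop launcher script picks them up:
# sketch: persist the run-as-root settings in hadoop-env.sh instead of the shell ENV
for VAR in HDFS_NAMENODE_USER HDFS_DATANODE_USER HDFS_SECONDARYNAMENODE_USER \
           YARN_RESOURCEMANAGER_USER YARN_NODEMANAGER_USER; do
  echo "export ${VAR}=\"root\"" >> /tool/hadoop/etc/hadoop/hadoop-env.sh
done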
Exception when Starting HDFS in Docker
Starting namenodes on [0.0.0.0]
0.0.0.0: /tool/hadoop-3.2.0/bin/../libexec/hadoop-functions.sh: line 982: ssh: command not found
Starting datanodes
localhost: /tool/hadoop-3.2.0/bin/../libexec/hadoop-functions.sh: line 982: ssh: command not found
Starting secondary namenodes [140815a59b06]
140815a59b06: /tool/hadoop-3.2.0/bin/../libexec/hadoop-functions.sh: line 982: ssh: command not found
Solution:
https://stackoverflow.com/questions/40801417/installing-ssh-in-the-docker-containers
Install and Start SSH Server
RUN apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN ssh-keygen -q -t rsa -N '' -f /root/.ssh/id_rsa
RUN cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# start ssh service
nohup /usr/sbin/sshd -D >/dev/stdout &
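A quick way to confirm the fix (my own check, not from the Hadoop docs): start-dfs.sh drives each daemon over ssh to localhost, so once sshd is up, a passwordless login must succeed inside the container:
# inside the container: should print "ssh OK" without asking for a password
ssh -o StrictHostKeyChecking=no localhost 'echo ssh OK'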
Exception when Starting HDFS
ERROR: JAVA_HOME is not set and could not be found
Solution:
Add JAVA_HOME in hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
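If you are not sure where Java landed inside the image, a small sketch to derive the value (assumes java is on the PATH):
# resolve the real java binary, then strip the trailing /bin/java
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
echo ${JAVA_HOME}   # may point at a jre/ subdirectory, which also works for running HDFS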
It seems HDFS is running fine in Docker.
But from the UI at http://localhost:9870/dfshealth.html#tab-overview, I get an error like this:
Exception:
Permission denied: user=dr.who, access=WRITE, inode="/":root:supergroup:drwxr-xr-x
Solution:
https://stackoverflow.com/questions/11593374/permission-denied-at-hdfs
Since this is my local Docker, I will just disable permission checking in hdfs-site.xml:
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
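An alternative I did not take here: keep permissions on and only change the user the web UI acts as (dr.who is the default static web user) via hadoop.http.staticuser.user in core-site.xml. Note also that in Hadoop 3.x the current property name is dfs.permissions.enabled; the old dfs.permissions name still maps through the deprecation table. A sketch:
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
</property>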
Check Docker Stats
> docker stats
My memory limit is only 2G, which is too small; maybe the CPU is not powerful enough either.
CONTAINER ID   NAME               CPU %   MEM USAGE / LIMIT     MEM %    NET I/O         BLOCK I/O        PIDS
382b064708ec   ubuntu-spark-1.0   0.64%   1.442GiB / 1.952GiB   73.89%   216kB / 437kB   255MB / 10.1MB   256
> nproc
4
Maybe the CPU is OK.
I am using a Mac, so the way to increase the memory is to open the tool:
Docker Desktop -> Preferences -> Advanced -> CPUs: 4, Memory: 2GB, Swap: 1.0GB
https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container
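On Docker Desktop for Mac the VM-level setting above is the real ceiling; on top of that, per-container limits can be set at run time with the standard docker run flags (the values below are just examples, and the image/name match the Makefile later in this post):
# cap one container at 4g RAM / 2 CPUs
docker run -d -m 4g --cpus 2 --name ubuntu-spark-1.0 sillycat/public:ubuntu-spark-1.0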
Clean up my Docker images which I am not using anymore
> docker images | grep none | awk '{print $3;}' | xargs docker rmi
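Newer Docker versions (1.13+) ship a built-in for the same cleanup:
# remove all dangling (<none>) images without the grep/awk pipeline
docker image prune -f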
Official Website
https://hub.docker.com/r/apache/zeppelin/dockerfile
Finally I made it work.
conf/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://0.0.0.0:9000</value>
    </property>
</configuration>
conf/hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
case ${HADOOP_OS_TYPE} in
  Darwin*)
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm="
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc="
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf="
  ;;
esac
conf/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
conf/spark-env.sh
HADOOP_CONF_DIR=/tool/hadoop/etc/hadoop
We need to keep zeppelin/conf and zeppelin/notebook outside the container and map them into the Docker application, so the notebook data is saved.
This is the important Dockerfile:
# Run a Hadoop/Spark/Zeppelin server
# Prepare the OS
FROM ubuntu:16.04
MAINTAINER Carl Luo <[email protected]>
ENV DEBIAN_FRONTEND noninteractive
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
ENV LANG en_US.UTF-8
ENV LC_ALL en_US.UTF-8
RUN apt-get -qq update
RUN apt-get -qqy dist-upgrade
# Prepare the dependencies
RUN apt-get install -qy wget unzip vim
RUN apt-get install -qy iputils-ping
# Install SUN JAVA
RUN apt-get update && \
    apt-get install -y --no-install-recommends locales && \
    locale-gen en_US.UTF-8 && \
    apt-get dist-upgrade -y && \
    apt-get --purge remove openjdk* && \
    echo "oracle-java8-installer shared/accepted-oracle-license-v1-1 select true" | debconf-set-selections && \
    echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu xenial main" > /etc/apt/sources.list.d/webupd8team-java-trusty.list && \
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886 && \
    apt-get update && \
    apt-get install -y --no-install-recommends oracle-java8-installer oracle-java8-set-default && \
    apt-get clean all
# Prepare for hadoop and spark
RUN apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN ssh-keygen -q -t rsa -N '' -f /root/.ssh/id_rsa
RUN cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
RUN mkdir /tool/
WORKDIR /tool/
# add the software hadoop
ADD install/hadoop-3.2.0.tar.gz /tool/
RUN ln -s /tool/hadoop-3.2.0 /tool/hadoop
ADD conf/core-site.xml /tool/hadoop/etc/hadoop/
ADD conf/hdfs-site.xml /tool/hadoop/etc/hadoop/
ADD conf/hadoop-env.sh /tool/hadoop/etc/hadoop/
# add the software spark
ADD install/spark-2.4.0-bin-hadoop2.7.tgz /tool/
RUN ln -s /tool/spark-2.4.0-bin-hadoop2.7 /tool/spark
ADD conf/spark-env.sh /tool/spark/conf/
# add the software zeppelin
ADD install/zeppelin-0.8.1-bin-all.tgz /tool/
RUN ln -s /tool/zeppelin-0.8.1-bin-all /tool/zeppelin
# set up the app
EXPOSE 9000 9870 8080 4040
RUN mkdir -p /app/
ADD start.sh /app/
WORKDIR /app/
CMD ["./start.sh"]
This is the Makefile which will make it work:
IMAGE=sillycat/public
TAG=ubuntu-spark-1.0
NAME=ubuntu-spark-1.0

prepare:
	wget http://mirror.olnevhost.net/pub/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz -P install/
	wget http://ftp.wayne.edu/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz -P install/
	wget http://apache.claz.org/zeppelin/zeppelin-0.8.1/zeppelin-0.8.1-bin-all.tgz -P install/

docker-context:

build: docker-context
	docker build -t $(IMAGE):$(TAG) .

run:
	docker run -d -p 9870:9870 -p 9000:9000 -p 8080:8080 -p 4040:4040 -v $(shell pwd)/zeppelin/notebook:/tool/zeppelin/notebook -v $(shell pwd)/zeppelin/conf:/tool/zeppelin/conf --name $(NAME) $(IMAGE):$(TAG)

debug:
	docker run -ti -p 9870:9870 -p 9000:9000 -p 8080:8080 -p 4040:4040 -v $(shell pwd)/zeppelin/notebook:/tool/zeppelin/notebook -v $(shell pwd)/zeppelin/conf:/tool/zeppelin/conf --name $(NAME) $(IMAGE):$(TAG) /bin/bash

clean:
	docker stop ${NAME}
	docker rm ${NAME}

logs:
	docker logs ${NAME}

publish:
	docker push ${IMAGE}
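The typical flow with these targets, from the project root:
make prepare   # download the hadoop/spark/zeppelin tarballs into install/
make build     # build the image
make run       # start the container detached
make logs      # check the container output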
This is the start.sh that starts the application:
#!/bin/sh -ex
# prepare ENV
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
export SPARK_HOME="/tool/spark"
# start ssh service
nohup /usr/sbin/sshd -D >/dev/stdout &
# start the service
cd /tool/hadoop
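# NOTE: this formats a fresh namenode on every container start, so HDFS data
# does not survive a container recreation in this setup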
bin/hdfs namenode -format
sbin/start-dfs.sh
cd /tool/zeppelin
bin/zeppelin.sh
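Once the container is up, a quick smoke test from the host (using the container name from the Makefile):
# list the HDFS root through the running container
docker exec ubuntu-spark-1.0 /tool/hadoop/bin/hdfs dfs -ls /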
After that, we can visit these 3 UIs to work on our data.
### Hadoop 3.2.0 Spark 2.4.0 Zeppelin 0.8.1
### HDFS
http://localhost:9870/explorer.html#/
### Zeppelin UI
http://localhost:8080/
### After you Run the First Demo Job, the Spark Jobs UI
http://localhost:4040/stages/
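For that first demo job, any small Spark paragraph in a Zeppelin notebook will do; a sketch:
%spark
// trivial job to light up the Spark Jobs UI on :4040
val nums = sc.parallelize(1 to 1000)
println(nums.sum())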
References:
https://stackoverflow.com/questions/48129029/hdfs-namenode-user-hdfs-datanode-user-hdfs-secondarynamenode-user-not-defined
https://www.cnblogs.com/sylar5/p/9169090.html
https://www.jianshu.com/p/b49712bbe044
https://stackoverflow.com/questions/40801417/installing-ssh-in-the-docker-containers
https://stackoverflow.com/questions/27504187/ssh-key-generation-using-dockerfile
https://github.com/twang2218/docker-zeppelin