Running MapReduce in Pseudo-Distributed Mode (Cluster Configuration, Logs, and NameNode Formatting)
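Contents
Starting and configuring the cluster
Logs, and why the NameNode must not be formatted repeatedly
Operating the cluster (upload, download, run MapReduce, query)
Starting and configuring the cluster
#1, Go to the /opt/module/hadoop-2.7.2/etc/hadoop directory and configure hadoop-env.sh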
[isea@hadoop104 hadoop]$ vim hadoop-env.sh
*
*
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use.
export JAVA_HOME=/opt/module/jdk1.8.0_144
*
*
#2, Configure core-site.xml
[isea@hadoop104 hadoop]$ vim core-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Address of the NameNode in HDFS -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop104:9000</value>
</property>
<!-- Directory where Hadoop stores the files it generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
</configuration>
#3, Configure hdfs-site.xml
[isea@hadoop104 hadoop]$ vim hdfs-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
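After editing both files, it can help to read the values back before starting anything. The following is a minimal sketch, not part of the original walkthrough: `get_hadoop_property` is a hypothetical helper that assumes each `<name>` and `<value>` sits on its own line, as in the files above.

```shell
# Print the <value> of a named property from a Hadoop *-site.xml file.
# Hypothetical helper; assumes <name> and <value> are on separate lines.
get_hadoop_property() {
    local file="$1" key="$2"
    awk -v key="$key" '
        # Remember when the <name> line matches the requested key exactly.
        index($0, "<name>" key "</name>") { found = 1; next }
        # The first <value> line after a match holds the setting.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$file"
}
```

For example, `get_hadoop_property core-site.xml fs.defaultFS` should print the NameNode address set in step #2, and `get_hadoop_property hdfs-site.xml dfs.replication` the replica count from step #3. On a working installation, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolves.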
#4, Format the NameNode (only before the very first start; do not format it again afterwards)
[isea@hadoop104 hadoop]$ hdfs namenode -format
18/11/14 20:07:27 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop104/192.168.1.104
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.2
*
*
18/11/14 20:07:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop104/192.168.1.104
************************************************************/
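Because formatting should happen only once, a small guard can prevent an accidental second format. This is a sketch under one assumption: the NameNode metadata lives in `dfs/name` under `hadoop.tmp.dir`, which is the Hadoop 2.x default when `dfs.namenode.name.dir` is not set. The function names are hypothetical.

```shell
# Return 0 only if no NameNode metadata exists yet under the given hadoop.tmp.dir.
# Assumes the default layout <hadoop.tmp.dir>/dfs/name (dfs.namenode.name.dir unset).
namenode_is_unformatted() {
    local tmp_dir="$1"
    [ ! -e "$tmp_dir/dfs/name/current/VERSION" ]
}

# Format the NameNode only when it has never been formatted before.
format_once() {
    local tmp_dir="$1"
    if namenode_is_unformatted "$tmp_dir"; then
        hdfs namenode -format
    else
        echo "NameNode already formatted under $tmp_dir; skipping." >&2
        return 1
    fi
}
```

With the configuration above, the call would be `format_once /opt/module/hadoop-2.7.2/data/tmp`.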
#5, Start the NameNode and the DataNode separately, then check that both started
[isea@hadoop104 hadoop]$ hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-isea-namenode-hadoop104.out
[isea@hadoop104 hadoop]$ hadoop-daemon.sh start datanode
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-isea-datanode-hadoop104.out
[isea@hadoop104 hadoop]$ jps
3427 NameNode
3517 DataNode
3598 Jps
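Reading the jps listing by eye works for two daemons; the check can also be scripted. The sketch below uses a hypothetical `check_daemons` helper that reads jps output on standard input and compares the class-name column against the names passed as arguments.

```shell
# Read jps output on stdin and report any expected daemon that is missing.
# Returns 0 when every named daemon appears, 1 otherwise.
check_daemons() {
    local jps_output missing=0 daemon
    jps_output="$(cat)"
    for daemon in "$@"; do
        # jps prints "<pid> <main class>", so compare the whole second field;
        # this keeps "NameNode" from matching "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "not running: $daemon" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `jps | check_daemons NameNode DataNode`.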
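This completes the cluster configuration and startup.
Next, open this URL in a browser:
http://hadoop104:50070/dfshealth.html#tab-overview
The NameNode web UI overview page should load.
Logs, and why the NameNode must not be formatted repeatedly
#1, Logs: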
[isea@hadoop104 logs]$ pwd
/opt/module/hadoop-2.7.2/logs
[isea@hadoop104 logs]$ ll
total 60
-rw-rw-r--. 1 isea isea 23848 Nov 14 20:10 hadoop-isea-datanode-hadoop104.log
-rw-rw-r--. 1 isea isea 715 Nov 14 20:10 hadoop-isea-datanode-hadoop104.out
-rw-rw-r--. 1 isea isea 27519 Nov 14 20:10 hadoop-isea-namenode-hadoop104.log
-rw-rw-r--. 1 isea isea 715 Nov 14 20:10 hadoop-isea-namenode-hadoop104.out
-rw-rw-r--. 1 isea isea 0 Nov 14 20:10 SecurityAuth-isea.audit
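Starting the NameNode and DataNode creates a logs directory under the Hadoop installation directory. For each daemon it holds a .log file and a .out file,
and it also contains one security audit file.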
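When a daemon fails to start, the .log file is usually the first place to look. The sketch below counts ERROR and FATAL lines in one log file. It assumes the log level is the third whitespace-separated field (date, time, level), which matches Hadoop's default log layout; `count_log_errors` is a hypothetical name.

```shell
# Count lines at ERROR or FATAL level in a Hadoop daemon .log file.
# Assumes each entry starts with "<date> <time> <LEVEL> ...".
count_log_errors() {
    awk '$3 == "ERROR" || $3 == "FATAL" { n++ } END { print n + 0 }' "$1"
}
```

For example: `count_log_errors /opt/module/hadoop-2.7.2/logs/hadoop-isea-datanode-hadoop104.log`.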
#2, Why can't the NameNode be formatted repeatedly?
[isea@hadoop104 current]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs/data/current
[isea@hadoop104 current]$ ll
total 8
drwx------. 4 isea isea 4096 Nov 14 20:10 BP-847571129-192.168.1.104-1542197248436
-rw-rw-r--. 1 isea isea 229 Nov 14 20:10 VERSION
[isea@hadoop104 current]$ cat VERSION
#Wed Nov 14 20:10:52 CST 2018
storageID=DS-305b15b0-96c1-407c-b58e-1beb65922151
clusterID=CID-8eeb5d53-e49f-4de6-9e05-387a7eb1472f
cTime=0
datanodeUuid=ea5794eb-6929-40b7-b8c3-aad970d72c29
storageType=DATA_NODE
layoutVersion=-56
[isea@hadoop104 current]$
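Formatting the NameNode generates a new cluster ID (the clusterID field in the VERSION file above). The DataNode still holds the old one, so the two IDs no longer match and the cluster can no longer find its existing data.
So if you do need to reformat the NameNode, first delete the data directory and the logs, and only then format it.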
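The mismatch can be checked directly by comparing the clusterID lines of the two VERSION files. The sketch below assumes the Hadoop 2.x default locations `dfs/name` and `dfs/data` under `hadoop.tmp.dir`. All function names are hypothetical, and stopping the daemons before deleting anything is an added precaution that the steps above do not mention.

```shell
# Print the clusterID recorded in a NameNode or DataNode VERSION file.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Compare the NameNode and DataNode cluster IDs under one hadoop.tmp.dir.
# Assumes the default layout: dfs/name and dfs/data.
cluster_ids_match() {
    local tmp_dir="$1" nn dn
    nn="$(cluster_id "$tmp_dir/dfs/name/current/VERSION")"
    dn="$(cluster_id "$tmp_dir/dfs/data/current/VERSION")"
    echo "NameNode: $nn"
    echo "DataNode: $dn"
    [ -n "$nn" ] && [ "$nn" = "$dn" ]
}

# Full reset: stop the daemons, delete data and logs, then reformat.
# DESTRUCTIVE: this erases everything stored in HDFS.
reset_hdfs() {
    local hadoop_home="$1"
    # Refuse to run without an explicit install path, so rm never sees "/data".
    [ -n "$hadoop_home" ] || { echo "usage: reset_hdfs HADOOP_HOME" >&2; return 1; }
    hadoop-daemon.sh stop datanode
    hadoop-daemon.sh stop namenode
    rm -rf "$hadoop_home/data" "$hadoop_home/logs"
    hdfs namenode -format
}
```

Usage: `cluster_ids_match /opt/module/hadoop-2.7.2/data/tmp` prints both IDs and returns non-zero when they differ.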
Operating the cluster (upload, download, run MapReduce, query)
#1, Create an input directory on HDFS and prepare the data to upload
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -mkdir -p /user/isea/input
[isea@hadoop104 hadoop-2.7.2]$ vim wcinput/wc.input
you know that i sea you
sea you
isea you
isea
i sea you
#2, Upload the test data to HDFS and check that the upload succeeded
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -put wcinput/wc.input /user/isea/input/
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -ls /user/isea/input/
Found 1 items
-rw-r--r-- 1 isea supergroup 57 2018-11-14 20:45 /user/isea/input/wc.input
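Beyond checking that the file is listed, the size column of the listing (57 bytes here) can be compared with the local file. This is a small sketch with hypothetical helper names; it relies on the size being the fifth column of `hdfs dfs -ls` output, as in the listing above.

```shell
# Extract the size column (5th field) from `hdfs dfs -ls` output on stdin.
# The "Found N items" header is skipped because it does not start with - or d.
hdfs_ls_size() {
    awk '$1 ~ /^[-d]/ { print $5 }'
}

# Compare a local file's byte count with a size reported by HDFS.
sizes_match() {
    local local_file="$1" hdfs_size="$2" local_size
    local_size="$(wc -c < "$local_file" | tr -d ' ')"
    [ "$local_size" = "$hdfs_size" ]
}
```

Usage: `sizes_match wcinput/wc.input "$(hdfs dfs -ls /user/isea/input/wc.input | hdfs_ls_size)"`.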
#3, Run the MapReduce program and check the result
[isea@hadoop104 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/isea/input/ /user/isea/output
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -cat /user/isea/output/*
i	2
isea	2
know	1
sea	3
that	1
you	5
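For a file this small, the job's result can be cross-checked locally. The sketch below is a plain awk word count, not the MapReduce implementation. It splits on whitespace and prints each word, a tab, and its count, sorted in byte order; `local_wordcount` is a hypothetical name.

```shell
# Count whitespace-separated words and print "word<TAB>count" in byte order.
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) printf "%s\t%d\n", w, count[w] }' "$@" |
        LC_ALL=C sort
}
```

Running `local_wordcount wcinput/wc.input` on the input from step #1 should give the same six words and counts as the job output.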
Continue verifying cluster operations: download the output file from the cluster, then delete the output directory on HDFS
[isea@hadoop104 hadoop-2.7.2]$ mkdir wcoutput
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -get /user/isea/output/part-r-00000 ./wcoutput/
[isea@hadoop104 hadoop-2.7.2]$ cd wcoutput/
[isea@hadoop104 wcoutput]$ ll
total 4
-rw-r--r--. 1 isea isea 37 Nov 14 21:21 part-r-00000
[isea@hadoop104 wcoutput]$ cat part-r-00000
i	2
isea	2
know	1
sea	3
that	1
you	5
[isea@hadoop104 wcoutput]$ hdfs dfs -rm -r /user/isea/output
18/11/14 21:26:27 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/isea/output
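Deleting the output directory matters for reruns: a MapReduce job refuses to start when its output path already exists. The sketch below wraps the job from step #3 so that a stale output directory is removed first. `run_wordcount` is a hypothetical wrapper, and it assumes the current directory is the Hadoop installation directory, as in the session above.

```shell
# Run the example wordcount job, removing a stale output directory first,
# because the job fails if the output path already exists.
run_wordcount() {
    local input="$1" output="$2"
    # `hdfs dfs -test -d` exits 0 when the path is an existing directory.
    if hdfs dfs -test -d "$output"; then
        hdfs dfs -rm -r "$output"
    fi
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
        wordcount "$input" "$output"
}
```

Usage: `run_wordcount /user/isea/input/ /user/isea/output`. Note that this silently discards any earlier results in the output directory.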
In addition, the result can be checked in the browser:
http://hadoop104:50070/explorer.html#/