Running MapReduce in Pseudo-Distributed Mode (Cluster Configuration, Logs, and NameNode Formatting)

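Contents

Starting and configuring the cluster

Logs, and why the NameNode cannot be reformatted repeatedly

Operating the cluster (upload, download, run MapReduce, query)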

Starting and configuring the cluster

#1, Go to the /opt/module/hadoop-2.7.2/etc/hadoop directory and configure hadoop-env.sh

[isea@hadoop104 hadoop]$ vim hadoop-env.sh

*

*

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are

# optional. When running a distributed configuration it is best to

# set JAVA_HOME in this file, so that it is correctly defined on

# remote nodes.

# The java implementation to use.

export JAVA_HOME=/opt/module/jdk1.8.0_144

*

*

#2, Configure core-site.xml

[isea@hadoop104 hadoop]$ vim core-site.xml

<!-- Put site-specific property overrides in this file. -->

<configuration>

<!-- Address of the NameNode in HDFS -->

<property>

<name>fs.defaultFS</name>

<value>hdfs://hadoop104:9000</value>

</property>

<!-- Directory where Hadoop stores the files it generates at runtime -->

<property>

<name>hadoop.tmp.dir</name>

<value>/opt/module/hadoop-2.7.2/data/tmp</value>

</property>

</configuration>

#3, Configure hdfs-site.xml

[isea@hadoop104 hadoop]$ vim hdfs-site.xml

<!-- Put site-specific property overrides in this file. -->

<configuration>

<!-- Number of HDFS replicas -->

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>
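Before formatting, it can help to read a property back out of the site files to catch typos. The sketch below is only an illustration: `get_prop` is a hypothetical helper, not part of Hadoop, and it assumes each `<name>` and `<value>` element sits on a single line, as in the listings above.

```shell
# get_prop FILE NAME: print the <value> that follows <name>NAME</name> in a *-site.xml file.
# Hypothetical helper; a line-based scan, not a real XML parser.
get_prop() {
  awk -v key="$2" '
    /<name>/ {
      n = $0
      gsub(/.*<name>|<\/name>.*/, "", n)   # keep only the text between the tags
      hit = (n == key)
    }
    /<value>/ && hit {
      v = $0
      gsub(/.*<value>|<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}

# Example call (run from etc/hadoop):
#   get_prop core-site.xml fs.defaultFS
```

On a machine where Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolved, which is the more authoritative check.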

#4, Format the NameNode (only once, before the first start; do not format again afterwards)

[isea@hadoop104 hadoop]$ hdfs namenode -format

18/11/14 20:07:27 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG: host = hadoop104/192.168.1.104

STARTUP_MSG: args = [-format]

STARTUP_MSG: version = 2.7.2

*

*

18/11/14 20:07:28 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at hadoop104/192.168.1.104

************************************************************/

#5, Start the NameNode and the DataNode separately, then check that both started

[isea@hadoop104 hadoop]$ hadoop-daemon.sh start namenode

starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-isea-namenode-hadoop104.out

[isea@hadoop104 hadoop]$ hadoop-daemon.sh start datanode

starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-isea-datanode-hadoop104.out

[isea@hadoop104 hadoop]$ jps

3427 NameNode

3517 DataNode

3598 Jps
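Reading the `jps` listing by eye works for two daemons, but the same check can be scripted. This is a minimal sketch: `has_daemons` is a hypothetical helper name, and it assumes `jps` prints one `<pid> <main class>` pair per line, as in the output above.

```shell
# has_daemons NAME...: read `jps` output on stdin and succeed only if
# every named daemon appears as the second field of some line.
has_daemons() {
  jps_listing=$(cat)   # capture stdin once so it can be scanned per daemon
  for daemon in "$@"; do
    printf '%s\n' "$jps_listing" |
      awk -v want="$daemon" '$2 == want { found = 1 } END { exit !found }' || return 1
  done
}

# Example call:
#   jps | has_daemons NameNode DataNode && echo "HDFS daemons are up"
```

Matching the whole second field, rather than grepping for a substring, keeps a process such as SecondaryNameNode from being mistaken for NameNode.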

This completes the cluster configuration and startup.

Next, open this URL:

http://hadoop104:50070/dfshealth.html#tab-overview

The NameNode overview page should load.

Logs, and why the NameNode cannot be reformatted repeatedly

#1, Log files:

[isea@hadoop104 logs]$ pwd

/opt/module/hadoop-2.7.2/logs

[isea@hadoop104 logs]$ ll

total 60

-rw-rw-r--. 1 isea isea 23848 Nov 14 20:10 hadoop-isea-datanode-hadoop104.log

-rw-rw-r--. 1 isea isea 715 Nov 14 20:10 hadoop-isea-datanode-hadoop104.out

-rw-rw-r--. 1 isea isea 27519 Nov 14 20:10 hadoop-isea-namenode-hadoop104.log

-rw-rw-r--. 1 isea isea 715 Nov 14 20:10 hadoop-isea-namenode-hadoop104.out

-rw-rw-r--. 1 isea isea 0 Nov 14 20:10 SecurityAuth-isea.audit

Starting the NameNode and DataNode creates a logs directory under the Hadoop installation directory. It contains a .log file and a .out file for each daemon, plus a security audit file (SecurityAuth-isea.audit).

#2, Why can't the NameNode be reformatted repeatedly?

[isea@hadoop104 current]$ pwd

/opt/module/hadoop-2.7.2/data/tmp/dfs/data/current

[isea@hadoop104 current]$ ll

total 8

drwx------. 4 isea isea 4096 Nov 14 20:10 BP-847571129-192.168.1.104-1542197248436

-rw-rw-r--. 1 isea isea 229 Nov 14 20:10 VERSION

[isea@hadoop104 current]$ cat VERSION

#Wed Nov 14 20:10:52 CST 2018

storageID=DS-305b15b0-96c1-407c-b58e-1beb65922151

clusterID=CID-8eeb5d53-e49f-4de6-9e05-387a7eb1472f

cTime=0

datanodeUuid=ea5794eb-6929-40b7-b8c3-aad970d72c29

storageType=DATA_NODE

layoutVersion=-56

[isea@hadoop104 current]$

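Formatting the NameNode generates a new cluster ID. The DataNode still holds the old clusterID in its VERSION file (shown above), so the two IDs no longer match and the cluster cannot find its existing data.

So before reformatting the NameNode, stop the NameNode and DataNode processes, delete the data directory and the logs, and only then run the format.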
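The mismatch can be checked directly by comparing the clusterID lines of the two VERSION files. The sketch below uses hypothetical helper names (`cluster_id`, `same_cluster`). The NameNode path in the example call is an assumption: with `hadoop.tmp.dir` set as above and the default `dfs.namenode.name.dir`, the NameNode metadata should sit under `dfs/name/current`.

```shell
# cluster_id FILE: print the clusterID recorded in a VERSION file.
# VERSION is a plain key=value properties file.
cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# same_cluster NN_VERSION DN_VERSION: succeed only if both files
# carry the same, non-empty clusterID.
same_cluster() {
  nn_id=$(cluster_id "$1")
  dn_id=$(cluster_id "$2")
  [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]
}

# Example call (NameNode path assumed from the default directory layout):
#   base=/opt/module/hadoop-2.7.2/data/tmp/dfs
#   same_cluster "$base/name/current/VERSION" "$base/data/current/VERSION" \
#     || echo "clusterID mismatch: the DataNode will not join this NameNode"
```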

Operating the cluster (upload, download, run MapReduce, query)

#1, Create an input directory on HDFS and prepare the data to upload

[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -mkdir -p /user/isea/input

[isea@hadoop104 hadoop-2.7.2]$ vim wcinput/wc.input

you know that i sea you

sea you

isea you

isea

i sea you

#2, Upload the test data to HDFS and check that the upload succeeded

[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -put wcinput/wc.input /user/isea/input/

[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -ls /user/isea/input/

Found 1 items

-rw-r--r-- 1 isea supergroup 57 2018-11-14 20:45 /user/isea/input/wc.input

#3, Run the MapReduce program and check the result

[isea@hadoop104 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/isea/input/ /user/isea/output

[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -cat /user/isea/output/*

i	2

isea	2

know	1

sea	3

that	1

you	5
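For a file this small, the job's output can be cross-checked locally with standard text tools. The sketch below is not how Hadoop computes the result. It is an independent recount, assuming words are separated by whitespace and that the output is sorted byte-wise with a tab between word and count. `local_wordcount` is a hypothetical helper name.

```shell
# local_wordcount FILE: print "word<TAB>count" for every distinct word in FILE.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" |   # one word per line
    grep -v '^$' |                  # drop an empty line left by leading whitespace
    LC_ALL=C sort |                 # byte-wise order, independent of the locale
    uniq -c |                       # prefix each distinct word with its count
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Example call (run from /opt/module/hadoop-2.7.2):
#   local_wordcount wcinput/wc.input
```

If the local recount and `hdfs dfs -cat /user/isea/output/*` agree, the upload and the job both handled the data as expected.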

Continue verifying cluster operations: download the result file from the cluster, then delete the output directory on HDFS

[isea@hadoop104 hadoop-2.7.2]$ mkdir wcoutput

[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -get /user/isea/output/part-r-00000 ./wcoutput/

[isea@hadoop104 hadoop-2.7.2]$ cd wcoutput/

[isea@hadoop104 wcoutput]$ ll

total 4

-rw-r--r--. 1 isea isea 37 Nov 14 21:21 part-r-00000

[isea@hadoop104 wcoutput]$ cat part-r-00000

i	2

isea	2

know	1

sea	3

that	1

you	5

[isea@hadoop104 wcoutput]$ hdfs dfs -rm -r /user/isea/output

18/11/14 21:26:27 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted /user/isea/output
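Deleting the output directory matters because the wordcount job refuses to start if its output directory already exists. The cleanup can be wrapped in a small guard that only removes the directory when it is there. This is a sketch: `remove_output_if_present` and the `HDFS_CMD` variable are hypothetical names introduced here so the function can be dry-run without a cluster. By default it calls the real `hdfs` command.

```shell
# remove_output_if_present DIR: delete DIR on HDFS only if it exists.
# HDFS_CMD is a hypothetical override hook; it defaults to the real `hdfs` CLI.
remove_output_if_present() {
  if ${HDFS_CMD:-hdfs} dfs -test -d "$1"; then   # -test -d: true if DIR is a directory
    ${HDFS_CMD:-hdfs} dfs -rm -r "$1"
  fi
}

# Example call before re-running the job:
#   remove_output_if_present /user/isea/output
```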

The results can also be checked in the browser:

http://hadoop104:50070/explorer.html#/
