Hadoop Installation and Operations

I. Basic Concepts

NameNode: holds HDFS metadata such as the directory tree and data-block locations

DataNode: stores the actual data blocks

JournalNode: synchronizes edit-log metadata between NameNodes (used in HA setups)

DFS: distributed file system

mapred: MapReduce

ResourceManager: the overall entry point and global scheduler (allocates resources to each application)

ApplicationMaster: per-application job scheduling (frameworks other than MapReduce are also supported)

NodeManager: the per-node management daemon

Container: the execution environment (resources) allocated on a node

Job History Server (API + RPC): collects and presents job log/history information

WebAppProxy: a relay between internal and external web access (proxies ApplicationMaster web UIs)

yarn.nodemanager.health-checker.script.path: script used to monitor node health (see the sketch after this list)

Rack Awareness: improves scheduling and block-placement performance by taking the rack topology into account
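
A minimal sketch of such a health-check script, assuming a hypothetical path /usr/local/bin/nm-health-check.sh configured as yarn.nodemanager.health-checker.script.path in yarn-site.xml; the NodeManager marks the node unhealthy whenever the script prints a line starting with ERROR:

#!/usr/bin/env bash
# Hypothetical node health check: report ERROR when the data disk is nearly full.
used=$(df --output=pcent /data | tail -1 | tr -dc '0-9')
if [ -n "$used" ] && [ "$used" -ge 95 ]; then
    echo "ERROR local disk /data is ${used}% full"
else
    echo "OK"
fi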

II. Installation: configuration + start

1. Configuration:

etc/hadoop/core-site.xml:

<configuration>

    <property>

        <name>fs.defaultFS</name>

        <value>hdfs://localhost:9000</value>

    </property>

</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>

    <property>

        <name>dfs.replication</name>

        <value>1</value>

    </property>

</configuration>

etc/hadoop/mapred-site.xml:

<configuration>

    <property>

        <name>mapreduce.framework.name</name>

        <value>yarn</value>

    </property>

</configuration>

etc/hadoop/yarn-site.xml:

<configuration>

    <property>

        <name>yarn.nodemanager.aux-services</name>

        <value>mapreduce_shuffle</value>

    </property>

</configuration>
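
In addition to the XML files above, the daemons need to know where the JDK lives; a minimal sketch, assuming a hypothetical OpenJDK path (adjust to your installation):

etc/hadoop/hadoop-env.sh:

# set to the root of your Java installation
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64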

2. Make sure you can ssh to localhost without a passphrase
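
If ssh localhost prompts for a password, passphraseless login can be set up as in the single-node setup guide:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys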

3. Start:

bin/hdfs namenode -format    # format HDFS (only needed on first setup)

sbin/start-dfs.sh            # start NameNode, SecondaryNameNode and DataNode

sbin/start-yarn.sh           # start ResourceManager and NodeManager
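
As a quick sanity check (assuming the JDK's jps tool is on the PATH), the started daemons should show up in the Java process list:

jps   # expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager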

4. Web UIs

http://localhost:50070/  # HDFS (NameNode web UI)

http://localhost:8088/   # YARN (ResourceManager web UI)

$ bin/hdfs dfs -mkdir /user

$ bin/hdfs dfs -mkdir /user/root    # create the user's home directory

$ bin/hdfs dfs -put etc/hadoop input    # copy the local etc/hadoop directory into HDFS as 'input'

# run the example MapReduce job (grep) against the input directory

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

bin/hdfs dfs -get output output    # copy the result from HDFS to the local filesystem

cat output/*
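
Alternatively, the result can be inspected directly on HDFS without copying it to the local filesystem:

bin/hdfs dfs -cat output/*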

5. Stop

$ sbin/stop-yarn.sh

$ sbin/stop-dfs.sh

III. Commands

hadoop archive -archiveName zoo.har -p /foo/bar -r 3 /outputdir    # archive the contents of /foo/bar into /outputdir/zoo.har with replication 3 (see the example after this list)

hadoop classpath --glob    # print the classpath needed for the Hadoop jars, expanding wildcards

hadoop jar *.jar    # run a jar file

hadoop fs -appendToFile localfile /user/hadoop/hadoopfile    # fs (FileSystem shell) command: append a local file to a file on HDFS
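
Once created, an archive can be read through the har:// scheme; a sketch, assuming the command above produced /outputdir/zoo.har (file names inside are hypothetical):

hadoop fs -ls har:///outputdir/zoo.har              # list the archived copy of /foo/bar
hadoop fs -cat har:///outputdir/zoo.har/somefile    # read a file out of the archive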

IV. Common File System Commands

bin/hadoop fs -cat /user/root/output/*

hdfs dfsadmin -disallowSnapshot <path>    # disallow snapshots on a directory (the opposite of -allowSnapshot)

hdfs dfs -createSnapshot <path> [<snapshotName>]    # requires -allowSnapshot on <path> first; see the walk-through after this list

hdfs dfs -df /user/hadoop/dir1    # show free space ('hadoop dfs' is deprecated in favour of 'hdfs dfs')

bin/hadoop fs -ls /user/root/output/*
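
A short snapshot walk-through, assuming a hypothetical snapshottable directory /user/root/data:

hdfs dfsadmin -allowSnapshot /user/root/data       # make the directory snapshottable
hdfs dfs -createSnapshot /user/root/data s1        # create a snapshot named s1
hdfs dfs -ls /user/root/data/.snapshot/s1          # snapshot contents are readable under .snapshot
hdfs dfs -deleteSnapshot /user/root/data s1        # remove the snapshot
hdfs dfsadmin -disallowSnapshot /user/root/data    # allowed only once all snapshots are deleted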

V. Miscellaneous

1. CLI MiniCluster: launches a cluster from command-line parameters, without writing configuration files

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3-tests.jar minicluster -rmport RM_PORT -jhsport JHS_PORT

2. Rack Awareness: requires an external script (or Java plugin) that maps each host to a rack path such as /myrack/myhost; the script is configured via net.topology.script.file.name in core-site.xml
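
A minimal sketch of such a topology script, assuming a hypothetical path /usr/local/bin/rack-topology.sh set as net.topology.script.file.name; Hadoop passes one or more hostnames/IPs as arguments and expects one rack path per argument on stdout:

#!/usr/bin/env bash
# Hypothetical topology script: map each host/IP argument to a rack path.
for host in "$@"; do
    case "$host" in
        10.1.1.*) echo "/dc1/rack1" ;;
        10.1.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default-rack" ;;   # unknown hosts fall back to the default rack
    esac
done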
