Hadoop Installation and Operations
I. Basic Concepts
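NameNode: holds the HDFS metadata (directory tree, data block mapping, etc.)
DataNode: stores the actual data blocks
JournalNode: synchronizes metadata (edit logs) between NameNodes
dfs: distributed file system
mapred: MapReduce
ResourceManager: cluster-wide entry point and global scheduler
ApplicationMaster: schedules the work of a single application (also supports non-MapReduce jobs)
NodeManager: the management daemon of one node
container: the execution environment (resources) inside a node
Job History Server (API + RPC): collects and displays log information
WebAppProxy: a relay between internal and external access
yarn.nodemanager.health-checker.script.path: path of the script that monitors node health
Rack Awareness: lets the scheduler take rack placement into account, which improves scheduling performance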
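A minimal sketch of a script that yarn.nodemanager.health-checker.script.path could point to. The NodeManager treats the node as unhealthy when the script prints a line starting with ERROR. The check itself (root disk usage) and the 90% threshold are assumptions made for illustration, not part of these notes.

```shell
#!/usr/bin/env bash
# Hypothetical health check: flag the node when disk usage exceeds a threshold.
# check_usage and the 90% limit are illustrative names/values, not Hadoop defaults.
check_usage() {
  local used_percent="$1" threshold="$2"
  if [ "$used_percent" -gt "$threshold" ]; then
    # A stdout line beginning with ERROR marks the node as unhealthy.
    echo "ERROR disk usage ${used_percent}% exceeds ${threshold}%"
  else
    echo "OK disk usage ${used_percent}%"
  fi
}

# Read the usage percentage of / from POSIX-format df output (5th column, second row).
used=$(df -P / | awk 'NR==2 {gsub("%","",$5); print $5}')
check_usage "$used" 90
```

The script would be saved somewhere readable by the NodeManager user, made executable, and referenced from yarn-site.xml through the property above.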
II. Installation: configure + start
1. Configuration:
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
2. Make sure ssh localhost works without a passphrase
3. Start:
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
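4. URLs and a sample job:
http://localhost:50070/ # HDFS NameNode web UI (default port in Hadoop 2.x)
http://localhost:8088/ # YARN ResourceManager web UI
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/root # create the HDFS home directory for user root
$ bin/hdfs dfs -put etc/hadoop input
# run the example jar (grep job)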
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
bin/hdfs dfs -get output output
cat output/*
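To sanity-check the job output, the same counting can be approximated locally with standard tools. This is only a sketch of what the grep example computes (occurrences of each regex match, sorted by count in descending order). The function name count_matches is made up for illustration, and ordering among equal counts may differ from the job's output.

```shell
#!/usr/bin/env bash
# Local approximation of the MapReduce grep example (illustrative only).
# $1: extended regular expression; remaining arguments: files to scan.
count_matches() {
  local pattern="$1"
  shift
  # -o prints each match on its own line, -h suppresses file names.
  grep -ohE "$pattern" "$@" | sort | uniq -c | sort -rn | awk '{print $1 "\t" $2}'
}

# Example usage against the same files that were uploaded as "input":
# count_matches 'dfs[a-z.]+' etc/hadoop/*.xml
```

Each output line is a count, a tab, and the matched string, which should line up with what cat output/* shows for the same input files.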
5. Stop:
$ sbin/stop-yarn.sh
$ sbin/stop-dfs.sh
III. Commands
hadoop archive -archiveName zoo.har -p /foo/bar -r 3 /outputdir
hadoop classpath --glob
hadoop jar <jar> [mainClass] args... # run a jar
hadoop fs -appendToFile localfile /user/hadoop/hadoopfile # fs command
IV. Common File System Commands
bin/hadoop fs -cat /user/root/output/*
hdfs dfsadmin -disallowSnapshot <path>
hdfs dfs -createSnapshot <path> [<snapshotName>]
hadoop fs -df /user/hadoop/dir1
bin/hadoop fs -ls /user/root/output/*
V. Other
1. CLI MiniCluster: starts a cluster from command-line parameters, so no configuration files are needed
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3-tests.jar minicluster -rmport RM_PORT -jhsport JHS_PORT
2. Rack Awareness: requires a custom topology script that outputs paths in the form /myrack/myhost
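A minimal sketch of such a topology script, assuming each /24 subnet is one rack (the first three octets become the rack, the last octet the host). The function name topology_path and the subnet-to-rack rule are assumptions for illustration. A real cluster would use its own mapping, and the exact output expected by a given Hadoop version should be checked against its Rack Awareness documentation. The script is registered with the net.topology.script.file.name property in core-site.xml.

```shell
#!/usr/bin/env bash
# Hypothetical topology script: one rack per /24 subnet.
# Turns 192.168.100.5 into /192.168.100/5 (the /myrack/myhost shape).
topology_path() {
  local ip="$1"
  local rack="${ip%.*}"    # everything before the last dot
  local host="${ip##*.}"   # everything after the last dot
  printf '/%s/%s\n' "$rack" "$host"
}

# Hadoop passes one or more addresses as arguments; print one line per argument.
for addr in "$@"; do
  topology_path "$addr"
done
```

Running the script by hand with a couple of addresses is a quick way to confirm it prints exactly one line per argument before pointing Hadoop at it.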