[Repost] Hadoop default parameters

Reposted from: http://myext.cn/other/56013.html

1 Getting the default configuration

Configuring Hadoop mainly means editing three files: core-site.xml, hdfs-site.xml, and mapred-site.xml. Out of the box these files are empty, so it is hard to know which settings actually take effect, and configurations found online may not work because Hadoop versions differ. There are two ways to browse the full set of options:

1. Download and unpack the Hadoop release matching your version, then search for *.xml to find core-default.xml, hdfs-default.xml, and mapred-default.xml. These hold the defaults; their keys and descriptions can be used as a reference when configuring a cluster.

2. Browse the Apache site; the three files are at:

http://hadoop.apache.org/common/docs/current/core-default.html

http://hadoop.apache.org/common/docs/current/hdfs-default.html

http://hadoop.apache.org/common/docs/current/mapred-default.html

These pages document the defaults for the current Hadoop release only; for other versions you have to look them up separately on the site. The first method is the better one, because every property comes with its description and can be used directly. Also note that core-site.xml holds the global settings, while hdfs-site.xml and mapred-site.xml hold the HDFS-specific and MapReduce-specific settings respectively.
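Once the *-default.xml files from method 1 are on disk, their property names, values, and descriptions can be pulled out programmatically. A minimal sketch (the inline sample XML below is illustrative, not the real file contents):

```python
# Sketch: extract property names, values, and descriptions from a
# Hadoop *-default.xml file (core-default.xml, hdfs-default.xml,
# mapred-default.xml). The SAMPLE string stands in for the real file.
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
"""

def parse_defaults(xml_text):
    """Return {name: (value, description)} for each <property> element."""
    root = ET.fromstring(xml_text)
    defaults = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        desc = prop.findtext("description", default="")
        defaults[name] = (value, desc)
    return defaults

if __name__ == "__main__":
    for name, (value, desc) in parse_defaults(SAMPLE).items():
        print(f"{name} = {value}")
```

The same function works on the real default files, e.g. `parse_defaults(open("core-default.xml").read())`.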

2 Commonly used port settings

2.1 HDFS ports

Parameter | Description | Default | Config file | Example value
--- | --- | --- | --- | ---
fs.default.name | NameNode RPC port | 8020 | core-site.xml | hdfs://master:8020/
dfs.http.address | NameNode web UI port | 50070 | hdfs-site.xml | 0.0.0.0:50070
dfs.datanode.address | DataNode control (data transfer) port | 50010 | hdfs-site.xml | 0.0.0.0:50010
dfs.datanode.ipc.address | DataNode RPC server address and port | 50020 | hdfs-site.xml | 0.0.0.0:50020
dfs.datanode.http.address | DataNode HTTP server address and port | 50075 | hdfs-site.xml | 0.0.0.0:50075
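For example, the NameNode RPC address and web UI port above are set like any other property. A minimal sketch, where the hostname `master` is a placeholder for your NameNode host:

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:8020/</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
  </property>
</configuration>
```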

2.2 MapReduce ports

Parameter | Description | Default | Config file | Example value
--- | --- | --- | --- | ---
mapred.job.tracker | JobTracker RPC port | 8021 | mapred-site.xml | master:8021
mapred.job.tracker.http.address | JobTracker web UI port | 50030 | mapred-site.xml | 0.0.0.0:50030
mapred.task.tracker.http.address | TaskTracker HTTP port | 50060 | mapred-site.xml | 0.0.0.0:50060
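The JobTracker address is configured the same way; note that mapred.job.tracker takes a plain host:port pair, not an HDFS URI. A sketch with `master` again as a placeholder hostname:

```xml
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:8021</value>
  </property>
</configuration>
```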

2.3 Other ports

Parameter | Description | Default | Config file | Example value
--- | --- | --- | --- | ---
dfs.secondary.http.address | SecondaryNameNode web UI port | 50090 | hdfs-site.xml | 0.0.0.0:50090

3 The three default configuration reference files

3.1 core-default.html

No. | Parameter | Default value | Description
--- | --- | --- | ---
1 | hadoop.tmp.dir | /tmp/hadoop-${user.name} | Base temporary directory.
2 | hadoop.native.lib | true | Whether to use the native Hadoop libraries.
3 | hadoop.http.filter.initializers | | Filter chain for the HTTP server.
4 | hadoop.security.group.mapping | org.apache.hadoop.security.ShellBasedUnixGroupsMapping | Class used to list the groups a user belongs to.
5 | hadoop.security.authorization | false | Enables service-level authorization.
6 | hadoop.security.authentication | simple | Authentication type; "simple" means no authentication.
7 | hadoop.security.token.service.use_ip | true | Whether to use IP addresses for connections.
8 | hadoop.logfile.size | 10000000 | Maximum log file size (10 MB).
9 | hadoop.logfile.count | 10 | Number of log files kept (10).
10 | io.file.buffer.size | 4096 | Buffer size for stream files (4 KB).
11 | io.bytes.per.checksum | 512 | Bytes covered by each checksum (512).
12 | io.skip.checksum.errors | false | Whether checksum errors are skipped rather than raised; true means skip.
13 | io.compression.codecs | org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.GzipCodec, org.apache.hadoop.io.compress.BZip2Codec, org.apache.hadoop.io.compress.SnappyCodec | Available compression/decompression codecs.
14 | io.serializations | org.apache.hadoop.io.serializer.WritableSerialization | Classes used for serialization and deserialization.
15 | fs.default.name | file:/// | Default file system URI.
16 | fs.trash.interval | | Trash interval; 0 disables the trash feature.
17 | fs.file.impl | org.apache.hadoop.fs.LocalFileSystem | Local file system class.
18 | fs.hdfs.impl | org.apache.hadoop.hdfs.DistributedFileSystem | HDFS file system class.
19 | fs.s3.impl | org.apache.hadoop.fs.s3.S3FileSystem | S3 file system class.
20 | fs.s3n.impl | org.apache.hadoop.fs.s3native.NativeS3FileSystem | S3 native file system class.
21 | fs.kfs.impl | org.apache.hadoop.fs.kfs.KosmosFileSystem | KFS file system class.
22 | fs.hftp.impl | org.apache.hadoop.hdfs.HftpFileSystem | File access over HTTP.
23 | fs.hsftp.impl | org.apache.hadoop.hdfs.HsftpFileSystem | File access over HTTPS.
24 | fs.webhdfs.impl | org.apache.hadoop.hdfs.web.WebHdfsFileSystem | File access over WebHDFS.
25 | fs.ftp.impl | org.apache.hadoop.fs.ftp.FTPFileSystem | FTP file system class.
26 | fs.ramfs.impl | org.apache.hadoop.fs.InMemoryFileSystem | In-memory file system class.
27 | fs.har.impl | org.apache.hadoop.fs.HarFileSystem | Hadoop archive (HAR) file system class.
28 | fs.har.impl.disable.cache | true | Whether caching of HAR file system instances is disabled.
29 | fs.checkpoint.dir | ${hadoop.tmp.dir}/dfs/namesecondary | Where the secondary NameNode stores its checkpoints.
30 | fs.checkpoint.edits.dir | ${fs.checkpoint.dir} | Where the secondary NameNode stores its checkpoint edit logs.
31 | fs.checkpoint.period | 3600 | Checkpoint interval, in seconds.
32 | fs.checkpoint.size | 67108864 | Edit log size (64 MB) that triggers a checkpoint.
33 | fs.s3.block.size | 67108864 | Block size (64 MB) when writing files to S3.
34 | fs.s3.buffer.dir | ${hadoop.tmp.dir}/s3 | Local directory for buffering S3 data.
35 | fs.s3.maxRetries | 4 | Number of retries for S3 reads and writes.
36 | fs.s3.sleepTimeSeconds | 10 | Interval between S3 retries, in seconds.
37 | local.cache.size | 10737418240 | Local cache size (10 GB).
38 | io.seqfile.compress.blocksize | 1000000 | Minimum block size (1,000,000 bytes) for block-compressed SequenceFiles.
39 | io.seqfile.lazydecompress | true | Whether SequenceFile blocks are decompressed lazily.
40 | io.seqfile.sorter.recordlimit | 1000000 | Record limit (1,000,000) for the in-memory SequenceFile sorter.
41 | io.mapfile.bloom.size | 1048576 | Number of keys (1M) per Bloom filter in a BloomMapFile.
42 | io.mapfile.bloom.error.rate | 0.005 | 
43 | hadoop.util.hash.type | murmur | Default hash implementation (murmur).
44 | ipc.client.idlethreshold | 4000 | Connection count threshold (4000) above which idle connections are scanned.
45 | ipc.client.kill.max | 10 | Maximum number of client connections to disconnect in one sweep (10).
46 | ipc.client.connection.maxidletime | 10000 | Maximum idle time (10 s) before the client drops its server connection.
47 | ipc.client.connect.max.retries | 10 | Number of retries (10) when establishing a server connection.
48 | ipc.server.listen.queue.size | 128 | Length (128) of the listen queue for incoming client connections.
49 | ipc.server.tcpnodelay | false | Enables or disables TCP_NODELAY on server connections.
50 | ipc.client.tcpnodelay | false | Enables or disables TCP_NODELAY on client connections.
51 | webinterface.private.actions | false | Whether the web UI exposes private actions.
52 | hadoop.rpc.socket.factory.class.default | org.apache.hadoop.net.StandardSocketFactory | Default socket factory class.
53 | hadoop.rpc.socket.factory.class.ClientProtocol | | Socket factory used when connecting to DFS.
54 | hadoop.socks.server | | SOCKS server address, used when the socket factory is SocksSocketFactory.
55 | topology.node.switch.mapping.impl | org.apache.hadoop.net.ScriptBasedMapping | 
56 | topology.script.file.name | | 
57 | topology.script.number.args | 100 | Maximum number of arguments (100) passed to the topology script.
58 | hadoop.security.uid.cache.secs | 14400 | 
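Several defaults above reference other properties, e.g. fs.checkpoint.edits.dir is ${fs.checkpoint.dir}, which is itself ${hadoop.tmp.dir}/dfs/namesecondary. The sketch below imitates that ${...} substitution in simplified form; it is not Hadoop's actual implementation, and unknown variables (such as the Java system property ${user.name}) are left unexpanded:

```python
# Simplified imitation of the ${...} variable expansion used by
# Hadoop configuration values. Assumption: this mimics, not copies,
# the real Configuration class behavior.
import re

def expand(key, conf):
    """Recursively expand ${var} references in conf[key]."""
    value = conf[key]
    pattern = re.compile(r"\$\{([^}]+)\}")
    while True:
        match = pattern.search(value)
        if match is None:
            return value
        var = match.group(1)
        # Variables not defined in the config (e.g. ${user.name},
        # a Java system property) are left as-is.
        if var not in conf:
            return value
        value = value[:match.start()] + conf[var] + value[match.end():]

conf = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "fs.checkpoint.dir": "${hadoop.tmp.dir}/dfs/namesecondary",
    "fs.checkpoint.edits.dir": "${fs.checkpoint.dir}",
}
print(expand("fs.checkpoint.edits.dir", conf))
# -> /tmp/hadoop-${user.name}/dfs/namesecondary
```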

3.2 hdfs-default.html

No. | Parameter | Default value | Description
--- | --- | --- | ---
1 | dfs.namenode.logging.level | info | The logging level for dfs namenode. Other values are "dir" (trace namespace mutations), "block" (trace block under/over replications and block creations/deletions), or "all".
2 | dfs.secondary.http.address | 0.0.0.0:50090 | The secondary namenode http server address and port. If the port is 0 then the server will start on a free port.
3 | dfs.datanode.address | 0.0.0.0:50010 | The address where the datanode server will listen to. If the port is 0 then the server will start on a free port.
4 | dfs.datanode.http.address | 0.0.0.0:50075 | The datanode http server address and port. If the port is 0 then the server will start on a free port.
5 | dfs.datanode.ipc.address | 0.0.0.0:50020 | The datanode ipc server address and port. If the port is 0 then the server will start on a free port.
6 | dfs.datanode.handler.count | 3 | The number of server threads for the datanode.
7 | dfs.http.address | 0.0.0.0:50070 | The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.
8 | dfs.https.enable | false | Decide if HTTPS (SSL) is supported on HDFS.
9 | dfs.https.need.client.auth | false | Whether SSL client certificate authentication is required.
10 | dfs.https.server.keystore.resource | ssl-server.xml | Resource file from which ssl server keystore information will be extracted.
11 | dfs.https.client.keystore.resource | ssl-client.xml | Resource file from which ssl client keystore information will be extracted.
12 | dfs.datanode.https.address | 0.0.0.0:50475 | 
13 | dfs.https.address | 0.0.0.0:50470 | 
14 | dfs.datanode.dns.interface | default | The name of the network interface from which a datanode should report its IP address.
15 | dfs.datanode.dns.nameserver | default | The host name or IP address of the name server (DNS) which a DataNode should use to determine the host name used by the NameNode for communication and display purposes.
16 | dfs.replication.considerLoad | true | Decide if chooseTarget considers the target's load or not.
17 | dfs.default.chunk.view.size | 32768 | The number of bytes to view for a file on the browser.
18 | dfs.datanode.du.reserved | | Reserved space in bytes per volume. Always leave this much space free for non-dfs use.
19 | dfs.name.dir | ${hadoop.tmp.dir}/dfs/name | Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
20 | dfs.name.edits.dir | ${dfs.name.dir} | Determines where on the local filesystem the DFS name node should store the transaction (edits) file. If this is a comma-delimited list of directories then the transaction file is replicated in all of the directories, for redundancy. Default value is same as dfs.name.dir.
21 | dfs.web.ugi | webuser,webgroup | The user account used by the web interface. Syntax: USERNAME,GROUP1,GROUP2,...
22 | dfs.permissions | true | If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
23 | dfs.permissions.supergroup | supergroup | The name of the group of super-users.
24 | dfs.block.access.token.enable | false | If "true", access tokens are used as capabilities for accessing datanodes. If "false", no access tokens are checked on accessing datanodes.
25 | dfs.block.access.key.update.interval | 600 | Interval in minutes at which namenode updates its access keys.
26 | dfs.block.access.token.lifetime | 600 | The lifetime of access tokens in minutes.
27 | dfs.data.dir | ${hadoop.tmp.dir}/dfs/data | Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
28 | dfs.datanode.data.dir.perm | 755 | Permissions for the directories on the local filesystem where the DFS data node stores its blocks. The permissions can either be octal or symbolic.
29 | dfs.replication | 3 | Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
30 | dfs.replication.max | 512 | Maximal block replication.
31 | dfs.replication.min | 1 | Minimal block replication.
32 | dfs.block.size | 67108864 | The default block size for new files.
33 | dfs.df.interval | 60000 | Disk usage statistics refresh interval in msec.
34 | dfs.client.block.write.retries | 3 | The number of retries for writing blocks to the data nodes, before we signal failure to the application.
35 | dfs.blockreport.intervalMsec | 3600000 | Determines block reporting interval in milliseconds.
36 | dfs.blockreport.initialDelay | | Delay for first block report in seconds.
37 | dfs.heartbeat.interval | 3 | Determines datanode heartbeat interval in seconds.
38 | dfs.namenode.handler.count | 10 | The number of server threads for the namenode.
39 | dfs.safemode.threshold.pct | 0.999f | Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.replication.min. Values less than or equal to 0 mean not to start in safe mode. Values greater than 1 will make safe mode permanent.
40 | dfs.safemode.extension | 30000 | Determines extension of safe mode in milliseconds after the threshold level is reached.
41 | dfs.balance.bandwidthPerSec | 1048576 | Specifies the maximum amount of bandwidth that each datanode can utilize for the balancing purpose in terms of the number of bytes per second.
42 | dfs.hosts | | Names a file that contains a list of hosts that are permitted to connect to the namenode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted.
43 | dfs.hosts.exclude | | Names a file that contains a list of hosts that are not permitted to connect to the namenode. The full pathname of the file must be specified. If the value is empty, no hosts are excluded.
44 | dfs.max.objects | 0 | The maximum number of files, directories and blocks dfs supports. A value of zero indicates no limit to the number of objects that dfs supports.
45 | dfs.namenode.decommission.interval | 30 | Namenode periodicity in seconds to check if decommission is complete.
46 | dfs.namenode.decommission.nodes.per.interval | 5 | The number of nodes namenode checks if decommission is complete in each dfs.namenode.decommission.interval.
47 | dfs.replication.interval | 3 | The periodicity in seconds with which the namenode computes replication work for datanodes.
48 | dfs.access.time.precision | 3600000 | The access time for HDFS file is precise up to this value. The default value is 1 hour. Setting a value of 0 disables access times for HDFS.
49 | dfs.support.append | false | Does HDFS allow appends to files? This is currently set to false because there are bugs in the "append code" and it is not supported in any production cluster.
50 | dfs.namenode.delegation.key.update-interval | 86400000 | The update interval for the master key for delegation tokens in the namenode in milliseconds.
51 | dfs.namenode.delegation.token.max-lifetime | 604800000 | The maximum lifetime in milliseconds for which a delegation token is valid.
52 | dfs.namenode.delegation.token.renew-interval | 86400000 | The renewal interval for delegation tokens in milliseconds.
53 | dfs.datanode.failed.volumes.tolerated | | The number of volumes that are allowed to fail before a datanode stops offering service. By default any volume failure will cause a datanode to shutdown.
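As a quick sanity check on the numbers above (dfs.block.size = 67108864 bytes = 64 MB, dfs.replication = 3), the arithmetic for how much cluster capacity a file consumes can be sketched as:

```python
# Back-of-the-envelope check of the HDFS defaults above: a file
# occupies ceil(size / dfs.block.size) blocks, and every byte is
# stored dfs.replication times across the cluster.
import math

DFS_BLOCK_SIZE = 67108864   # dfs.block.size default: 64 MB
DFS_REPLICATION = 3         # dfs.replication default

def block_count(file_size_bytes):
    """Number of HDFS blocks a file of this size occupies."""
    return max(1, math.ceil(file_size_bytes / DFS_BLOCK_SIZE))

def raw_bytes(file_size_bytes):
    """Bytes consumed cluster-wide, counting replication."""
    return file_size_bytes * DFS_REPLICATION

file_size = 200 * 1024 * 1024      # a 200 MB file
print(block_count(file_size))      # 4 blocks: 64 + 64 + 64 + 8 MB
print(raw_bytes(file_size))        # three replicas of every byte
```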

3.3 mapred-default.html

No. | Parameter | Default value | Description
--- | --- | --- | ---
1 | hadoop.job.history.location | | If the job tracker is static the history files are stored in this single well known place. If no value is set here, by default, it is in the local file system at ${hadoop.log.dir}/history.
2 | hadoop.job.history.user.location | | User can specify a location to store the history files of a particular job. If nothing is specified, the logs are stored in the output directory. The files are stored in "_logs/history/" in the directory. User can stop logging by giving the value "none".
3 | mapred.job.tracker.history.completed.location | | The completed job history files are stored at this single well known location. If nothing is specified, the files are stored at ${hadoop.job.history.location}/done.
4 | io.sort.factor | 10 | The number of streams to merge at once while sorting files. This determines the number of open file handles.
5 | io.sort.mb | 100 | The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1 MB, which should minimize seeks.
6 | io.sort.record.percent | 0.05 | The percentage of io.sort.mb dedicated to tracking record boundaries. Let this value be r, io.sort.mb be x. The maximum number of records collected before the collection thread must block is equal to (r * x) / 4.
7 | io.sort.spill.percent | 0.80 | The soft limit in either the buffer or record collection buffers. Once reached, a thread will begin to spill the contents to disk in the background. Note that this does not imply any chunking of data to the spill. A value less than 0.5 is not recommended.
8 | io.map.index.skip | | Number of index entries to skip between each entry. Zero by default. Setting this to values larger than zero can facilitate opening large map files using less memory.
9 | mapred.job.tracker | local | The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
10 | mapred.job.tracker.http.address | 0.0.0.0:50030 | The job tracker http server address and port the server will listen on. If the port is 0 then the server will start on a free port.
11 | mapred.job.tracker.handler.count | 10 | The number of server threads for the JobTracker. This should be roughly 4% of the number of tasktracker nodes.
12 | mapred.task.tracker.report.address | 127.0.0.1:0 | The interface and port that the task tracker server listens on. Since it is only connected to by the tasks, it uses the local interface. EXPERT ONLY. Should only be changed if your host does not have the loopback interface.
13 | mapred.local.dir | ${hadoop.tmp.dir}/mapred/local | The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.
14 | mapred.system.dir | ${hadoop.tmp.dir}/mapred/system | The directory where MapReduce stores control files.
15 | mapreduce.jobtracker.staging.root.dir | ${hadoop.tmp.dir}/mapred/staging | The root of the staging area for users' job files. In practice, this should be the directory where users' home directories are located (usually /user).
16 | mapred.temp.dir | ${hadoop.tmp.dir}/mapred/temp | A shared directory for temporary files.
17 | mapred.local.dir.minspacestart | | If the space in mapred.local.dir drops under this, do not ask for more tasks. Value in bytes.
18 | mapred.local.dir.minspacekill | | If the space in mapred.local.dir drops under this, do not ask for more tasks until all the current ones have finished and cleaned up. Also, to save the rest of the tasks we have running, kill one of them, to clean up some space. Start with the reduce tasks, then go with the ones that have finished the least. Value in bytes.
19 | mapred.tasktracker.expiry.interval | 600000 | Expert: The time-interval, in milliseconds, after which a tasktracker is declared 'lost' if it doesn't send heartbeats.
20 | mapred.tasktracker.resourcecalculatorplugin | | Name of the class whose instance will be used to query resource information on the tasktracker. The class must be an instance of org.apache.hadoop.util.ResourceCalculatorPlugin. If the value is null, the tasktracker attempts to use a class appropriate to the platform. Currently, the only platform supported is Linux.
21 | mapred.tasktracker.taskmemorymanager.monitoring-interval | 5000 | The interval, in milliseconds, for which the tasktracker waits between two cycles of monitoring its tasks' memory usage. Used only if tasks' memory management is enabled via mapred.tasktracker.tasks.maxmemory.
22 | mapred.tasktracker.tasks.sleeptime-before-sigkill | 5000 | The time, in milliseconds, the tasktracker waits for sending a SIGKILL to a process, after it has been sent a SIGTERM.
23 | mapred.map.tasks | 2 | The default number of map tasks per job. Ignored when mapred.job.tracker is "local".
24 | mapred.reduce.tasks | 1 | The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local".
25 | mapreduce.tasktracker.outofband.heartbeat | false | Expert: Set this to true to let the tasktracker send an out-of-band heartbeat on task-completion for better latency.
26 | mapreduce.tasktracker.outofband.heartbeat.damper | 1000000 | When out-of-band heartbeats are enabled, provides damping to avoid overwhelming the JobTracker if too many out-of-band heartbeats would occur. The damping is calculated such that the heartbeat interval is divided by (T*D + 1) where T is the number of completed tasks and D is the damper value. Setting this to a high value like the default provides no damping -- as soon as any task finishes, a heartbeat will be sent. Setting this parameter to 0 is equivalent to disabling the out-of-band heartbeat feature. A value of 1 would indicate that, after one task has completed, the time to wait before the next heartbeat would be 1/2 the usual time. After two tasks have finished, it would be 1/3 the usual time, etc.
27 | mapred.jobtracker.restart.recover | false | "true" to enable (job) recovery upon restart, "false" to start afresh.
28 | mapred.jobtracker.job.history.block.size | 3145728 | The block size of the job history file. Since the job recovery uses job history, it's important to dump job history to disk as soon as possible. Note that this is an expert level parameter. The default value is set to 3 MB.
29 | mapreduce.job.split.metainfo.maxsize | 10000000 | The maximum permissible size of the split metainfo file. The JobTracker won't attempt to read split metainfo files bigger than the configured value. No limits if set to -1.
30 | mapred.jobtracker.taskScheduler | org.apache.hadoop.mapred.JobQueueTaskScheduler | The class responsible for scheduling the tasks.
31 | mapred.jobtracker.taskScheduler.maxRunningTasksPerJob | | The maximum number of running tasks for a job before it gets preempted. No limits if undefined.
32 | mapred.map.max.attempts | 4 | Expert: The maximum number of attempts per map task. In other words, the framework will try to execute a map task these many number of times before giving up on it.
33 | mapred.reduce.max.attempts | 4 | Expert: The maximum number of attempts per reduce task. In other words, the framework will try to execute a reduce task these many number of times before giving up on it.
34 | mapred.reduce.parallel.copies | 5 | The default number of parallel transfers run by reduce during the copy (shuffle) phase.
35 | mapreduce.reduce.shuffle.maxfetchfailures | 10 | The maximum number of times a reducer tries to fetch a map output before it reports it.

36 | mapreduce.reduce.shuffle.connect.timeout | 180000 | Expert: The maximum amount of time (in milliseconds) a reduce task spends in trying to connect to a tasktracker for getting map output.
37 | mapreduce.reduce.shuffle.read.timeout | 180000 | Expert: The maximum amount of time (in milliseconds) a reduce task waits for map output data to be available for reading after obtaining connection.
38 | mapred.task.timeout | 600000 | The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string.
39 | mapred.tasktracker.map.tasks.maximum | 2 | The maximum number of map tasks that will be run simultaneously by a task tracker.
40 | mapred.tasktracker.reduce.tasks.maximum | 2 | The maximum number of reduce tasks that will be run simultaneously by a task tracker.
41 | mapred.jobtracker.completeuserjobs.maximum | 100 | The maximum number of complete jobs per user to keep around before delegating them to the job history.
42 | mapreduce.reduce.input.limit | -1 | The limit on the input size of the reduce. If the estimated input size of the reduce is greater than this value, the job is failed. A value of -1 means that there is no limit set.
43 | mapred.job.tracker.retiredjobs.cache.size | 1000 | The number of retired job statuses to keep in the cache.
44 | mapred.job.tracker.jobhistory.lru.cache.size | 5 | The number of job history files loaded in memory. The jobs are loaded when they are first accessed. The cache is cleared based on LRU.
45 | mapred.child.java.opts | -Xmx200m | Java opts for the task tracker child processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by the current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc The configuration variable mapred.child.ulimit can be used to control the maximum virtual memory of the child processes.
46 | mapred.child.env | | User added environment variables for the task tracker child processes. Example: 1) A=foo This will set the env variable A to foo 2) B=$B:c This inherits the tasktracker's B env variable.
47 | mapred.child.ulimit | | The maximum virtual memory, in KB, of a process launched by the Map-Reduce framework. This can be used to control both the Mapper/Reducer tasks and applications using Hadoop Pipes, Hadoop Streaming etc. By default it is left unspecified to let cluster admins control it via limits.conf and other such relevant mechanisms. Note: mapred.child.ulimit must be greater than or equal to the -Xmx passed to the Java VM, else the VM might not start.
48 | mapred.cluster.map.memory.mb | -1 | The size, in terms of virtual memory, of a single map slot in the Map-Reduce framework, used by the scheduler. A job can ask for multiple slots for a single map task via mapred.job.map.memory.mb, up to the limit specified by mapred.cluster.max.map.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off.
49 | mapred.cluster.reduce.memory.mb | -1 | The size, in terms of virtual memory, of a single reduce slot in the Map-Reduce framework, used by the scheduler. A job can ask for multiple slots for a single reduce task via mapred.job.reduce.memory.mb, up to the limit specified by mapred.cluster.max.reduce.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off.
50 | mapred.cluster.max.map.memory.mb | -1 | The maximum size, in terms of virtual memory, of a single map task launched by the Map-Reduce framework, used by the scheduler. A job can ask for multiple slots for a single map task via mapred.job.map.memory.mb, up to the limit specified by mapred.cluster.max.map.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off.
51 | mapred.cluster.max.reduce.memory.mb | -1 | The maximum size, in terms of virtual memory, of a single reduce task launched by the Map-Reduce framework, used by the scheduler. A job can ask for multiple slots for a single reduce task via mapred.job.reduce.memory.mb, up to the limit specified by mapred.cluster.max.reduce.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off.
52 | mapred.job.map.memory.mb | -1 | The size, in terms of virtual memory, of a single map task for the job. A job can ask for multiple slots for a single map task, rounded up to the next multiple of mapred.cluster.map.memory.mb and up to the limit specified by mapred.cluster.max.map.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off iff mapred.cluster.map.memory.mb is also turned off (-1).
53 | mapred.job.reduce.memory.mb | -1 | The size, in terms of virtual memory, of a single reduce task for the job. A job can ask for multiple slots for a single map task, rounded up to the next multiple of mapred.cluster.reduce.memory.mb and up to the limit specified by mapred.cluster.max.reduce.memory.mb, if the scheduler supports the feature. The value of -1 indicates that this feature is turned off iff mapred.cluster.reduce.memory.mb is also turned off (-1).
54 | mapred.child.tmp | /tmp | To set the value of the tmp directory for map and reduce tasks. If the value is an absolute path, it is directly assigned. Otherwise, it is prepended with the task's working directory. The java tasks are executed with option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and streaming are set with environment variable TMPDIR='the absolute path of the tmp dir'.
55 | mapred.inmem.merge.threshold | 1000 | The threshold, in terms of the number of files, for the in-memory merge process. When we accumulate threshold number of files we initiate the in-memory merge and spill to disk. A value of 0 or less than 0 indicates we want to not have any threshold and instead depend only on the ramfs's memory consumption to trigger the merge.
56 | mapred.job.shuffle.merge.percent | 0.66 | The usage threshold at which an in-memory merge will be initiated, expressed as a percentage of the total memory allocated to storing in-memory map outputs, as defined by mapred.job.shuffle.input.buffer.percent.
57 | mapred.job.shuffle.input.buffer.percent | 0.70 | The percentage of memory to be allocated from the maximum heap size to storing map outputs during the shuffle.
58 | mapred.job.reduce.input.buffer.percent | 0.0 | The percentage of memory - relative to the maximum heap size - to retain map outputs during the reduce. When the shuffle is concluded, any remaining map outputs in memory must consume less than this threshold before the reduce can begin.
59 | mapred.map.tasks.speculative.execution | true | If true, then multiple instances of some map tasks may be executed in parallel.
60 | mapred.reduce.tasks.speculative.execution | true | If true, then multiple instances of some reduce tasks may be executed in parallel.
61 | mapred.job.reuse.jvm.num.tasks | 1 | How many tasks to run per jvm. If set to -1, there is no limit.
62 | mapred.min.split.size | | The minimum size chunk that map input should be split into. Note that some file formats may have minimum split sizes that take priority over this setting.
63 | mapred.jobtracker.maxtasks.per.job | -1 | The maximum number of tasks for a single job. A value of -1 indicates that there is no maximum.
64 | mapred.submit.replication | 10 | The replication level for submitted job files. This should be around the square root of the number of nodes.
65 | mapred.tasktracker.dns.interface | default | The name of the network interface from which a task tracker should report its IP address.
66 | mapred.tasktracker.dns.nameserver | default | The host name or IP address of the name server (DNS) which a TaskTracker should use to determine the host name used by the JobTracker for communication and display purposes.
67 | tasktracker.http.threads | 40 | The number of worker threads for the http server. This is used for map output fetching.
68 | mapred.task.tracker.http.address | 0.0.0.0:50060 | The task tracker http server address and port. If the port is 0 then the server will start on a free port.
69 | keep.failed.task.files | false | Should the files for failed tasks be kept. This should only be used on jobs that are failing, because the storage is never reclaimed. It also prevents the map outputs from being erased from the reduce directory as they are consumed.
70 | mapred.output.compress | false | Should the job outputs be compressed?

71

mapred.output.compression.type

RECORD

IfthejoboutputsaretocompressedasSequenceFiles,howshouldtheybecompressed?ShouldbeoneofNONE,RECORDorBLOCK.

72

mapred.output.compression.codec

org.apache.hadoop.io.compress.DefaultCodec

Ifthejoboutputsarecompressed,howshouldtheybecompressed?

73

mapred.compress.map.output

false

Shouldtheoutputsofthemapsbecompressedbeforebeingsentacrossthenetwork.UsesSequenceFilecompression.

74

mapred.map.output.compression.codec

org.apache.hadoop.io.compress.DefaultCodec

Ifthemapoutputsarecompressed,howshouldtheybecompressed?

75

map.sort.class

org.apache.hadoop.util.QuickSort

Thedefaultsortclassforsortingkeys.

76

mapred.userlog.limit.kb

Themaximumsizeofuser-logsofeachtaskinKB.0disablesthecap.

77

mapred.userlog.retain.hours

24

Themaximumtime,inhours,forwhichtheuser-logsaretoberetainedafterthejobcompletion.

78

mapred.user.jobconf.limit

5242880

Themaximumallowedsizeoftheuserjobconf.Thedefaultissetto5MB

79

mapred.hosts

Namesafilethatcontainsthelistofnodesthatmayconnecttothejobtracker.Ifthevalueisempty,allhostsarepermitted.

80

mapred.hosts.exclude

Namesafilethatcontainsthelistofhoststhatshouldbeexcludedbythejobtracker.Ifthevalueisempty,nohostsareexcluded.

81

mapred.heartbeats.in.second

100

Expert:Approximatenumberofheart-beatsthatcouldarriveatJobTrackerinasecond.AssumingeachRPCcanbeprocessedin10msec,thedefaultvalueismade100RPCsinasecond.

82

mapred.max.tracker.blacklists

4

Thenumberofblacklistsforatasktrackerbyvariousjobsafterwhichthetasktrackerwillbemarkedaspotentiallyfaultyandisacandidateforgraylistingacrossalljobs.(Unlikeblacklisting,thisisadvisory;thetrackerremainsactive.However,itisreportedasgraylistedinthewebUI,withtheexpectationthatchronicallygraylistedtrackerswillbemanuallydecommissioned.)Thisvalueistiedtomapred.jobtracker.blacklist.fault-timeout-window;faultsolderthanthewindowwidthareforgiven,sothetrackerwillrecoverfromtransientproblems.Itwillalsobecomehealthyafterarestart.

83

mapred.jobtracker.blacklist.fault-timeout-window

180

Thetimeout(inminutes)afterwhichper-jobtasktrackerfaultsareforgiven.Thewindowislogicallyacircularbufferoftime-intervalbucketswhosewidthisdefinedbymapred.jobtracker.blacklist.fault-bucket-width;whenthe"now"pointermovesacrossabucketboundary,thepreviouscontents(faults)ofthenewbucketarecleared.Inotherwords,thetimeout'sgranularityisdeterminedbythebucketwidth.

84

mapred.jobtracker.blacklist.fault-bucket-width

15

Thewidth(inminutes)ofeachbucketinthetasktrackerfaulttimeoutwindow.Eachbucketisreusedinacircularmannerafterafulltimeout-windowinterval(definedbymapred.jobtracker.blacklist.fault-timeout-window).

85

mapred.max.tracker.failures

4

Thenumberoftask-failuresonatasktrackerofagivenjobafterwhichnewtasksofthatjobaren'tassignedtoit.

86

jobclient.output.filter

FAILED

Thefilterforcontrollingtheoutputofthetask'suserlogssenttotheconsoleoftheJobClient.Thepermissibleoptionsare:NONE,KILLED,FAILED,SUCCEEDEDandALL.

87

mapred.job.tracker.persist.jobstatus.active

false

Indicatesifpersistencyofjobstatusinformationisactiveornot.

88

mapred.job.tracker.persist.jobstatus.hours

ThenumberofhoursjobstatusinformationispersistedinDFS.Thejobstatusinformationwillbeavailableafteritdropsofthememoryqueueandbetweenjobtrackerrestarts.WithazerovaluethejobstatusinformationisnotpersistedatallinDFS.

89

mapred.job.tracker.persist.jobstatus.dir

/jobtracker/jobsInfo

Thedirectorywherethejobstatusinformationispersistedinafilesystemtobeavailableafteritdropsofthememoryqueueandbetweenjobtrackerrestarts.

90

mapreduce.job.complete.cancel.delegation.tokens

true

iffalse-donotunregister/canceldelegationtokensfromrenewal,becausesametokensmaybeusedbyspawnedjobs

91

mapred.task.profile

false

Tosetwhetherthesystemshouldcollectprofilerinformationforsomeofthetasksinthisjob?Theinformationisstoredintheuserlogdirectory.Thevalueis"true"iftaskprofilingisenabled.

92

mapred.task.profile.maps

0-2

Tosettherangesofmaptaskstoprofile.mapred.task.profilehastobesettotrueforthevaluetobeaccounted.

93

mapred.task.profile.reduces

0-2

Tosettherangesofreducetaskstoprofile.mapred.task.profilehastobesettotrueforthevaluetobeaccounted.

94

mapred.line.input.format.linespermap

1

NumberoflinespersplitinNLineInputFormat.

95

mapred.skip.attempts.to.start.skipping

2

ThenumberofTaskattemptsAFTERwhichskipmodewillbekickedoff.Whenskipmodeiskickedoff,thetasksreportstherangeofrecordswhichitwillprocessnext,totheTaskTracker.Sothatonfailures,TTknowswhichonesarepossiblythebadrecords.Onfurtherexecutions,thoseareskipped.

96

mapred.skip.map.auto.incr.proc.count

true

Theflagwhichifsettotrue,SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDSisincrementedbyMapRunnerafterinvokingthemapfunction.Thisvaluemustbesettofalseforapplicationswhichprocesstherecordsasynchronouslyorbuffertheinputrecords.Forexamplestreaming.Insuchcasesapplicationsshouldincrementthiscounterontheirown.

97

mapred.skip.reduce.auto.incr.proc.count

true

Theflagwhichifsettotrue,SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPSisincrementedbyframeworkafterinvokingthereducefunction.Thisvaluemustbesettofalseforapplicationswhichprocesstherecordsasynchronouslyorbuffertheinputrecords.Forexamplestreaming.Insuchcasesapplicationsshouldincrementthiscounterontheirown.

98

mapred.skip.out.dir

Ifnovalueisspecifiedhere,theskippedrecordsarewrittentotheoutputdirectoryat_logs/skip.Usercanstopwritingskippedrecordsbygivingthevalue"none".

99

mapred.skip.map.max.skip.records

ThenumberofacceptableskiprecordssurroundingthebadrecordPERbadrecordinmapper.Thenumberincludesthebadrecordaswell.Toturnthefeatureofdetection/skippingofbadrecordsoff,setthevalueto0.TheframeworktriestonarrowdowntheskippedrangebyretryinguntilthisthresholdismetORallattemptsgetexhaustedforthistask.SetthevaluetoLong.MAX_VALUEtoindicatethatframeworkneednottrytonarrowdown.Whateverrecords(dependsonapplication)getskippedareacceptable.

100

mapred.skip.reduce.max.skip.groups

The number of acceptable skip groups surrounding the bad group PER bad group in the reducer. The number includes the bad group as well. To turn the feature of detection/skipping of bad groups off, set the value to 0. The framework tries to narrow down the skipped range by retrying until this threshold is met OR all attempts get exhausted for this task. Set the value to Long.MAX_VALUE to indicate that the framework need not try to narrow down; whatever groups (depends on application) get skipped are acceptable.
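Entries 95–100 work together. A hedged sketch of a job configuration that turns skip mode on, assuming one bad record/group of collateral loss is acceptable (the "1" values here are illustrative, not defaults):

```xml
<!-- Sketch: start skipping after 2 failed attempts; accept at most 1 skipped
     record (map side) or group (reduce side) around each bad one. -->
<property>
  <name>mapred.skip.attempts.to.start.skipping</name>
  <value>2</value>
</property>
<property>
  <name>mapred.skip.map.max.skip.records</name>
  <value>1</value>
</property>
<property>
  <name>mapred.skip.reduce.max.skip.groups</name>
  <value>1</value>
</property>
```

Recall from entries 96–97 that applications which buffer input (e.g. streaming) must also set the auto.incr.proc.count flags to false and drive the processed-record counters themselves.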

101

job.end.retry.attempts

Indicates how many times hadoop should attempt to contact the notification URL.

102

job.end.retry.interval

30000

Indicates the time in milliseconds between notification URL retry calls.

103

hadoop.rpc.socket.factory.class.JobSubmissionProtocol

SocketFactory to use to connect to a Map/Reduce master (JobTracker). If null or empty, then use hadoop.rpc.socket.class.default.

104

mapred.task.cache.levels

2

This is the max level of the task cache. For example, if the level is 2, the tasks cached are at the host level and at the rack level.

105

mapred.queue.names

default

Comma separated list of queues configured for this jobtracker. Jobs are added to queues and schedulers can configure different scheduling properties for the various queues. To configure a property for a queue, the name of the queue must match the name specified in this value. Queue properties that are common to all schedulers are configured here with the naming convention mapred.queue.$QUEUE-NAME.$PROPERTY-NAME, e.g. mapred.queue.default.submit-job-acl. The number of queues configured in this parameter could depend on the type of scheduler being used, as specified in mapred.jobtracker.taskScheduler. For example, the JobQueueTaskScheduler supports only a single queue, which is the default configured here. Before adding more queues, ensure that the scheduler you've configured supports multiple queues.

106

mapred.acls.enabled

false

Specifies whether ACLs should be checked for authorization of users for doing various queue and job level operations. ACLs are disabled by default. If enabled, access control checks are made by the JobTracker and TaskTracker when requests are made by users for queue operations like submitting a job to a queue and killing a job in the queue, and for job operations like viewing the job details (see mapreduce.job.acl-view-job) or modifying the job (see mapreduce.job.acl-modify-job) using Map/Reduce APIs, RPCs or via the console and web user interfaces.

107

mapred.queue.default.state

RUNNING

This value defines the state the default queue is in. The values can be either "STOPPED" or "RUNNING". This value can be changed at runtime.

108

mapred.job.queue.name

default

Queue to which a job is submitted. This must match one of the queues defined in mapred.queue.names for the system. Also, the ACL setup for the queue must allow the current user to submit a job to the queue. Before specifying a queue, ensure that the system is configured with the queue, and that access is allowed for submitting jobs to the queue.
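A sketch relating entries 105 and 108: the JobTracker-side queue list and the job-side queue choice. The queue name "analytics" is invented for illustration, and, per the note above, this only works if the configured scheduler supports multiple queues (the default JobQueueTaskScheduler does not):

```xml
<!-- Cluster side (mapred-site.xml): declare the queues. -->
<property>
  <name>mapred.queue.names</name>
  <value>default,analytics</value>
</property>

<!-- Job side: submit to the "analytics" queue; must match a name above. -->
<property>
  <name>mapred.job.queue.name</name>
  <value>analytics</value>
</property>
```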

109

mapreduce.job.acl-modify-job

Job specific access-control list for 'modifying' the job. It is only used if authorization is enabled in Map/Reduce by setting the configuration property mapred.acls.enabled to true. This specifies the list of users and/or groups who can do modification operations on the job. For specifying a list of users and groups the format to use is "user1,user2 group1,group2". If set to '*', it allows all users/groups to modify this job. If set to ' ' (i.e. a space), it allows none. This configuration is used to guard all the modifications with respect to this job and takes care of all the following operations: killing this job; killing or failing a task of this job; setting the priority of this job. Each of these operations is also protected by the per-queue level ACL "acl-administer-jobs" configured via mapred-queues.xml, so a caller should have the authorization to satisfy either the queue-level ACL or the job-level ACL. Irrespective of this ACL configuration, the job-owner, the user who started the cluster, cluster administrators configured via mapreduce.cluster.administrators and queue administrators of the queue to which this job is submitted (configured via mapred.queue.queue-name.acl-administer-jobs in mapred-queue-acls.xml) can do all the modification operations on a job. By default, nobody else besides the job-owner, the user who started the cluster, cluster administrators and queue administrators can perform modification operations on a job.

110

mapreduce.job.acl-view-job

Job specific access-control list for 'viewing' the job. It is only used if authorization is enabled in Map/Reduce by setting the configuration property mapred.acls.enabled to true. This specifies the list of users and/or groups who can view private details about the job. For specifying a list of users and groups the format to use is "user1,user2 group1,group2". If set to '*', it allows all users/groups to view this job. If set to ' ' (i.e. a space), it allows none. This configuration is used to guard some of the job views and at present only protects APIs that can return possibly sensitive information of the job-owner, like: job-level counters; task-level counters; tasks' diagnostic information; task logs displayed on the TaskTracker web UI; and job.xml shown by the JobTracker's web UI. Every other piece of information about jobs is still accessible by any other user, e.g. JobStatus, JobProfile, the list of jobs in the queue, etc. Irrespective of this ACL configuration, the job-owner, the user who started the cluster, cluster administrators configured via mapreduce.cluster.administrators and queue administrators of the queue to which this job is submitted (configured via mapred.queue.queue-name.acl-administer-jobs in mapred-queue-acls.xml) can do all the view operations on a job. By default, nobody else besides the job-owner, the user who started the cluster, cluster administrators and queue administrators can perform view operations on a job.
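A hedged sketch of the two job ACLs (entries 109–110). This assumes mapred.acls.enabled is true; the user names "alice", "bob" and the group "ops" are invented. Note the value format: users, then a single space, then groups:

```xml
<!-- Sketch: alice and bob, plus anyone in group "ops", may view job details;
     only alice may kill the job, kill/fail its tasks, or change its priority. -->
<property>
  <name>mapreduce.job.acl-view-job</name>
  <value>alice,bob ops</value>
</property>
<property>
  <name>mapreduce.job.acl-modify-job</name>
  <value>alice</value>
</property>
```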

111

mapred.tasktracker.indexcache.mb

10

The maximum memory that a tasktracker allows for the index cache that is used when serving map outputs to reducers.

112

mapred.combine.recordsBeforeProgress

10000

The number of records to process during combine output collection before sending a progress notification to the TaskTracker.

113

mapred.merge.recordsBeforeProgress

10000

The number of records to process during merge before sending a progress notification to the TaskTracker.

114

mapred.reduce.slowstart.completed.maps

0.05

Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job.

115

mapred.task.tracker.task-controller

org.apache.hadoop.mapred.DefaultTaskController

TaskController which is used to launch and manage task execution.

116

mapreduce.tasktracker.group

Expert: Group to which the TaskTracker belongs. If LinuxTaskController is configured via mapreduce.tasktracker.taskcontroller, the group owner of the task-controller binary should be the same as this group.

117

mapred.healthChecker.script.path

Absolute path to the script which is periodically run by the node health monitoring service to determine if the node is healthy or not. If the value of this key is empty or the file does not exist in the location configured here, the node health monitoring service is not started.

118

mapred.healthChecker.interval

60000

Frequency at which the node health script is to be run, in milliseconds.

119

mapred.healthChecker.script.timeout

600000

Time after which the node health script will be killed if unresponsive, and the script considered to have failed.

120

mapred.healthChecker.script.args

Comma separated list of arguments which are to be passed to the node health script when it is launched.
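To make entries 117–120 concrete, here is a hedged sketch of what such a health script's core logic might look like. It is an assumption-laden illustration, not a Hadoop-supplied script: the function name, the disk-usage check, and the threshold argument (which would arrive via mapred.healthChecker.script.args) are all invented; the one real convention it relies on is that the TaskTracker treats any script output line beginning with "ERROR" as marking the node unhealthy.

```shell
#!/bin/bash
# Hypothetical node health check: report ERROR when disk usage exceeds a threshold.
# An output line starting with "ERROR" marks the node unhealthy; anything else is healthy.
check_disk() {
  local threshold=$1   # percent, e.g. passed via mapred.healthChecker.script.args
  local usage=$2       # percent actually in use (a real script would read this from df)
  if [ "$usage" -gt "$threshold" ]; then
    echo "ERROR: disk ${usage}% full (threshold ${threshold}%)"
  else
    echo "OK"
  fi
}

check_disk 90 95   # prints an ERROR line: node would be marked unhealthy
check_disk 90 42   # prints OK: node stays healthy
```

In a deployment sketch, mapred.healthChecker.script.path would point at this file and mapred.healthChecker.script.args would carry the "90".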

121

mapreduce.job.counters.limit

120

Limit on the number of counters allowed per job.
