[Repost] Hadoop default parameters
Reposted from: http://myext.cn/other/56013.html
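1 Getting the default configuration
Configuring Hadoop mainly means editing three files: core-site.xml, hdfs-site.xml and mapred-site.xml. By default these files are empty, so it is hard to tell which settings they accept. Settings found online may also fail to take effect, because they were written for a different Hadoop version. There are two ways to see the full list of settings:
1. Download and unpack the matching Hadoop release, then search it for *.xml to find core-default.xml, hdfs-default.xml and mapred-default.xml. These files hold the default configuration. Use their keys and descriptions as a reference when configuring a Hadoop cluster.
2. Browse the Apache website. The three files are at: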
http://hadoop.apache.org/common/docs/current/core-default.html
http://hadoop.apache.org/common/docs/current/hdfs-default.html
http://hadoop.apache.org/common/docs/current/mapred-default.html
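These pages show the defaults for the current Hadoop release only. For other versions, look up the matching pages on the Apache site. Of the two methods, the first is better: every property comes with a description, and the file matches your version and can be used directly. Also note that core-site.xml is the global configuration shared by all components, while hdfs-site.xml and mapred-site.xml are the local configurations for HDFS and MapReduce respectively.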
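A *-site.xml file only needs to contain the properties you want to change. Any key it defines overrides the value from the matching *-default.xml, and every other key keeps its default. The Python sketch below mimics that layering with the standard library's xml.etree.ElementTree. The inline XML strings are illustrative excerpts, not real files. Hadoop's own Configuration class also handles things this sketch ignores, such as properties marked final and ${...} variable expansion.

```python
import xml.etree.ElementTree as ET

def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style XML document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props

def effective_config(*xml_texts):
    """Merge documents in order; later ones (the *-site.xml) override earlier ones."""
    merged = {}
    for text in xml_texts:
        merged.update(load_properties(text))
    return merged

# Illustrative excerpts only (assumed content, not complete files).
CORE_DEFAULT = """<configuration>
  <property><name>fs.default.name</name><value>file:///</value></property>
  <property><name>io.file.buffer.size</name><value>4096</value></property>
</configuration>"""

CORE_SITE = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://master:8020/</value></property>
</configuration>"""

conf = effective_config(CORE_DEFAULT, CORE_SITE)
print(conf["fs.default.name"])      # overridden by the site file
print(conf["io.file.buffer.size"])  # still the default
```

The site file changes only fs.default.name. io.file.buffer.size falls through to the default.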
2 Common port settings
2.1 HDFS ports
Parameter | Description | Default | Config file | Example value
fs.default.name | NameNode RPC port | 8020 | core-site.xml | hdfs://master:8020/
dfs.http.address | NameNode web UI port | 50070 | hdfs-site.xml | 0.0.0.0:50070
dfs.datanode.address | DataNode data transfer port | 50010 | hdfs-site.xml | 0.0.0.0:50010
dfs.datanode.ipc.address | DataNode RPC server address and port | 50020 | hdfs-site.xml | 0.0.0.0:50020
dfs.datanode.http.address | DataNode HTTP server address and port | 50075 | hdfs-site.xml | 0.0.0.0:50075
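To move any of these ports, add the corresponding properties to core-site.xml or hdfs-site.xml. The sketch below renders such a file from a Python dict using xml.etree.ElementTree. The values simply reuse the example column of the HDFS port table. The output omits the optional `<?xml version="1.0"?>` declaration that real site files usually start with.

```python
import xml.etree.ElementTree as ET

def to_site_xml(props):
    """Render {name: value} as a Hadoop <configuration> document string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Example values taken from the HDFS port table; adjust per cluster.
hdfs_ports = {
    "dfs.http.address": "0.0.0.0:50070",           # NameNode web UI
    "dfs.datanode.address": "0.0.0.0:50010",       # DataNode data transfer
    "dfs.datanode.ipc.address": "0.0.0.0:50020",   # DataNode RPC
    "dfs.datanode.http.address": "0.0.0.0:50075",  # DataNode web UI
}
print(to_site_xml(hdfs_ports))
```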
2.2 MapReduce ports
Parameter | Description | Default | Config file | Example value
mapred.job.tracker | JobTracker RPC port | 8021 | mapred-site.xml | master:8021
mapred.job.tracker.http.address | JobTracker web UI port | 50030 | mapred-site.xml | 0.0.0.0:50030
mapred.task.tracker.http.address | TaskTracker HTTP port | 50060 | mapred-site.xml | 0.0.0.0:50060
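The address properties take a host:port pair. 0.0.0.0 binds all interfaces, and the descriptions in section 3.2 note that port 0 makes the server start on a free port. mapred.job.tracker uses the same host:port form, or the special value local. Below is a small Python sketch that splits such values. It assumes a plain host name or IPv4 address, so IPv6 literals are out of scope.

```python
def parse_address(value):
    """Split a Hadoop 'host:port' value into (host, port).

    Assumes a host name or IPv4 address (IPv6 literals are not handled).
    Returns None for mapred.job.tracker's special value 'local'.
    """
    if value == "local":
        return None
    host, sep, port = value.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)

for key, val in [
    ("mapred.job.tracker", "master:8021"),
    ("mapred.job.tracker.http.address", "0.0.0.0:50030"),
    ("mapred.task.tracker.report.address", "127.0.0.1:0"),
]:
    host, port = parse_address(val)
    note = "any free port" if port == 0 else str(port)
    print(f"{key}: host={host}, port={note}")
```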
2.3 Other ports
Parameter | Description | Default | Config file | Example value
dfs.secondary.http.address | Secondary NameNode web UI port | 50090 | hdfs-site.xml | 0.0.0.0:50090
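3 The three default configuration reference files
3.1 core-default.html
No. | Parameter | Default value | Description
1 | hadoop.tmp.dir | /tmp/hadoop-${user.name} | Base directory for temporary files
2 | hadoop.native.lib | true | Whether to use the native Hadoop libraries
3 | hadoop.http.filter.initializers | (empty) | Filter chain (initializers) for the HTTP server
4 | hadoop.security.group.mapping | org.apache.hadoop.security.ShellBasedUnixGroupsMapping | Class that maps a user to its list of groups
5 | hadoop.security.authorization | false | Whether service-level authorization is enabled
6 | hadoop.security.authentication | simple | Authentication method: simple (no authentication) or kerberos
7 | hadoop.security.token.service.use_ip | true | Whether to use IP addresses (rather than host names) to identify token services
8 | hadoop.logfile.size | 10000000 | Maximum size of each log file: 10 MB
9 | hadoop.logfile.count | 10 | Maximum number of log files: 10
10 | io.file.buffer.size | 4096 | Buffer size for stream reads and writes: 4 KB
11 | io.bytes.per.checksum | 512 | Number of bytes covered by each checksum: 512
12 | io.skip.checksum.errors | false | Whether to skip entries with checksum errors instead of throwing an exception; true skips them
13 | io.compression.codecs | org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec | Compression/decompression codec classes
14 | io.serializations | org.apache.hadoop.io.serializer.WritableSerialization | Serialization/deserialization classes
15 | fs.default.name | file:/// | Default filesystem URI
16 | fs.trash.interval | 0 | Number of minutes between trash checkpoints; 0 disables the trash feature
17 | fs.file.impl | org.apache.hadoop.fs.LocalFileSystem | FileSystem class for local files
18 | fs.hdfs.impl | org.apache.hadoop.hdfs.DistributedFileSystem | FileSystem class for HDFS
19 | fs.s3.impl | org.apache.hadoop.fs.s3.S3FileSystem | FileSystem class for S3
20 | fs.s3n.impl | org.apache.hadoop.fs.s3native.NativeS3FileSystem | FileSystem class for native S3 (s3n)
21 | fs.kfs.impl | org.apache.hadoop.fs.kfs.KosmosFileSystem | FileSystem class for KFS
22 | fs.hftp.impl | org.apache.hadoop.hdfs.HftpFileSystem | FileSystem class for access over HTTP (hftp)
23 | fs.hsftp.impl | org.apache.hadoop.hdfs.HsftpFileSystem | FileSystem class for access over HTTPS (hsftp)
24 | fs.webhdfs.impl | org.apache.hadoop.hdfs.web.WebHdfsFileSystem | FileSystem class for WebHDFS
25 | fs.ftp.impl | org.apache.hadoop.fs.ftp.FTPFileSystem | FileSystem class for FTP
26 | fs.ramfs.impl | org.apache.hadoop.fs.InMemoryFileSystem | FileSystem class for in-memory files
27 | fs.har.impl | org.apache.hadoop.fs.HarFileSystem | FileSystem class for Hadoop archives (har)
28 | fs.har.impl.disable.cache | true | Whether to disable caching of HAR filesystem instances
29 | fs.checkpoint.dir | ${hadoop.tmp.dir}/dfs/namesecondary | Local directory where the secondary NameNode stores checkpoint images
30 | fs.checkpoint.edits.dir | ${fs.checkpoint.dir} | Local directory where the secondary NameNode stores checkpoint edit logs
31 | fs.checkpoint.period | 3600 | Maximum interval between two checkpoints, in seconds (1 hour)
32 | fs.checkpoint.size | 67108864 | Edit log size (64 MB) that triggers a checkpoint even before fs.checkpoint.period has elapsed
33 | fs.s3.block.size | 67108864 | Block size used when writing to the S3 filesystem: 64 MB
34 | fs.s3.buffer.dir | ${hadoop.tmp.dir}/s3 | Local directory for buffering S3 file data
35 | fs.s3.maxRetries | 4 | Maximum number of retries for S3 reads and writes
36 | fs.s3.sleepTimeSeconds | 10 | Seconds to wait between S3 retries
37 | local.cache.size | 10737418240 | Local cache size limit: 10 GB
38 | io.seqfile.compress.blocksize | 1000000 | Minimum block size, in bytes, for compression in block-compressed SequenceFiles: 1,000,000
39 | io.seqfile.lazydecompress | true | Whether values in block-compressed SequenceFiles are decompressed only when needed
40 | io.seqfile.sorter.recordlimit | 1000000 | Maximum number of records SequenceFile.Sorter keeps in memory per spill: 1,000,000
41 | io.mapfile.bloom.size | 1048576 | Size of the Bloom filters used in BloomMapFile: 1M
42 | io.mapfile.bloom.error.rate | 0.005 |
43 | hadoop.util.hash.type | murmur | Default hash function: murmur
44 | ipc.client.idlethreshold | 4000 | Number of connections above which the server starts checking connections for idleness: 4000
45 | ipc.client.kill.max | 10 | Maximum number of idle clients disconnected at one time: 10
46 | ipc.client.connection.maxidletime | 10000 | Maximum idle time, in milliseconds, before a client drops its connection to the server: 10 seconds
47 | ipc.client.connect.max.retries | 10 | Number of retries when a client connects to a server: 10
48 | ipc.server.listen.queue.size | 128 | Length of the server's listen queue for incoming client connections: 128
49 | ipc.server.tcpnodelay | false | Whether server-side TCP connections set TCP_NODELAY (true disables Nagle's algorithm)
50 | ipc.client.tcpnodelay | false | Whether client-side TCP connections set TCP_NODELAY (true disables Nagle's algorithm)
51 | webinterface.private.actions | false | Whether the web UIs expose privileged actions such as killing jobs or deleting files
52 | hadoop.rpc.socket.factory.class.default | org.apache.hadoop.net.StandardSocketFactory | Default socket factory class
53 | hadoop.rpc.socket.factory.class.ClientProtocol | (empty) | Default socket factory class used to connect to DFS
54 | hadoop.socks.server | (empty) | SOCKS server address (host:port) used by SocksSocketFactory
55 | topology.node.switch.mapping.impl | org.apache.hadoop.net.ScriptBasedMapping |
56 | topology.script.file.name | (empty) |
57 | topology.script.number.args | 100 | Maximum number of arguments passed to the topology script per invocation: 100
58 | hadoop.security.uid.cache.secs | 14400 |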
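Several defaults above refer to other properties. For example, fs.checkpoint.dir is ${hadoop.tmp.dir}/dfs/namesecondary, and hadoop.tmp.dir itself uses ${user.name}, a Java system property. Hadoop's Configuration expands these ${...} references when a value is read. As a result, changing hadoop.tmp.dir moves every directory derived from it.

The Python sketch below imitates that expansion under stated assumptions:
- The user name "hadoop" is an assumed example.
- System properties are looked up before other configuration keys.
- An unresolvable reference is left as-is.
- The number of substitution passes is bounded, as in the real implementation.

```python
import re

# A ${name} reference: no '}', '$' or space inside the braces.
VAR = re.compile(r"\$\{([^}$ ]+)\}")

def expand(value, props, sysprops, max_depth=20):
    """Expand ${name} references: system properties first, then config keys."""
    for _ in range(max_depth):
        match = VAR.search(value)
        if match is None:
            return value
        name = match.group(1)
        replacement = sysprops.get(name, props.get(name))
        if replacement is None:
            return value  # unbound variable: leave the literal ${name}
        value = value[:match.start()] + replacement + value[match.end():]
    return value

core_defaults = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "fs.checkpoint.dir": "${hadoop.tmp.dir}/dfs/namesecondary",
    "fs.checkpoint.edits.dir": "${fs.checkpoint.dir}",
    "fs.s3.buffer.dir": "${hadoop.tmp.dir}/s3",
}
sysprops = {"user.name": "hadoop"}  # assumed user running the daemons

for key, raw in core_defaults.items():
    print(key, "=", expand(raw, core_defaults, sysprops))
```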
3.2 hdfs-default.html
No. | Parameter | Default value | Description
1 | dfs.namenode.logging.level | info | The logging level for the dfs namenode. Other values are "dir" (trace namespace mutations), "block" (trace block under/over replications and block creations/deletions), or "all".
2 | dfs.secondary.http.address | 0.0.0.0:50090 | The secondary namenode http server address and port. If the port is 0 then the server will start on a free port.
3 | dfs.datanode.address | 0.0.0.0:50010 | The address the datanode server will listen on. If the port is 0 then the server will start on a free port.
4 | dfs.datanode.http.address | 0.0.0.0:50075 | The datanode http server address and port. If the port is 0 then the server will start on a free port.
5 | dfs.datanode.ipc.address | 0.0.0.0:50020 | The datanode ipc server address and port. If the port is 0 then the server will start on a free port.
6 | dfs.datanode.handler.count | 3 | The number of server threads for the datanode.
7 | dfs.http.address | 0.0.0.0:50070 | The address and the base port the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.
8 | dfs.https.enable | false | Decide if HTTPS (SSL) is supported on HDFS.
9 | dfs.https.need.client.auth | false | Whether SSL client certificate authentication is required.
10 | dfs.https.server.keystore.resource | ssl-server.xml | Resource file from which ssl server keystore information will be extracted.
11 | dfs.https.client.keystore.resource | ssl-client.xml | Resource file from which ssl client keystore information will be extracted.
12 | dfs.datanode.https.address | 0.0.0.0:50475 |
13 | dfs.https.address | 0.0.0.0:50470 |
14 | dfs.datanode.dns.interface | default | The name of the Network Interface from which a data node should report its IP address.
15 | dfs.datanode.dns.nameserver | default | The host name or IP address of the name server (DNS) which a DataNode should use to determine the host name used by the NameNode for communication and display purposes.
16 | dfs.replication.considerLoad | true | Decide if chooseTarget considers the target's load or not.
17 | dfs.default.chunk.view.size | 32768 | The number of bytes to view for a file on the browser.
18 | dfs.datanode.du.reserved | 0 | Reserved space in bytes per volume. Always leave this much space free for non dfs use.
19 | dfs.name.dir | ${hadoop.tmp.dir}/dfs/name | Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
20 | dfs.name.edits.dir | ${dfs.name.dir} | Determines where on the local filesystem the DFS name node should store the transaction (edits) file. If this is a comma-delimited list of directories then the transaction file is replicated in all of the directories, for redundancy. Default value is same as dfs.name.dir.
21 | dfs.web.ugi | webuser,webgroup | The user account used by the web interface. Syntax: USERNAME,GROUP1,GROUP2, ...
22 | dfs.permissions | true | If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
23 | dfs.permissions.supergroup | supergroup | The name of the group of super-users.
24 | dfs.block.access.token.enable | false | If "true", access tokens are used as capabilities for accessing datanodes. If "false", no access tokens are checked on accessing datanodes.
25 | dfs.block.access.key.update.interval | 600 | Interval in minutes at which namenode updates its access keys.
26 | dfs.block.access.token.lifetime | 600 | The lifetime of access tokens in minutes.
27 | dfs.data.dir | ${hadoop.tmp.dir}/dfs/data | Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
28 | dfs.datanode.data.dir.perm | 755 | Permissions for the directories on the local filesystem where the DFS data node stores its blocks. The permissions can either be octal or symbolic.
29 | dfs.replication | 3 | Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
30 | dfs.replication.max | 512 | Maximal block replication.
31 | dfs.replication.min | 1 | Minimal block replication.
32 | dfs.block.size | 67108864 | The default block size for new files (64 MB).
33 | dfs.df.interval | 60000 | Disk usage statistics refresh interval in msec.
34 | dfs.client.block.write.retries | 3 | The number of retries for writing blocks to the data nodes, before we signal failure to the application.
35 | dfs.blockreport.intervalMsec | 3600000 | Determines block reporting interval in milliseconds.
36 | dfs.blockreport.initialDelay | 0 | Delay for first block report in seconds.
37 | dfs.heartbeat.interval | 3 | Determines datanode heartbeat interval in seconds.
38 | dfs.namenode.handler.count | 10 | The number of server threads for the namenode.
39 | dfs.safemode.threshold.pct | 0.999f | Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.replication.min. Values less than or equal to 0 mean not to start in safe mode. Values greater than 1 will make safe mode permanent.
40 | dfs.safemode.extension | 30000 | Determines extension of safe mode in milliseconds after the threshold level is reached.
41 | dfs.balance.bandwidthPerSec | 1048576 | Specifies the maximum amount of bandwidth that each datanode can utilize for the balancing purpose in terms of the number of bytes per second.
42 | dfs.hosts | (empty) | Names a file that contains a list of hosts that are permitted to connect to the namenode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted.
43 | dfs.hosts.exclude | (empty) | Names a file that contains a list of hosts that are not permitted to connect to the namenode. The full pathname of the file must be specified. If the value is empty, no hosts are excluded.
44 | dfs.max.objects | 0 | The maximum number of files, directories and blocks dfs supports. A value of zero indicates no limit to the number of objects that dfs supports.
45 | dfs.namenode.decommission.interval | 30 | Namenode periodicity in seconds to check if decommission is complete.
46 | dfs.namenode.decommission.nodes.per.interval | 5 | The number of nodes namenode checks if decommission is complete in each dfs.namenode.decommission.interval.
47 | dfs.replication.interval | 3 | The periodicity in seconds with which the namenode computes replication work for datanodes.
48 | dfs.access.time.precision | 3600000 | The access time for HDFS file is precise up to this value. The default value is 1 hour. Setting a value of 0 disables access times for HDFS.
49 | dfs.support.append | false | Does HDFS allow appends to files? This is currently set to false because there are bugs in the "append code" and it is not supported in any production cluster.
50 | dfs.namenode.delegation.key.update-interval | 86400000 | The update interval for master key for delegation tokens in the namenode in milliseconds.
51 | dfs.namenode.delegation.token.max-lifetime | 604800000 | The maximum lifetime in milliseconds for which a delegation token is valid.
52 | dfs.namenode.delegation.token.renew-interval | 86400000 | The renewal interval for delegation token in milliseconds.
53 | dfs.datanode.failed.volumes.tolerated | 0 | The number of volumes that are allowed to fail before a datanode stops offering service. By default any volume failure will cause a datanode to shut down.
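dfs.block.size and dfs.replication together determine how a file is stored. HDFS splits a file into ceil(size / block size) blocks, and stores each block dfs.replication times. The last block may be shorter and only occupies its actual length. Below is a quick Python calculation with the defaults above (64 MB blocks, 3 replicas). The 1 GB and 100 MB file sizes are arbitrary examples, and checksum metadata is ignored.

```python
DFS_BLOCK_SIZE = 67108864  # dfs.block.size default: 64 MB
DFS_REPLICATION = 3        # dfs.replication default

def hdfs_layout(file_bytes, block_size=DFS_BLOCK_SIZE, replication=DFS_REPLICATION):
    """Return (blocks, block replicas, raw bytes stored) for a single file.

    A partial last block uses only its real length, so raw storage is
    simply file size x replication (checksum files not counted).
    """
    blocks = -(-file_bytes // block_size)  # ceiling division without floats
    return blocks, blocks * replication, file_bytes * replication

MB = 1024 ** 2
GB = 1024 ** 3
print(hdfs_layout(1 * GB))    # (16, 48, 3221225472)
print(hdfs_layout(100 * MB))  # (2, 6, 314572800)
```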
3.3mapred-default.html
序号
参数名
参数值
参数说明
1
hadoop.job.history.location
Ifjobtrackerisstaticthehistoryfilesarestoredinthissinglewellknownplace.IfNovalueissethere,bydefault,itisinthelocalfilesystemat${hadoop.log.dir}/history.
2
hadoop.job.history.user.location
Usercanspecifyalocationtostorethehistoryfilesofaparticularjob.Ifnothingisspecified,thelogsarestoredinoutputdirectory.Thefilesarestoredin"_logs/history/"inthedirectory.Usercanstoploggingbygivingthevalue"none".
3
mapred.job.tracker.history.completed.location
Thecompletedjobhistoryfilesarestoredatthissinglewellknownlocation.Ifnothingisspecified,thefilesarestoredat${hadoop.job.history.location}/done.
4
io.sort.factor
10
Thenumberofstreamstomergeatoncewhilesortingfiles.Thisdeterminesthenumberofopenfilehandles.
5
io.sort.mb
100
Thetotalamountofbuffermemorytousewhilesortingfiles,inmegabytes.Bydefault,giveseachmergestream1MB,whichshouldminimizeseeks.
6
io.sort.record.percent
0.05
Thepercentageofio.sort.mbdedicatedtotrackingrecordboundaries.Letthisvalueber,io.sort.mbbex.Themaximumnumberofrecordscollectedbeforethecollectionthreadmustblockisequalto(r*x)/4
7
io.sort.spill.percent
0.80
Thesoftlimitineitherthebufferorrecordcollectionbuffers.Oncereached,athreadwillbegintospillthecontentstodiskinthebackground.Notethatthisdoesnotimplyanychunkingofdatatothespill.Avaluelessthan0.5isnotrecommended.
8
io.map.index.skip
Numberofindexentriestoskipbetweeneachentry.Zerobydefault.Settingthistovalueslargerthanzerocanfacilitateopeninglargemapfilesusinglessmemory.
9
mapred.job.tracker
local
ThehostandportthattheMapReducejobtrackerrunsat.If"local",thenjobsarerunin-processasasinglemapandreducetask.
10
mapred.job.tracker.http.address
0.0.0.0:50030
Thejobtrackerhttpserveraddressandporttheserverwilllistenon.Iftheportis0thentheserverwillstartonafreeport.
11
mapred.job.tracker.handler.count
10
ThenumberofserverthreadsfortheJobTracker.Thisshouldberoughly4%ofthenumberoftasktrackernodes.
12
mapred.task.tracker.report.address
127.0.0.1:0
Theinterfaceandportthattasktrackerserverlistenson.Sinceitisonlyconnectedtobythetasks,itusesthelocalinterface.EXPERTONLY.Shouldonlybechangedifyourhostdoesnothavetheloopbackinterface.
13
mapred.local.dir
${hadoop.tmp.dir}/mapred/local
ThelocaldirectorywhereMapReducestoresintermediatedatafiles.Maybeacomma-separatedlistofdirectoriesondifferentdevicesinordertospreaddiski/o.Directoriesthatdonotexistareignored.
14
mapred.system.dir
${hadoop.tmp.dir}/mapred/system
ThedirectorywhereMapReducestorescontrolfiles.
15
mapreduce.jobtracker.staging.root.dir
${hadoop.tmp.dir}/mapred/staging
Therootofthestagingareaforusers'jobfilesInpractice,thisshouldbethedirectorywhereusers'homedirectoriesarelocated(usually/user)
16
mapred.temp.dir
${hadoop.tmp.dir}/mapred/temp
Ashareddirectoryfortemporaryfiles.
17
mapred.local.dir.minspacestart
Ifthespaceinmapred.local.dirdropsunderthis,donotaskformoretasks.Valueinbytes.
18
mapred.local.dir.minspacekill
Ifthespaceinmapred.local.dirdropsunderthis,donotaskmoretasksuntilallthecurrentoneshavefinishedandcleanedup.Also,tosavetherestofthetaskswehaverunning,killoneofthem,tocleanupsomespace.Startwiththereducetasks,thengowiththeonesthathavefinishedtheleast.Valueinbytes.
19
mapred.tasktracker.expiry.interval
600000
Expert:Thetime-interval,inmiliseconds,afterwhichatasktrackerisdeclared'lost'ifitdoesn'tsendheartbeats.
20
mapred.tasktracker.resourcecalculatorplugin
Nameoftheclasswhoseinstancewillbeusedtoqueryresourceinformationonthetasktracker.Theclassmustbeaninstanceoforg.apache.hadoop.util.ResourceCalculatorPlugin.Ifthevalueisnull,thetasktrackerattemptstouseaclassappropriatetotheplatform.Currently,theonlyplatformsupportedisLinux.
21
mapred.tasktracker.taskmemorymanager.monitoring-interval
5000
Theinterval,inmilliseconds,forwhichthetasktrackerwaitsbetweentwocyclesofmonitoringitstasks'memoryusage.Usedonlyiftasks'memorymanagementisenabledviamapred.tasktracker.tasks.maxmemory.
22
mapred.tasktracker.tasks.sleeptime-before-sigkill
5000
Thetime,inmilliseconds,thetasktrackerwaitsforsendingaSIGKILLtoaprocess,afterithasbeensentaSIGTERM.
23
mapred.map.tasks
2
Thedefaultnumberofmaptasksperjob.Ignoredwhenmapred.job.trackeris"local".
24
mapred.reduce.tasks
1
Thedefaultnumberofreducetasksperjob.Typicallysetto99%ofthecluster'sreducecapacity,sothatifanodefailsthereducescanstillbeexecutedinasinglewave.Ignoredwhenmapred.job.trackeris"local".
25
mapreduce.tasktracker.outofband.heartbeat
false
Expert:Setthistotruetoletthetasktrackersendanout-of-bandheartbeatontask-completionforbetterlatency.
26
mapreduce.tasktracker.outofband.heartbeat.damper
1000000
Whenout-of-bandheartbeatsareenabled,providesdampingtoavoidoverwhelmingtheJobTrackeriftoomanyout-of-bandheartbeatswouldoccur.Thedampingiscalculatedsuchthattheheartbeatintervalisdividedby(T*D+1)whereTisthenumberofcompletedtasksandDisthedampervalue.Settingthistoahighvaluelikethedefaultprovidesnodamping--assoonasanytaskfinishes,aheartbeatwillbesent.Settingthisparameterto0isequivalenttodisablingtheout-of-bandheartbeatfeature.Avalueof1wouldindicatethat,afteronetaskhascompleted,thetimetowaitbeforethenextheartbeatwouldbe1/2theusualtime.Aftertwotaskshavefinished,itwouldbe1/3theusualtime,etc.
27
mapred.jobtracker.restart.recover
false
"true"toenable(job)recoveryuponrestart,"false"tostartafresh
28
mapred.jobtracker.job.history.block.size
3145728
Theblocksizeofthejobhistoryfile.Sincethejobrecoveryusesjobhistory,itsimportanttodumpjobhistorytodiskassoonaspossible.Notethatthisisanexpertlevelparameter.Thedefaultvalueissetto3MB.
29
mapreduce.job.split.metainfo.maxsize
10000000
Themaximumpermissiblesizeofthesplitmetainfofile.TheJobTrackerwon'tattempttoreadsplitmetainfofilesbiggerthantheconfiguredvalue.Nolimitsifsetto-1.
30
mapred.jobtracker.taskScheduler
org.apache.hadoop.mapred.JobQueueTaskScheduler
Theclassresponsibleforschedulingthetasks.
31
mapred.jobtracker.taskScheduler.maxRunningTasksPerJob
Themaximumnumberofrunningtasksforajobbeforeitgetspreempted.Nolimitsifundefined.
32
mapred.map.max.attempts
4
Expert:Themaximumnumberofattemptspermaptask.Inotherwords,frameworkwilltrytoexecuteamaptaskthesemanynumberoftimesbeforegivinguponit.
33
mapred.reduce.max.attempts
4
Expert:Themaximumnumberofattemptsperreducetask.Inotherwords,frameworkwilltrytoexecuteareducetaskthesemanynumberoftimesbeforegivinguponit.
34
mapred.reduce.parallel.copies
5
Thedefaultnumberofparalleltransfersrunbyreduceduringthecopy(shuffle)phase.
35
mapreduce.reduce.shuffle.maxfetchfailures
10
Themaximumnumberoftimesareducertriestofetchamapoutputbeforeitreportsit.
36
mapreduce.reduce.shuffle.connect.timeout
180000
Expert:Themaximumamountoftime(inmilliseconds)areducetaskspendsintryingtoconnecttoatasktrackerforgettingmapoutput.
37
mapreduce.reduce.shuffle.read.timeout
180000
Expert:Themaximumamountoftime(inmilliseconds)areducetaskwaitsformapoutputdatatobeavailableforreadingafterobtainingconnection.
38
mapred.task.timeout
600000
Thenumberofmillisecondsbeforeataskwillbeterminatedifitneitherreadsaninput,writesanoutput,norupdatesitsstatusstring.
39
mapred.tasktracker.map.tasks.maximum
2
Themaximumnumberofmaptasksthatwillberunsimultaneouslybyatasktracker.
40
mapred.tasktracker.reduce.tasks.maximum
2
Themaximumnumberofreducetasksthatwillberunsimultaneouslybyatasktracker.
41
mapred.jobtracker.completeuserjobs.maximum
100
Themaximumnumberofcompletejobsperusertokeeparoundbeforedelegatingthemtothejobhistory.
42
mapreduce.reduce.input.limit
-1
Thelimitontheinputsizeofthereduce.Iftheestimatedinputsizeofthereduceisgreaterthanthisvalue,jobisfailed.Avalueof-1meansthatthereisnolimitset.
43
mapred.job.tracker.retiredjobs.cache.size
1000
Thenumberofretiredjobstatustokeepinthecache.
44
mapred.job.tracker.jobhistory.lru.cache.size
5
Thenumberofjobhistoryfilesloadedinmemory.Thejobsareloadedwhentheyarefirstaccessed.ThecacheisclearedbasedonLRU.
45
mapred.child.java.opts
-Xmx200m
Javaoptsforthetasktrackerchildprocesses.Thefollowingsymbol,ifpresent,willbeinterpolated:@[email protected]'@'willgounchanged.Forexample,toenableverbosegcloggingtoafilenamedforthetaskidin/tmpandtosettheheapmaximumtobeagigabyte,passa'value'of:-Xmx1024m-verbose:gc-Xloggc:/tmp/@taskid@.gcTheconfigurationvariablemapred.child.ulimitcanbeusedtocontrolthemaximumvirtualmemoryofthechildprocesses.
46
mapred.child.env
Useraddedenvironmentvariablesforthetasktrackerchildprocesses.Example:1)A=fooThiswillsettheenvvariableAtofoo2)B=$B:cThisisinherittasktracker'sBenvvariable.
47
mapred.child.ulimit
Themaximumvirtualmemory,inKB,ofaprocesslaunchedbytheMap-Reduceframework.ThiscanbeusedtocontrolboththeMapper/ReducertasksandapplicationsusingHadoopPipes,HadoopStreamingetc.Bydefaultitisleftunspecifiedtoletclusteradminscontrolitvialimits.confandothersuchrelevantmechanisms.Note:mapred.child.ulimitmustbegreaterthanorequaltothe-XmxpassedtoJavaVM,elsetheVMmightnotstart.
48
mapred.cluster.map.memory.mb
-1
Thesize,intermsofvirtualmemory,ofasinglemapslotintheMap-Reduceframework,usedbythescheduler.Ajobcanaskformultipleslotsforasinglemaptaskviamapred.job.map.memory.mb,uptothelimitspecifiedbymapred.cluster.max.map.memory.mb,iftheschedulersupportsthefeature.Thevalueof-1indicatesthatthisfeatureisturnedoff.
49
mapred.cluster.reduce.memory.mb
-1
Thesize,intermsofvirtualmemory,ofasinglereduceslotintheMap-Reduceframework,usedbythescheduler.Ajobcanaskformultipleslotsforasinglereducetaskviamapred.job.reduce.memory.mb,uptothelimitspecifiedbymapred.cluster.max.reduce.memory.mb,iftheschedulersupportsthefeature.Thevalueof-1indicatesthatthisfeatureisturnedoff.
50
mapred.cluster.max.map.memory.mb
-1
Themaximumsize,intermsofvirtualmemory,ofasinglemaptasklaunchedbytheMap-Reduceframework,usedbythescheduler.Ajobcanaskformultipleslotsforasinglemaptaskviamapred.job.map.memory.mb,uptothelimitspecifiedbymapred.cluster.max.map.memory.mb,iftheschedulersupportsthefeature.Thevalueof-1indicatesthatthisfeatureisturnedoff.
51
mapred.cluster.max.reduce.memory.mb
-1
Themaximumsize,intermsofvirtualmemory,ofasinglereducetasklaunchedbytheMap-Reduceframework,usedbythescheduler.Ajobcanaskformultipleslotsforasinglereducetaskviamapred.job.reduce.memory.mb,uptothelimitspecifiedbymapred.cluster.max.reduce.memory.mb,iftheschedulersupportsthefeature.Thevalueof-1indicatesthatthisfeatureisturnedoff.
52
mapred.job.map.memory.mb
-1
Thesize,intermsofvirtualmemory,ofasinglemaptaskforthejob.Ajobcanaskformultipleslotsforasinglemaptask,roundeduptothenextmultipleofmapred.cluster.map.memory.mbanduptothelimitspecifiedbymapred.cluster.max.map.memory.mb,iftheschedulersupportsthefeature.Thevalueof-1indicatesthatthisfeatureisturnedoffiffmapred.cluster.map.memory.mbisalsoturnedoff(-1).
53
mapred.job.reduce.memory.mb
-1
Thesize,intermsofvirtualmemory,ofasinglereducetaskforthejob.Ajobcanaskformultipleslotsforasinglemaptask,roundeduptothenextmultipleofmapred.cluster.reduce.memory.mbanduptothelimitspecifiedbymapred.cluster.max.reduce.memory.mb,iftheschedulersupportsthefeature.Thevalueof-1indicatesthatthisfeatureisturnedoffiffmapred.cluster.reduce.memory.mbisalsoturnedoff(-1).
54
mapred.child.tmp
/tmp
Tosetthevalueoftmpdirectoryformapandreducetasks.Ifthevalueisanabsolutepath,itisdirectlyassigned.Otherwise,itisprependedwithtask'sworkingdirectory.Thejavatasksareexecutedwithoption-Djava.io.tmpdir='theabsolutepathofthetmpdir'.Pipesandstreamingaresetwithenvironmentvariable,TMPDIR='theabsolutepathofthetmpdir'
55
mapred.inmem.merge.threshold
1000
Thethreshold,intermsofthenumberoffilesforthein-memorymergeprocess.Whenweaccumulatethresholdnumberoffilesweinitiatethein-memorymergeandspilltodisk.Avalueof0orlessthan0indicateswewanttoDON'Thaveanythresholdandinsteaddependonlyontheramfs'smemoryconsumptiontotriggerthemerge.
56
mapred.job.shuffle.merge.percent
0.66
Theusagethresholdatwhichanin-memorymergewillbeinitiated,expressedasapercentageofthetotalmemoryallocatedtostoringin-memorymapoutputs,asdefinedbymapred.job.shuffle.input.buffer.percent.
57
mapred.job.shuffle.input.buffer.percent
0.70
Thepercentageofmemorytobeallocatedfromthemaximumheapsizetostoringmapoutputsduringtheshuffle.
58
mapred.job.reduce.input.buffer.percent
0.0
Thepercentageofmemory-relativetothemaximumheapsize-toretainmapoutputsduringthereduce.Whentheshuffleisconcluded,anyremainingmapoutputsinmemorymustconsumelessthanthisthresholdbeforethereducecanbegin.
59
mapred.map.tasks.speculative.execution
true
Iftrue,thenmultipleinstancesofsomemaptasksmaybeexecutedinparallel.
60
mapred.reduce.tasks.speculative.execution
true
Iftrue,thenmultipleinstancesofsomereducetasksmaybeexecutedinparallel.
61
mapred.job.reuse.jvm.num.tasks
1
Howmanytaskstorunperjvm.Ifsetto-1,thereisnolimit.
62
mapred.min.split.size
Theminimumsizechunkthatmapinputshouldbesplitinto.Notethatsomefileformatsmayhaveminimumsplitsizesthattakepriorityoverthissetting.
63
mapred.jobtracker.maxtasks.per.job
-1
Themaximumnumberoftasksforasinglejob.Avalueof-1indicatesthatthereisnomaximum.
64
mapred.submit.replication
10
Thereplicationlevelforsubmittedjobfiles.Thisshouldbearoundthesquarerootofthenumberofnodes.
65
mapred.tasktracker.dns.interface
default
ThenameoftheNetworkInterfacefromwhichatasktrackershouldreportitsIPaddress.
66
mapred.tasktracker.dns.nameserver
default
ThehostnameorIPaddressofthenameserver(DNS)whichaTaskTrackershouldusetodeterminethehostnameusedbytheJobTrackerforcommunicationanddisplaypurposes.
67
tasktracker.http.threads
40
Thenumberofworkerthreadsthatforthehttpserver.Thisisusedformapoutputfetching
68
mapred.task.tracker.http.address
0.0.0.0:50060
Thetasktrackerhttpserveraddressandport.Iftheportis0thentheserverwillstartonafreeport.
69
keep.failed.task.files
false
Shouldthefilesforfailedtasksbekept.Thisshouldonlybeusedonjobsthatarefailing,becausethestorageisneverreclaimed.Italsopreventsthemapoutputsfrombeingerasedfromthereducedirectoryastheyareconsumed.
70
mapred.output.compress
false
Shouldthejoboutputsbecompressed?
71
mapred.output.compression.type
RECORD
IfthejoboutputsaretocompressedasSequenceFiles,howshouldtheybecompressed?ShouldbeoneofNONE,RECORDorBLOCK.
72
mapred.output.compression.codec
org.apache.hadoop.io.compress.DefaultCodec
Ifthejoboutputsarecompressed,howshouldtheybecompressed?
73
mapred.compress.map.output
false
Shouldtheoutputsofthemapsbecompressedbeforebeingsentacrossthenetwork.UsesSequenceFilecompression.
74
mapred.map.output.compression.codec
org.apache.hadoop.io.compress.DefaultCodec
Ifthemapoutputsarecompressed,howshouldtheybecompressed?
75
map.sort.class
org.apache.hadoop.util.QuickSort
Thedefaultsortclassforsortingkeys.
76
mapred.userlog.limit.kb
Themaximumsizeofuser-logsofeachtaskinKB.0disablesthecap.
77
mapred.userlog.retain.hours
24
Themaximumtime,inhours,forwhichtheuser-logsaretoberetainedafterthejobcompletion.
78
mapred.user.jobconf.limit
5242880
Themaximumallowedsizeoftheuserjobconf.Thedefaultissetto5MB
79
mapred.hosts
Namesafilethatcontainsthelistofnodesthatmayconnecttothejobtracker.Ifthevalueisempty,allhostsarepermitted.
80
mapred.hosts.exclude
Namesafilethatcontainsthelistofhoststhatshouldbeexcludedbythejobtracker.Ifthevalueisempty,nohostsareexcluded.
81
mapred.heartbeats.in.second
100
Expert:Approximatenumberofheart-beatsthatcouldarriveatJobTrackerinasecond.AssumingeachRPCcanbeprocessedin10msec,thedefaultvalueismade100RPCsinasecond.
82
mapred.max.tracker.blacklists
4
Thenumberofblacklistsforatasktrackerbyvariousjobsafterwhichthetasktrackerwillbemarkedaspotentiallyfaultyandisacandidateforgraylistingacrossalljobs.(Unlikeblacklisting,thisisadvisory;thetrackerremainsactive.However,itisreportedasgraylistedinthewebUI,withtheexpectationthatchronicallygraylistedtrackerswillbemanuallydecommissioned.)Thisvalueistiedtomapred.jobtracker.blacklist.fault-timeout-window;faultsolderthanthewindowwidthareforgiven,sothetrackerwillrecoverfromtransientproblems.Itwillalsobecomehealthyafterarestart.
83
mapred.jobtracker.blacklist.fault-timeout-window
180
Thetimeout(inminutes)afterwhichper-jobtasktrackerfaultsareforgiven.Thewindowislogicallyacircularbufferoftime-intervalbucketswhosewidthisdefinedbymapred.jobtracker.blacklist.fault-bucket-width;whenthe"now"pointermovesacrossabucketboundary,thepreviouscontents(faults)ofthenewbucketarecleared.Inotherwords,thetimeout'sgranularityisdeterminedbythebucketwidth.
84
mapred.jobtracker.blacklist.fault-bucket-width
15
Thewidth(inminutes)ofeachbucketinthetasktrackerfaulttimeoutwindow.Eachbucketisreusedinacircularmannerafterafulltimeout-windowinterval(definedbymapred.jobtracker.blacklist.fault-timeout-window).
85
mapred.max.tracker.failures
4
Thenumberoftask-failuresonatasktrackerofagivenjobafterwhichnewtasksofthatjobaren'tassignedtoit.
86
jobclient.output.filter
FAILED
Thefilterforcontrollingtheoutputofthetask'suserlogssenttotheconsoleoftheJobClient.Thepermissibleoptionsare:NONE,KILLED,FAILED,SUCCEEDEDandALL.
87
mapred.job.tracker.persist.jobstatus.active
false
Indicatesifpersistencyofjobstatusinformationisactiveornot.
88
mapred.job.tracker.persist.jobstatus.hours
ThenumberofhoursjobstatusinformationispersistedinDFS.Thejobstatusinformationwillbeavailableafteritdropsofthememoryqueueandbetweenjobtrackerrestarts.WithazerovaluethejobstatusinformationisnotpersistedatallinDFS.
89
mapred.job.tracker.persist.jobstatus.dir
/jobtracker/jobsInfo
Thedirectorywherethejobstatusinformationispersistedinafilesystemtobeavailableafteritdropsofthememoryqueueandbetweenjobtrackerrestarts.
90
mapreduce.job.complete.cancel.delegation.tokens
true
iffalse-donotunregister/canceldelegationtokensfromrenewal,becausesametokensmaybeusedbyspawnedjobs
91
mapred.task.profile
false
Tosetwhetherthesystemshouldcollectprofilerinformationforsomeofthetasksinthisjob?Theinformationisstoredintheuserlogdirectory.Thevalueis"true"iftaskprofilingisenabled.
92
mapred.task.profile.maps
0-2
Tosettherangesofmaptaskstoprofile.mapred.task.profilehastobesettotrueforthevaluetobeaccounted.
93
mapred.task.profile.reduces
0-2
Tosettherangesofreducetaskstoprofile.mapred.task.profilehastobesettotrueforthevaluetobeaccounted.
94
mapred.line.input.format.linespermap
1
NumberoflinespersplitinNLineInputFormat.
95
mapred.skip.attempts.to.start.skipping
2
ThenumberofTaskattemptsAFTERwhichskipmodewillbekickedoff.Whenskipmodeiskickedoff,thetasksreportstherangeofrecordswhichitwillprocessnext,totheTaskTracker.Sothatonfailures,TTknowswhichonesarepossiblythebadrecords.Onfurtherexecutions,thoseareskipped.
96
mapred.skip.map.auto.incr.proc.count
true
Theflagwhichifsettotrue,SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDSisincrementedbyMapRunnerafterinvokingthemapfunction.Thisvaluemustbesettofalseforapplicationswhichprocesstherecordsasynchronouslyorbuffertheinputrecords.Forexamplestreaming.Insuchcasesapplicationsshouldincrementthiscounterontheirown.
97
mapred.skip.reduce.auto.incr.proc.count
true
Theflagwhichifsettotrue,SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPSisincrementedbyframeworkafterinvokingthereducefunction.Thisvaluemustbesettofalseforapplicationswhichprocesstherecordsasynchronouslyorbuffertheinputrecords.Forexamplestreaming.Insuchcasesapplicationsshouldincrementthiscounterontheirown.
98
mapred.skip.out.dir
Ifnovalueisspecifiedhere,theskippedrecordsarewrittentotheoutputdirectoryat_logs/skip.Usercanstopwritingskippedrecordsbygivingthevalue"none".
99
mapred.skip.map.max.skip.records
ThenumberofacceptableskiprecordssurroundingthebadrecordPERbadrecordinmapper.Thenumberincludesthebadrecordaswell.Toturnthefeatureofdetection/skippingofbadrecordsoff,setthevalueto0.TheframeworktriestonarrowdowntheskippedrangebyretryinguntilthisthresholdismetORallattemptsgetexhaustedforthistask.SetthevaluetoLong.MAX_VALUEtoindicatethatframeworkneednottrytonarrowdown.Whateverrecords(dependsonapplication)getskippedareacceptable.
100
mapred.skip.reduce.max.skip.groups
ThenumberofacceptableskipgroupssurroundingthebadgroupPERbadgroupinreducer.Thenumberincludesthebadgroupaswell.Toturnthefeatureofdetection/skippingofbadgroupsoff,setthevalueto0.TheframeworktriestonarrowdowntheskippedrangebyretryinguntilthisthresholdismetORallattemptsgetexhaustedforthistask.SetthevaluetoLong.MAX_VALUEtoindicatethatframeworkneednottrytonarrowdown.Whatevergroups(dependsonapplication)getskippedareacceptable.
101
job.end.retry.attempts
IndicateshowmanytimeshadoopshouldattempttocontactthenotificationURL
102
job.end.retry.interval
30000
IndicatestimeinmillisecondsbetweennotificationURLretrycalls
103
hadoop.rpc.socket.factory.class.JobSubmissionProtocol
SocketFactorytousetoconnecttoaMap/Reducemaster(JobTracker).Ifnullorempty,thenusehadoop.rpc.socket.class.default.
104
mapred.task.cache.levels
2
Thisisthemaxlevelofthetaskcache.Forexample,ifthelevelis2,thetaskscachedareatthehostlevelandattheracklevel
105
mapred.queue.names
default
Comma-separated list of queues configured for this JobTracker. Jobs are added to queues, and schedulers can configure different scheduling properties for the various queues. To configure a property for a queue, the name of the queue must match the name specified in this value. Queue properties that are common to all schedulers are configured here with the naming convention mapred.queue.$QUEUE-NAME.$PROPERTY-NAME, e.g. mapred.queue.default.submit-job-acl. The number of queues configured in this parameter could depend on the type of scheduler being used, as specified in mapred.jobtracker.taskScheduler. For example, the JobQueueTaskScheduler supports only a single queue, which is the default configured here. Before adding more queues, ensure that the scheduler you've configured supports multiple queues.
106
mapred.acls.enabled
false
Specifies whether ACLs should be checked for authorization of users for doing various queue and job level operations. ACLs are disabled by default. If enabled, access control checks are made by the JobTracker and TaskTracker when requests are made by users for queue operations, like submitting a job to a queue and killing a job in the queue, and for job operations, like viewing the job details (see mapreduce.job.acl-view-job) or modifying the job (see mapreduce.job.acl-modify-job), using Map/Reduce APIs, RPCs, or via the console and web user interfaces.
107
mapred.queue.default.state
RUNNING
This value defines the state the default queue is in. The value can be either "STOPPED" or "RUNNING". This value can be changed at runtime.
108
mapred.job.queue.name
default
Queue to which a job is submitted. This must match one of the queues defined in mapred.queue.names for the system. Also, the ACL setup for the queue must allow the current user to submit a job to the queue. Before specifying a queue, ensure that the system is configured with the queue, and that access is allowed for submitting jobs to the queue.
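The queue settings above (mapred.queue.names, the mapred.queue.$QUEUE-NAME.$PROPERTY-NAME convention, the queue state, mapred.acls.enabled and mapred.job.queue.name) can be read together as one submission check. The sketch below models that check over a plain dict of configuration values. It assumes that a per-queue `state` property follows the same naming convention as mapred.queue.default.state. `acl_allows` is a hypothetical caller-supplied callback standing in for the real queue ACL lookup.

```python
from typing import Callable, Dict, List, Optional


def queue_names(conf: Dict[str, str]) -> List[str]:
    """Split the comma-separated mapred.queue.names value (default: "default")."""
    return [q.strip() for q in conf.get("mapred.queue.names", "default").split(",") if q.strip()]


def queue_property(conf: Dict[str, str], queue: str, prop: str,
                   default: Optional[str] = None) -> Optional[str]:
    """Look up mapred.queue.$QUEUE-NAME.$PROPERTY-NAME for a configured queue."""
    if queue not in queue_names(conf):
        raise ValueError(f"queue {queue!r} is not listed in mapred.queue.names")
    return conf.get(f"mapred.queue.{queue}.{prop}", default)


def check_submission(conf: Dict[str, str], user: str,
                     acl_allows: Callable[[str, str], bool]) -> str:
    """Return the target queue if `user` may submit to it, otherwise raise."""
    queue = conf.get("mapred.job.queue.name", "default")
    state = queue_property(conf, queue, "state", "RUNNING")  # raises if unknown queue
    if state != "RUNNING":
        raise RuntimeError(f"queue {queue!r} is {state} and does not accept jobs")
    # ACLs are only consulted when mapred.acls.enabled is true (default: false).
    if conf.get("mapred.acls.enabled", "false") == "true" and not acl_allows(queue, user):
        raise PermissionError(f"user {user!r} may not submit to queue {queue!r}")
    return queue
```

Passing the ACL check in as a function keeps the sketch independent of how a given scheduler actually stores queue ACLs.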
109
mapreduce.job.acl-modify-job
Job-specific access-control list for 'modifying' the job. It is only used if authorization is enabled in Map/Reduce by setting the configuration property mapred.acls.enabled to true. This specifies the list of users and/or groups who can perform modification operations on the job. For specifying a list of users and groups, the format to use is "user1,user2 group1,group". If set to '*', it allows all users/groups to modify this job. If set to ' ' (i.e. a space), it allows none. This configuration is used to guard all the modifications with respect to this job and covers the following operations: killing this job; killing a task of this job, failing a task of this job; setting the priority of this job. Each of these operations is also protected by the per-queue level ACL "acl-administer-jobs" configured via mapred-queues.xml, so a caller should have the authorization to satisfy either the queue-level ACL or the job-level ACL. Irrespective of this ACL configuration, the job owner, the user who started the cluster, cluster administrators configured via mapreduce.cluster.administrators, and administrators of the queue to which this job is submitted (configured via mapred.queue.queue-name.acl-administer-jobs in mapred-queue-acls.xml) can perform all the modification operations on a job. By default, nobody else besides the job owner, the user who started the cluster, cluster administrators and queue administrators can perform modification operations on a job.
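The ACL format and the override rules above can be modelled in a few lines. The sketch below parses a value such as "user1,user2 group1,group", where users come before the first space and groups after it, '*' means everyone and ' ' means nobody. It then applies the modify-job rule as described: the job owner, the cluster starter, cluster administrators or queue administrators always pass, and anyone else needs the job-level ACL. The class and function names are hypothetical, and this is not Hadoop's AccessControlList implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


def _split_names(part: str) -> FrozenSet[str]:
    """Turn "a,b,c" into {"a", "b", "c"}, ignoring blanks."""
    return frozenset(x.strip() for x in part.split(",") if x.strip())


@dataclass(frozen=True)
class JobAcl:
    allow_all: bool
    users: FrozenSet[str]
    groups: FrozenSet[str]

    @classmethod
    def parse(cls, spec: str) -> "JobAcl":
        """Parse "user1,user2 group1,group2"; '*' allows all, ' ' allows none."""
        if spec.strip() == "*":
            return cls(True, frozenset(), frozenset())
        users_part, _, groups_part = spec.partition(" ")
        return cls(False, _split_names(users_part), _split_names(groups_part))

    def allows(self, user: str, user_groups: Iterable[str]) -> bool:
        return self.allow_all or user in self.users or bool(self.groups.intersection(user_groups))


def can_modify_job(user: str, user_groups: Iterable[str], *, job_owner: str,
                   cluster_owner: str, cluster_admins: JobAcl, queue_admins: JobAcl,
                   job_acl: JobAcl, acls_enabled: bool) -> bool:
    """Apply the modify-job rules described for mapreduce.job.acl-modify-job."""
    if not acls_enabled:  # mapred.acls.enabled=false: no checks are made
        return True
    groups = list(user_groups)
    if user in (job_owner, cluster_owner):
        return True
    if cluster_admins.allows(user, groups) or queue_admins.allows(user, groups):
        return True
    return job_acl.allows(user, groups)
```

Because ' ' parses to empty user and group sets, the default value grants nothing beyond the always-allowed principals.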
110
mapreduce.job.acl-view-job
Job-specific access-control list for 'viewing' the job. It is only used if authorization is enabled in Map/Reduce by setting the configuration property mapred.acls.enabled to true. This specifies the list of users and/or groups who can view private details about the job. For specifying a list of users and groups, the format to use is "user1,user2 group1,group". If set to '*', it allows all users/groups to view this job. If set to ' ' (i.e. a space), it allows none. This configuration is used to guard some of the job views, and at present only protects APIs that can return possibly sensitive information of the job owner, like: job-level counters; task-level counters; tasks' diagnostic information; task logs displayed on the TaskTracker web UI; and job.xml shown by the JobTracker's web UI. Every other piece of information about jobs is still accessible by any other user, e.g. JobStatus, JobProfile, the list of jobs in the queue, etc. Irrespective of this ACL configuration, the job owner, the user who started the cluster, cluster administrators configured via mapreduce.cluster.administrators, and administrators of the queue to which this job is submitted (configured via mapred.queue.queue-name.acl-administer-jobs in mapred-queue-acls.xml) can perform all the view operations on a job. By default, nobody else besides the job owner, the user who started the cluster, cluster administrators and queue administrators can perform view operations on a job.
111
mapred.tasktracker.indexcache.mb
10
The maximum memory (in MB) that a TaskTracker allows for the index cache that is used when serving map outputs to reducers.
112
mapred.combine.recordsBeforeProgress
10000
The number of records to process during combine output collection before sending a progress notification to the TaskTracker.
113
mapred.merge.recordsBeforeProgress
10000
The number of records to process during merge before sending a progress notification to the TaskTracker.
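Both mapred.combine.recordsBeforeProgress and mapred.merge.recordsBeforeProgress express the same idea: report progress once per N processed records instead of once per record. The generator below is a minimal sketch of that pattern. `report` is a hypothetical callback standing in for the progress notification to the TaskTracker, and `every` plays the role of the 10000 default.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_progress(records: Iterable[T], every: int,
                  report: Callable[[int], None]) -> Iterator[T]:
    """Yield records unchanged, calling report(count) after every `every` records."""
    if every <= 0:
        raise ValueError("every must be positive")
    count = 0
    for record in records:
        yield record
        count += 1
        if count % every == 0:
            report(count)  # one notification per `every` records, not per record
```

Processing 25000 records with `every=10000` produces two notifications, at 10000 and 20000 records. This keeps the reporting overhead small relative to the work done.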
114
mapred.reduce.slowstart.completed.maps
0.05
Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job.
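Turning the fraction into a concrete number of completed maps needs a rounding rule. The helper below assumes the product is rounded up. That is consistent with small jobs still waiting for at least one map, but treat it as an assumption rather than a statement about the JobTracker's code.

```python
import math


def maps_needed_before_reduces(num_maps: int, slowstart: float = 0.05) -> int:
    """Completed maps required before reduces are scheduled (rounding up assumed)."""
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("mapred.reduce.slowstart.completed.maps must be in [0, 1]")
    return math.ceil(slowstart * num_maps)
```

With the default 0.05, a 100-map job would start its reduces after 5 maps have finished, and a 10-map job after 1. Setting the value to 1.0 makes reduces wait for every map.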
115
mapred.task.tracker.task-controller
org.apache.hadoop.mapred.DefaultTaskController
TaskController which is used to launch and manage task execution.
116
mapreduce.tasktracker.group
Expert: Group to which the TaskTracker belongs. If LinuxTaskController is configured via mapreduce.tasktracker.taskcontroller, the group owner of the task-controller binary should be the same as this group.
117
mapred.healthChecker.script.path
Absolute path to the script which is periodically run by the node health monitoring service to determine whether the node is healthy. If the value of this key is empty or the file does not exist at the location configured here, the node health monitoring service is not started.
118
mapred.healthChecker.interval
60000
Interval at which the node health script is run, in milliseconds.
119
mapred.healthChecker.script.timeout
600000
Time (in milliseconds) after which the node health script is killed if it is unresponsive, in which case the script is considered to have failed.
120
mapred.healthChecker.script.args
Comma-separated list of arguments to be passed to the node health script when it is launched.
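The four mapred.healthChecker.* settings describe one health check: run the script with its comma-separated arguments and kill it after the timeout. The service is skipped entirely if the path is empty or missing. The function below sketches a single run. It assumes the convention that a stdout line starting with "ERROR" marks the node unhealthy, and it ignores the exit code. A real monitor would call it every mapred.healthChecker.interval milliseconds. Treat all of this as a simplified model, not the NodeHealthCheckerService source.

```python
import os
import subprocess
from typing import Optional


def run_health_check(script_path: str, args_csv: str = "",
                     timeout_ms: int = 600000) -> Optional[bool]:
    """Run the node health script once.

    Returns None when the service would not start (empty path or missing
    file), True when the node looks healthy, and False when it is unhealthy,
    the script timed out, or it could not be launched.
    """
    if not script_path or not os.path.isfile(script_path):
        return None
    args = [a for a in args_csv.split(",") if a]  # mapred.healthChecker.script.args
    try:
        result = subprocess.run([script_path, *args], capture_output=True,
                                text=True, timeout=timeout_ms / 1000.0)
    except subprocess.TimeoutExpired:
        return False  # unresponsive script: killed and treated as failed
    except OSError:
        return False  # e.g. the file exists but is not executable
    # Assumed convention: any stdout line starting with "ERROR" means unhealthy.
    return not any(line.startswith("ERROR") for line in result.stdout.splitlines())
```

Returning None instead of False for a missing script keeps "monitoring disabled" distinct from "node unhealthy", mirroring the description above.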
121
mapreduce.job.counters.limit
120
Limit on the number of counters allowed per job.
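A per-job counter limit means that creating a new counter can fail once the cap is reached, while existing counters can still be updated. The class below is a hypothetical model of that rule. It assumes the cap applies to the total number of distinct (group, name) pairs, with 120 as the default. The exception name `CountersLimitExceeded` is made up for illustration and is not Hadoop's.

```python
from typing import Dict, Tuple


class CountersLimitExceeded(RuntimeError):
    """Hypothetical error raised when a job would exceed its counter limit."""


class LimitedCounters:
    """Per-job counters capped at `limit` distinct (group, name) pairs."""

    def __init__(self, limit: int = 120) -> None:
        self.limit = limit  # models mapreduce.job.counters.limit
        self._values: Dict[Tuple[str, str], int] = {}

    def increment(self, group: str, name: str, amount: int = 1) -> None:
        key = (group, name)
        # Only creating a new counter can hit the limit; updating one never does.
        if key not in self._values and len(self._values) >= self.limit:
            raise CountersLimitExceeded(
                f"cannot create counter {group}:{name}, limit is {self.limit}")
        self._values[key] = self._values.get(key, 0) + amount

    def value(self, group: str, name: str) -> int:
        return self._values.get((group, name), 0)
```

Jobs that generate counter names dynamically (for example, one counter per input file) are the usual way to run into such a limit.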