YARN and MapReduce2 Memory
First, let me explain the difference between YARN and MapReduce2. YARN is a generic platform on which any form of distributed application can run, while MapReduce2 is one such distributed application: it runs the MapReduce framework on top of YARN.
YARN treats memory in a more fine-grained manner than the slot-based model used in the classic implementation of MapReduce. It allows an application to request an arbitrary amount of memory (within limits) for a task. In the YARN model, node managers allocate memory from a pool, so the number of tasks running on a particular node depends on the sum of their memory requirements, not simply on a fixed number of slots. Also, YARN doesn't distinguish between map slots and reduce slots, which avoids wasting memory.
In YARN, each Hadoop daemon uses 1,000 MB by default, so for a DataNode and a NodeManager the total is 2,000 MB. After setting aside enough for other processes that are running on the machine, the remainder can be dedicated to the NodeManager's containers by setting yarn.nodemanager.resource.memory-mb to the total allocation in MB (the default is 8,192 MB).
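To make the arithmetic concrete, here is a minimal sketch of that budgeting for a hypothetical worker node; the 24 GB of RAM and the 2 GB reserve for the operating system are assumptions chosen purely for illustration, not recommendations.

```java
// Illustrative back-of-the-envelope calculation for a hypothetical worker node.
// The 24 GB total and the 2 GB OS reserve are assumptions, not values taken
// from any particular cluster.
public class NodeMemoryBudget {
    public static void main(String[] args) {
        int totalMb = 24 * 1024;        // assumed physical RAM on the node
        int osAndOtherMb = 2 * 1024;    // assumed reserve for the OS and other processes
        int datanodeMb = 1000;          // default memory for the DataNode daemon
        int nodeManagerMb = 1000;       // default memory for the NodeManager daemon

        // The remainder is what yarn.nodemanager.resource.memory-mb could be set to.
        int containerPoolMb = totalMb - osAndOtherMb - datanodeMb - nodeManagerMb;
        System.out.println("yarn.nodemanager.resource.memory-mb = " + containerPoolMb);
    }
}
```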
How do you set memory options for individual jobs? There are two controls: mapred.child.java.opts, which lets you set the JVM heap size of the map or reduce task; and mapreduce.map.memory.mb, which specifies how much memory you need for map task containers (mapreduce.reduce.memory.mb for reduce task containers). The latter setting is used by the ApplicationMaster when negotiating for resources in the cluster, and also by the NodeManager, which runs and monitors the task containers.
Let's make this more concrete with an example. Suppose mapred.child.java.opts is set to -Xmx800m and mapreduce.map.memory.mb is left at its default value of 1,024 MB. When a map task runs, the NodeManager will allocate a 1,024 MB container (decreasing the size of its pool by that amount for the duration of the task) and will launch the task JVM configured with an 800 MB maximum heap size. Note that the JVM process will have a larger memory footprint than the heap size, and the overhead will depend on such things as the native libraries in use, the size of the PermGen space, and so on. The important thing is that the physical memory used by the JVM process, including any processes that it spawns, such as Streaming or Pipes processes, does not exceed its allocation (1,024 MB). If a container uses more memory than it has been allocated, it may be terminated by the NodeManager and marked as failed.
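As a sketch of how these two settings might be wired up in a driver, the snippet below uses the standard Hadoop Java API to reproduce the example above (an 800 MB heap inside a 1,024 MB container); the job name and the omitted mapper/reducer setup are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MemoryConfigExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // JVM heap for the task itself (applies to both map and reduce tasks here).
        conf.set("mapred.child.java.opts", "-Xmx800m");

        // Container sizes requested from YARN; 1024 is already the default for map
        // containers, it is set explicitly here only to mirror the example in the text.
        conf.setInt("mapreduce.map.memory.mb", 1024);
        conf.setInt("mapreduce.reduce.memory.mb", 1024);

        Job job = Job.getInstance(conf, "memory-config-example");
        // ... set mapper, reducer, and input/output paths as usual,
        // then submit with job.waitForCompletion(true)
    }
}
```

The same properties can also be passed on the command line with -D when the driver uses ToolRunner.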
Schedulers may impose a minimum or maximum on memory allocations. For example, for the Capacity Scheduler, the default minimum is 1,024 MB, set by yarn.scheduler.capacity.minimum-allocation-mb; and the default maximum is 10,240 MB, set by yarn.scheduler.capacity.maximum-allocation-mb.
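These limits also shape what the scheduler actually grants: requests are typically rounded up to a multiple of the minimum allocation and capped at the maximum. The following is an illustrative sketch of that normalization, not the scheduler's actual code, and the exact rounding rules can differ between versions.

```java
public class AllocationNormalizer {
    // Illustrative only: rounds a request up to a multiple of the minimum
    // allocation and clamps it into the [min, max] range.
    static int normalize(int requestedMb, int minMb, int maxMb) {
        int rounded = ((requestedMb + minMb - 1) / minMb) * minMb; // round up to a multiple of minMb
        return Math.min(Math.max(rounded, minMb), maxMb);          // clamp into [minMb, maxMb]
    }

    public static void main(String[] args) {
        // Using the Capacity Scheduler defaults from the text (1,024 MB min, 10,240 MB max):
        System.out.println(normalize(800, 1024, 10240));   // 1024  -> small requests are rounded up
        System.out.println(normalize(1500, 1024, 10240));  // 2048  -> rounded to the next multiple
        System.out.println(normalize(20000, 1024, 10240)); // 10240 -> capped at the maximum
    }
}
```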
There are also virtual memory constraints that a container must meet. If a container's virtual memory usage exceeds a given multiple of its allocated physical memory, the NodeManager may terminate the process. The multiple is expressed by the yarn.nodemanager.vmem-pmem-ratio property, which defaults to 2.1. In the example above, the virtual memory threshold above which the task may be terminated is about 2,150 MB, which is 2.1 × 1,024 MB.
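The threshold is simply the product of the container's physical allocation and the ratio, as this small sketch spells out.

```java
public class VmemThreshold {
    public static void main(String[] args) {
        int physicalMb = 1024;          // container's physical memory allocation
        double vmemPmemRatio = 2.1;     // yarn.nodemanager.vmem-pmem-ratio default

        // Virtual memory the container may use before the NodeManager may kill it.
        double vmemLimitMb = physicalMb * vmemPmemRatio;
        System.out.println("virtual memory limit: roughly " + vmemLimitMb + " MB"); // 2150.4 MB
    }
}
```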
When configuring memory parameters, it's very useful to monitor a task's actual memory usage during a job run, and this is possible via MapReduce task counters. The counters PHYSICAL_MEMORY_BYTES, VIRTUAL_MEMORY_BYTES, and COMMITTED_HEAP_BYTES provide snapshot values of memory usage and are therefore suitable for observation during the course of a task attempt.
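As a sketch of how to read these counters programmatically, assuming the standard org.apache.hadoop.mapreduce API and a Job object for a job that has already completed; note that at the job level the counters are aggregated across task attempts, while per-task values can be inspected in the web UI.

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCounter;

public class MemoryCounters {
    // Prints the memory counters of a finished job.
    // Assumes 'job' has already been submitted and has completed.
    static void printMemoryCounters(Job job) throws Exception {
        Counters counters = job.getCounters();
        System.out.println("Physical memory (bytes): "
                + counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).getValue());
        System.out.println("Virtual memory (bytes): "
                + counters.findCounter(TaskCounter.VIRTUAL_MEMORY_BYTES).getValue());
        System.out.println("Committed heap (bytes): "
                + counters.findCounter(TaskCounter.COMMITTED_HEAP_BYTES).getValue());
    }
}
```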