MapReduce is a distributed computing paradigm that lets you divide a task into smaller jobs and "map" each job to a
unique key.
These jobs can then be distributed over a compute grid, and their results "reduced" back into the result of the
original task.
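As a rough illustration of the key-based idea above, here is a minimal single-process sketch in Python that counts words. The function names (map_words, group_by_key, reduce_counts) and the sample input are hypothetical and chosen only for this example; a real framework would spread the work over many machines.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit a (key, value) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def group_by_key(pairs):
    """Collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce: collapse the values of each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input: each line stands in for one smaller job.
lines = ["the quick brown fox", "the lazy dog", "The end"]
pairs = [pair for line in lines for pair in map_words(line)]
word_counts = reduce_counts(group_by_key(pairs))
print(word_counts["the"])  # 3
```

Because each line is mapped independently of the others, the calls to map_words could in principle run on different nodes; only the grouping and reducing stages need to see results from more than one job.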
The MapReduce algorithm allows code to run simultaneously on a large number of compute nodes in order to process
certain kinds of distributable problems.
"Map" step: The master node takes the input, divides it into smaller sub-problems and distributes those to worker nodes.
A worker node may in turn repeat this step, dividing its sub-problem further. Eventually a worker node processes its
sub-problem and passes the result back to the master node.
"Reduce" step: The master node takes the results of all the sub-problems and combines them to get the final output,
which is the answer to the original task.
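The two steps can be sketched on a single machine as follows. This is only an approximation under stated assumptions: a thread pool stands in for the worker nodes, the task (a sum of squares) is an arbitrary example, and the names master, worker and split_into_chunks are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(numbers, chunk_size):
    """Divide the input into smaller sub-problems of at most chunk_size items."""
    return [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]


def worker(chunk):
    """Worker: solve one sub-problem (here, the sum of squares of its chunk)."""
    return sum(n * n for n in chunk)


def master(numbers, chunk_size=3, max_workers=4):
    # "Map" step: divide the input and hand each sub-problem to a worker.
    chunks = split_into_chunks(numbers, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(worker, chunks))
    # "Reduce" step: combine the partial results into the final output.
    return sum(partial_results)


total = master(list(range(1, 11)))
print(total)  # 385
```

Combining with a plain sum works here because addition gives the same answer regardless of how the input was chunked; tasks whose partial results cannot be merged this way are not a natural fit for the pattern.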