Changing the Number of MapReduce Mappers in Hadoop 0.20.2

In earlier releases of Hadoop you could change the number of mappers by calling:

setNumMapTasks()

You did this using JobConf.  Hadoop 0.20.2 has migrated to the Job class in place of JobConf.  Although setNumReduceTasks() is still valid, setNumMapTasks() has been deprecated along with JobConf.  How then do you set the number of mappers on a MapReduce job?  You must adjust the split size: each input split gets its own map task, so smaller splits mean more mappers.  Much has been written on this, but it can be difficult to find.

The split size is determined by the InputFormat being used.  I typically use KeyValueTextInputFormat.  To adjust my split size, I simply pass the mapred.max.split.size parameter (a value in bytes) on the command line, like so:

-D mapred.max.split.size=2500

In this example my input file size was 24451 bytes.  By setting -D mapred.max.split.size=2500, I ended up with 10 map tasks: 24451 / 2500 is about 9.8, which rounds up to 10 splits.
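
To double-check that arithmetic, here is a minimal Java sketch of how I understand the split calculation to work. It is not Hadoop's actual source. It assumes the split size is max(minSize, min(maxSize, blockSize)), that a remainder of up to 10% over the split size is folded into the last split, that the HDFS block size is 64 MB, and that the input is a single splittable file. The class and method names are my own, for illustration only.

```java
// Sketch of the split arithmetic under the assumptions stated above; not Hadoop's own code.
class SplitEstimate {
    // Assumed slop factor: a tail of up to 1.1x the split size stays in one split.
    static final double SPLIT_SLOP = 1.1;

    // Clamp the block size between the configured minimum and maximum split size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Count the splits (and therefore the map tasks) for a single splittable file.
    static int countSplits(long fileSize, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileSize;
        int splits = 0;
        // Carve off full-size splits while the remainder is more than 1.1x the split size.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left over becomes the final split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block size
        long fileSize = 24451L;             // input file size from the example, in bytes
        long maxSplit = 2500L;              // value passed as mapred.max.split.size
        System.out.println(countSplits(fileSize, blockSize, 1L, maxSplit)); // prints 10
    }
}
```

Without the max split size, the same 24451-byte file fits inside one block and would produce a single map task, which is why lowering mapred.max.split.size is what raises the mapper count here.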
