Changing the key/value split delimiter in Hadoop 0.20.2

You can find a list of deprecated Hadoop properties here (the page is part of the 2.0.3-alpha documentation):

http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-project-dist/hadoop-common/DeprecatedProperties.html

After reading this, you may think you need to set mapreduce.input.keyvaluelinerecordreader.key.value.separator to change the delimiter for KeyValueTextInputFormat. In my experience, however, this is one of many areas where Hadoop 0.20.2 behaves differently from what the documentation would lead you to believe.

Instead, you must keep using the old name, key.value.separator.in.input.line. You set it like so:

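public int run(String[] args) throws Exception {

     // Configuration supplied by the framework (e.g. via ToolRunner)
     Configuration conf = getConf();

     // Split each input line on a comma instead of the default tab
     conf.set("key.value.separator.in.input.line", ",");

     // ... build and run the job with conf, then return its exit status ...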
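To make it clearer what this setting changes, here is a small stand-alone sketch of how a line is divided into key and value, as I understand it: the line is cut at the first occurrence of the separator, and a line with no separator becomes the key with an empty value. The class and method names (KeyValueSplitSketch, split) are made up for illustration and are not part of Hadoop, and the real record reader works on bytes rather than Strings, so treat this as an approximation only.

```java
import java.util.Arrays;

// Hypothetical helper, NOT Hadoop code: approximates how one input line
// is divided into a key and a value around the configured separator.
class KeyValueSplitSketch {

    // Cuts the line at the FIRST occurrence of the separator only.
    // If the separator is absent, the whole line is the key and the value is empty.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // With "," as the separator, only the first comma separates key from value.
        System.out.println(Arrays.toString(split("user42,click,2013-01-05", ',')));
        // A tab-delimited line contains no comma, so it ends up entirely in the key.
        System.out.println(Arrays.toString(split("user42\tclick", ',')));
    }
}
```

The second call shows the symptom to look for when the property name is not picked up: if the job is still splitting on tabs while your data uses commas, every line arrives in the mapper as a key with an empty value.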

At present the Hadoop API can be quite confusing, because many things have changed between releases, from the spelling of method names to entire syntaxes. The documentation doesn’t always lead you to success, so you must experiment.
