Tag Archives: hadoop

Hadoop – Downgrading from YARN to MRv1 (Cloudera CDH4)

With two versions of MapReduce available for Hadoop, the older MRv1 and the newer YARN (MRv2), sometimes you need to move between the two. Using RPMs or other packages with the Cloudera CDH installation makes this mostly easy; however, there is … Continue reading

Posted in Data Analytics

Changing key/value split delimiter in Hadoop 0.20.2

You can find a list of deprecated properties in Hadoop 0.20.2 here: http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-project-dist/hadoop-common/DeprecatedProperties.html. After reading this, you may think that you need to set mapreduce.input.keyvaluelinerecordreader.key.value.separator in order to change the delimiter for the KeyValueTextInputFormat. However, what I have noticed from experience … Continue reading

Posted in Data Analytics

Changing MapReduce number of Mappers in Hadoop 0.20.2

In earlier releases of Hadoop you could change the number of mappers by calling setNumMapTasks() on JobConf. Hadoop 0.20.2 has migrated to using the Job class instead of JobConf. Although setNumReduceTasks() is still valid, setNumMapTasks() … Continue reading

Posted in Data Analytics | 2 Comments

Apache Mahout prepare20newsgroups in version 0.7

Apache Mahout has gone through some changes recently, and one thing you will notice no longer works is the old prepare20newsgroups classifier routine. It has been replaced, and the new syntax is quite different. This page will walk … Continue reading

Posted in Data Analytics