About this blog
This blog is mostly about my pursuits in Data Science. Previous blog entries also dealt with storage, compute, virtualization and professional services. Currently the focus is on Data Science, including Big Data, Hadoop, Business Intelligence, Data Warehouse, Data Integration and Visualization. From time to time I will blog about other things of interest. The opinions expressed in this blog are entirely my own and should not be taken as the opinion of my employer.
Tag Archives: hadoop
Basic HBase Java Classes and Methods – Part 3: Table Creation
We will cover these basic steps: instantiating a configuration object, establishing a connection to HBase, manipulating tables using an administration object, and manipulating data within a table using a table instance.
Creating our table
I am using Maven, and below is …
Basic HBase Java Classes and Methods – Part 2: HBase Shell
For the purpose of these exercises we will be working with a basic table that has two column families. The first column family is “personal” and will contain first_name, last_name, age, gender, martial_status. The second column family is “professional” and …
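HBase's logical data model is commonly described as a sorted map of row key → column family → column qualifier → value (plus timestamped versions per cell). The stdlib-only sketch below shows how one row of the two-family table described above could be laid out under that model. It does not use the HBase client API. The class name, row key, and cell values are hypothetical, only a few of the "personal" columns are filled in, and the "professional" columns are left empty because the excerpt does not list them.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class PersonTableSketch {
    // One row's logical layout: column family -> (column qualifier -> value).
    // Real HBase stores raw bytes and keeps timestamped versions per cell;
    // both are omitted here for readability.
    static NavigableMap<String, NavigableMap<String, String>> newRow() {
        NavigableMap<String, NavigableMap<String, String>> row = new TreeMap<>();
        // Column families are fixed when the table is created...
        row.put("personal", new TreeMap<>());
        row.put("professional", new TreeMap<>());
        return row;
    }

    // ...while column qualifiers can be added freely at write time.
    static void put(NavigableMap<String, NavigableMap<String, String>> row,
                    String family, String qualifier, String value) {
        NavigableMap<String, String> columns = row.get(family);
        if (columns == null) {
            throw new IllegalArgumentException("No such column family: " + family);
        }
        columns.put(qualifier, value);
    }

    // Whole table: row key -> row, kept sorted by row key as HBase does.
    static NavigableMap<String, NavigableMap<String, NavigableMap<String, String>>> buildSampleTable() {
        NavigableMap<String, NavigableMap<String, NavigableMap<String, String>>> table = new TreeMap<>();
        NavigableMap<String, NavigableMap<String, String>> row = newRow();
        // Hypothetical sample data for illustration only.
        put(row, "personal", "first_name", "Jane");
        put(row, "personal", "last_name", "Doe");
        put(row, "personal", "age", "34");
        table.put("row1", row);
        return table;
    }

    public static void main(String[] args) {
        // prints {row1={personal={age=34, first_name=Jane, last_name=Doe}, professional={}}}
        System.out.println(buildSampleTable());
    }
}
```

The design choice mirrors HBase itself. Writing to an undeclared column family fails, but any new qualifier inside an existing family is accepted. This is why only the families need to be defined when the table is created.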
Basic HBase Java Classes and Methods – Part 1: Getting Started
This series of articles is meant to familiarize you with basic HBase Java classes and methods. The goal of these articles is not to demonstrate HBase best practices. In fact, we will be making many compromises as we deploy on what is …
Downgrading Apache Hadoop YARN to MapReduce v1
This post covers somewhat dated material. Several years back, when YARN was first making headway and vendors were starting to adopt it as part of Hadoop 2.x, there were many times when I needed to downgrade to MapReduce v1. I had written …
Hadoop – Downgrading from YARN to MRv1 (Cloudera CDH4)
With two versions of MapReduce available for Hadoop, the older MRv1 and the newer MRv2, which runs on YARN, sometimes you need to move between the two. Using RPMs or other packages with the Cloudera CDH installation makes this mostly easy; however, there is …