About this blog
This blog is mostly about my pursuits in Data Science. Previous blog entries also dealt with storage, compute, virtualization and professional services. Currently the focus is on Data Science, including Big Data, Hadoop, Business Intelligence, Data Warehouse, Data Integration and Visualization. From time to time I will blog about other things of interest. The opinions expressed in this blog are entirely my own and should not be taken as the opinion of my employer.
Category Archives: Data Analytics
This post covers somewhat dated material. Several years back, when YARN was first making headway and vendors were starting to adopt it as part of Hadoop 2.x, there were many times when I needed to downgrade to MapReduce v1. I had written …
When building deep learning models, it can be very beneficial to scale your data. Raw features often span very different, unbounded ranges of values. The goal of scaling is to map these values into a bounded range. Typically the activation functions of a neuron …
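The excerpt does not say which scaling method the post uses, so the following is only a minimal sketch of one common choice, min-max scaling, in plain Python. The names `fit_min_max` and `min_max_scale` are hypothetical. The bounds are learned from the training data and then reused for any new data, and the `new_min`/`new_max` parameters let you target a range such as [-1, 1] if that suits the activation function better than [0, 1].

```python
def fit_min_max(column):
    """Learn the (min, max) bounds from a training column."""
    return min(column), max(column)


def min_max_scale(column, lo, hi, new_min=0.0, new_max=1.0):
    """Linearly map values from [lo, hi] onto [new_min, new_max]."""
    span = hi - lo
    if span == 0:
        # A constant feature carries no information; map every value to new_min.
        return [new_min for _ in column]
    return [new_min + (x - lo) / span * (new_max - new_min) for x in column]


train = [2.0, 10.0, 6.0, 4.0]
lo, hi = fit_min_max(train)
print(min_max_scale(train, lo, hi))             # [0.0, 1.0, 0.5, 0.25]
print(min_max_scale(train, lo, hi, -1.0, 1.0))  # [-1.0, 1.0, 0.0, -0.5]
print(min_max_scale([12.0], lo, hi))            # [1.25]
```

The last line shows a caveat of this approach: a value above the training maximum lands above 1, so you may want to clip new data if the downstream layer needs strict bounds.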
I recently answered the following question on StackOverflow: "How do I plot the decision boundary of a regression using matplotlib?" Rather than repeat the answer, I am linking to the post here and including the picture below.
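As a minimal sketch of the idea behind such a plot, assuming "regression" here means a two-feature logistic regression: the decision boundary is where the linear score w0 + w1*x1 + w2*x2 equals zero (predicted probability 0.5), and that equation can be solved for x2. The coefficients and function names below are hypothetical, not taken from the linked answer, and the drawing itself is left to matplotlib.

```python
import math


def predict_proba(x1, x2, w0, w1, w2):
    """Logistic model: P(y=1 | x) = sigmoid(w0 + w1*x1 + w2*x2)."""
    z = w0 + w1 * x1 + w2 * x2
    return 1.0 / (1.0 + math.exp(-z))


def boundary_x2(x1, w0, w1, w2):
    """Solve w0 + w1*x1 + w2*x2 = 0 for x2, i.e. where P(y=1 | x) = 0.5."""
    if w2 == 0:
        raise ValueError("w2 == 0: the boundary is the vertical line x1 = -w0 / w1")
    return -(w0 + w1 * x1) / w2


# Hypothetical fitted coefficients, for illustration only.
w0, w1, w2 = -4.0, 1.0, 2.0
xs = [0.0, 1.0, 2.0, 3.0]
ys = [boundary_x2(x, w0, w1, w2) for x in xs]
print(ys)  # [2.0, 1.5, 1.0, 0.5]
```

To draw the line, `xs` and `ys` can be passed to `matplotlib.pyplot.plot` on top of a scatter plot of the data. For models whose boundary is not a straight line, a common alternative is to evaluate the model on a grid and draw the 0.5 level with `matplotlib.pyplot.contour`.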
If you're trying to install packages using pip and are getting errors that complain about ‘egg_info’, it is likely because distribute was merged into setuptools as of version 0.7, so you need to upgrade setuptools. Here is what I was getting …
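A small Python sketch for checking whether the installed setuptools predates 0.7 before retrying the install. The helper names are hypothetical, and the check only compares the leading major.minor numbers, an assumption that works for version strings like '0.6c11' or '69.0.3' but is not a full version parser. The upgrade command it suggests is the standard `python -m pip install --upgrade setuptools`.

```python
import re

# distribute was merged back into setuptools as of version 0.7.
MERGE_VERSION = (0, 7)


def major_minor(version_string):
    """Extract (major, minor) from strings such as '0.6c11' or '69.0.3'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognized version: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def needs_upgrade(version_string):
    """True if this setuptools version predates the distribute merge."""
    return major_minor(version_string) < MERGE_VERSION


def check_setuptools():
    """Print the installed setuptools version and whether to upgrade it."""
    try:
        import setuptools
    except ImportError:
        print("setuptools is not installed")
        return
    installed = setuptools.__version__
    if needs_upgrade(installed):
        print("setuptools %s predates the merge; try:" % installed)
        print("  python -m pip install --upgrade setuptools")
    else:
        print("setuptools %s should not need this upgrade" % installed)


if __name__ == "__main__":
    check_setuptools()
```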
One of the greatest contributors to a healthier society is the creation of new medical tests that help clinicians detect and diagnose conditions. As with any type of test, there is error. And in a medical …