Category Archives: Data Analytics

Downgrading Apache Hadoop YARN to MapReduce v1

This post is somewhat dated material. Several years back, when YARN was first making headway and vendors started adopting it as part of Hadoop 2.x, there were many times when I needed to downgrade to MapReduce v1. I had written …

Posted in Data Analytics

Scaling data for Deep Learning

When building deep learning models, it can be very beneficial to scale your data. Data often spans a huge, unbounded range of values, and the goal of scaling is to bound these values. Typically the activation functions of a neuron …
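
As a rough illustration of what "bounding" values can look like, here is a minimal sketch of min-max scaling in plain Python. Min-max is only one common choice and is not necessarily the method the full post uses; the function name `min_max_scale` and the sample values are made up for this example.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly rescale `values` into the closed interval [lo, hi]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant feature: there is no range to stretch, so map
        # every value to the lower bound.
        return [lo for _ in values]
    return [lo + (v - v_min) * (hi - lo) / span for v in values]


# Hypothetical raw feature with a very wide range.
raw = [3.0, 250.0, 1200.0, 98000.0]
scaled = min_max_scale(raw)
print(scaled)
```

The smallest input maps to `lo` and the largest to `hi`, so every scaled value falls inside a known, bounded interval regardless of how wide the original range was.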

How to plot decision boundaries for Logistic Regression in Matplotlib

I recently answered the following question on Stack Overflow: “How do I plot the decision boundary of a regression using matplotlib?” I am just going to link to the post here and include the picture below.
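
For readers who want the general idea without following the link, here is a minimal sketch, not the code from the Stack Overflow answer. It assumes a two-feature logistic regression whose weights `w0`, `w1`, `w2` are already fitted (the names and values are hypothetical). The decision boundary is the line where the predicted probability is 0.5, i.e. where `w0 + w1*x1 + w2*x2 = 0`.

```python
import math


def predict_proba(w0, w1, w2, x1, x2):
    """Logistic (sigmoid) probability for a two-feature model."""
    return 1.0 / (1.0 + math.exp(-(w0 + w1 * x1 + w2 * x2)))


def boundary_x2(w0, w1, w2, x1):
    """x2 on the decision boundary for a given x1 (requires w2 != 0)."""
    return -(w0 + w1 * x1) / w2


def plot_boundary(w0, w1, w2, x1_min, x1_max):
    """Draw the boundary as a dashed line between two x1 values."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    xs = [x1_min, x1_max]
    ys = [boundary_x2(w0, w1, w2, x) for x in xs]
    plt.plot(xs, ys, "k--", label="decision boundary (p = 0.5)")
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.legend()
    plt.show()
```

Calling `plot_boundary(-1.0, 2.0, 0.5, 0.0, 3.0)` after scattering the training points would overlay the line on the data; points on one side of it are classified as the positive class and points on the other side as the negative class.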

pip: error: invalid command ‘egg_info’

If you're trying to install packages using pip and getting errors that complain about ‘egg_info’, it is likely because distribute was merged into setuptools as of version 0.7, so you need to upgrade. Here is what I was getting …

Medical testing and statistics, why testing positive may mean you “probably” are not

One of the greatest things that has led to a healthier society is the creation of new medical tests to help clinicians detect and diagnose conditions. As with any type of test, there is error. And in a medical …
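
The claim in the title follows from Bayes' theorem, and a small sketch can make it concrete. The prevalence, sensitivity, and specificity below are made-up illustrative numbers, not figures from the post, and `positive_predictive_value` is a hypothetical helper name.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(condition | positive test), computed with Bayes' theorem."""
    true_pos = sensitivity * prevalence                    # sick and test positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # healthy but test positive
    return true_pos / (true_pos + false_pos)


# Hypothetical test: 1% of people have the condition, the test catches
# 99% of true cases, and it correctly clears 95% of healthy people.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.99, specificity=0.95)
print(f"P(condition | positive test) = {ppv:.3f}")
```

With these assumed numbers the result is about 0.167: because the condition is rare, false positives from the large healthy group outnumber true positives, so a person who tests positive still most likely does not have the condition.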
