I recently answered the following question on StackOverflow: "How do I plot the decision boundary of a regression using matplotlib?"
I am just going to link to the post here and show the picture below.

If you're trying to install packages using pip and getting errors that complain about ‘egg_info’, it is likely because distribute was merged into setuptools as of version 0.7, so you need to upgrade setuptools.
Here is what I was getting when trying to install gensim:
    nettles:Project bfeeny$ pip install gensim
    Downloading/unpacking gensim
    Downloading gensim-0.8.8.tar.gz (2.8MB): 2.8MB downloaded
    Running setup.py egg_info for package gensim
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'extras_require'
    warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'include_package_data'
    warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'zip_safe'
    warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'
    warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'test_suite'
    warnings.warn(msg)
    usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
    or: -c --help [cmd1 cmd2 ...]
    or: -c --help-commands
    or: -c cmd --help

    error: invalid command 'egg_info'
    Complete output from command python setup.py egg_info:
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'extras_require'
    warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'include_package_data'
    warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'zip_safe'
    warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'
    warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'test_suite'
    warnings.warn(msg)
    usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
    or: -c --help [cmd1 cmd2 ...]
    or: -c --help-commands
    or: -c cmd --help

    error: invalid command 'egg_info'
    ----------------------------------------
    Cleaning up...
    Command python setup.py egg_info failed with error code 1 in /private/var/folders/0j/2kbjg_ys7m35z57lw83rdh0w0000gn/T/pip_build_bfeeny/gensim
    Storing complete log in /Users/bfeeny/.pip/pip.log

Here is how you can fix it by upgrading or installing setuptools:
    nettles:Project bfeeny$ pip install --upgrade setuptools
    Downloading/unpacking setuptools from https://pypi.python.org/packages/source/s/setuptools/setuptools-1.4.tar.gz#md5=5710464bc5a61d75f5087f15ce63cfe0
    Downloading setuptools-1.4.tar.gz (793kB): 793kB downloaded
    Running setup.py egg_info for package setuptools

    Installing collected packages: setuptools
    Found existing installation: setuptools 0.6c11
    Uninstalling setuptools:
    Successfully uninstalled setuptools
    Running setup.py install for setuptools

    Installing easy_install script to /Users/bfeeny/anaconda/bin
    Installing easy_install-2.7 script to /Users/bfeeny/anaconda/bin
    Successfully installed setuptools
    Cleaning up...

Now we can cleanly install gensim:
    nettles:Project bfeeny$ pip install gensim
    Downloading/unpacking gensim
    Downloading gensim-0.8.8.tar.gz (2.8MB): 2.8MB downloaded
    Running setup.py egg_info for package gensim

    warning: no files found matching '*.sh' under directory '.'
    no previously-included directories found matching 'docs/src*'
    Requirement already satisfied (use --upgrade to upgrade): scipy>=0.7.0 in /Users/bfeeny/anaconda/lib/python2.7/site-packages (from gensim)
    Installing collected packages: gensim
    Running setup.py install for gensim

    warning: no files found matching '*.sh' under directory '.'
    no previously-included directories found matching 'docs/src*'
    Successfully installed gensim
    Cleaning up...

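If you want to check from Python whether an environment is likely to hit this problem, a small sketch along the following lines may help. It takes the 0.7 threshold mentioned above as its only rule, and the helper names (`parse_major_minor`, `needs_setuptools_upgrade`, `installed_setuptools_version`) are made up for this illustration. It also assumes Python 3.8 or newer for `importlib.metadata`, which is not what the Python 2.7 logs above were run with.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Return (major, minor) from strings such as '0.6c11' or '1.4'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2) or 0)


def needs_setuptools_upgrade(version, minimum=(0, 7)):
    """True when the version predates the assumed 0.7 threshold."""
    return parse_major_minor(version) < minimum


def installed_setuptools_version():
    """Return the installed setuptools version string, or None if absent."""
    try:
        return metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_setuptools_version()
    if found is None:
        print("setuptools is not installed")
    else:
        print(found, "needs upgrade:", needs_setuptools_upgrade(found))
```

With the versions from the logs above, "0.6c11" is flagged as needing an upgrade and "1.4" is not.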
One of the greatest contributors to a healthier society has been the creation of new medical tests that help clinicians detect and diagnose conditions. As with any type of test, there is error. In a medical test, particularly one that screens for a serious condition, it is very important that the test errs on the side of a false positive rather than a false negative; that is, it is designed to minimize type II errors. For this reason, it is very possible that you go to the doctor, take a test, and test positive when in fact you do not have the condition. Just how likely is it that you do not have it? This of course depends on the test, but here is an example.
In Africa, there are many areas where the prevalence of HIV is 0.5%. In fact, there are many areas in Africa and other parts of the world where it is much worse than that. But let’s say you are in this general population in Africa where the prevalence of HIV is 0.5%.
Let’s also assume we have a test that detects HIV 95% of the time when someone actually has HIV, and that gives a positive result 5% of the time when someone does not. Bayes' formula shows us just how likely it is that a person who tests positive actually has HIV:
p(hiv) = 1/200 = 0.005 – the probability of someone having HIV
p(pos | hiv) = 0.95 – the probability the system will give the positive result if someone has HIV
p(pos | ¬hiv) = 0.05 – the probability the system will give the positive result if someone does not have HIV
p(hiv | pos) = the probability that a person has HIV if the system gives the positive result

p(hiv | pos) = p(pos | hiv) p(hiv) / [ p(pos | hiv) p(hiv) + p(pos | ¬hiv) p(¬hiv) ]
             = (0.95 × 0.005) / (0.95 × 0.005 + 0.05 × 0.995)
             = 0.00475 / 0.0545 ≈ 0.0872

Of the people who test positive, the fraction we actually expect to have HIV is 0.0872 (about 9%). Please note, this is just a toy example of a test that shows positive 95% of the time if someone has the condition and 5% of the time if they do not.
This is why additional testing can be so important.
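The arithmetic can be checked with a few lines of Python. This is only a sketch of the toy example: the function name `posterior` is made up here, and the second call assumes that a repeat test is independent of the first given the person's true status, which is an assumption added for illustration and not something a real test guarantees.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' formula: probability of the condition given a positive result."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


p_hiv = 1 / 200  # prevalence of 0.5%

# One positive result.
after_one = posterior(p_hiv, 0.95, 0.05)

# A second positive result, using the first posterior as the new prior
# (assumes the two tests err independently; illustration only).
after_two = posterior(after_one, 0.95, 0.05)

print(round(after_one, 4))  # 0.0872
print(round(after_two, 4))
```

Under that independence assumption, a second positive result moves the probability from roughly 9% to well above one half, which is the sense in which additional testing matters.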
Boxplots in R can be a bit tricky (ugly, actually), but the example below should help. It shows samples from the binomial distribution for p = 0.3, p = 0.5 and p = 0.7, each with n = 60 trials, plotted as the number of successful trials k. Each boxplot is labeled with its 1st quartile, median and 3rd quartile along with the whisker ends; the mean is marked with an asterisk, and the standard deviation is given in the axis label. The probability p is shown on the horizontal axis.
    # Set up the window to show three graphs side by side
    par(mfrow = c(1, 3))

    # Create a boxplot using rbinom(), displaying values for 1st Quartile,
    # mean, median, standard deviation, 3rd Quartile. The mean is denoted
    # using pch=8 (*) on the plot. Standard deviation is noted in the xlab.
    # Each sample is 60 draws from a binomial with n = 60 trials.

    bp1 <- boxplot(rbinom(60, 60, 0.3), col = "red", main = ("Binomial Distribution n=60 p=.3"), xlab = c(paste("p=0.3, mean", 60 * 0.3, sep = "="), paste("sd", sprintf("%0.2f", sqrt(60 * 0.3 * 0.7)), sep = "=")), ylab = "Value")
    points(x = 1, y = 60 * 0.3, pch = 8)
    text(1, 0.5 + 60 * 0.3, labels = 60 * 0.3)
    text(1.3, bp1$stats, labels = bp1$stats)

    bp2 <- boxplot(rbinom(60, 60, 0.5), col = "blue", main = ("Binomial Distribution n=60 p=.5"), xlab = c(paste("p=0.5, mean", 60 * 0.5, sep = "="), paste("sd", sprintf("%0.2f", sqrt(60 * 0.5 * 0.5)), sep = "=")), ylab = "Value")
    points(x = 1, y = 60 * 0.5, pch = 8)
    text(1, 0.5 + 60 * 0.5, labels = 60 * 0.5)
    text(1.3, bp2$stats, labels = bp2$stats)

    bp3 <- boxplot(rbinom(60, 60, 0.7), col = "green", main = ("Binomial Distribution n=60 p=.7"), xlab = c(paste("p=0.7, mean", 60 * 0.7, sep = "="), paste("sd", sprintf("%0.2f", sqrt(60 * 0.7 * 0.3)), sep = "=")), ylab = "Value")
    points(x = 1, y = 60 * 0.7, pch = 8)
    text(1, 0.5 + 60 * 0.7, labels = 60 * 0.7)
    text(1.3, bp3$stats, labels = bp3$stats)

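The mean and standard deviation written into the axis labels come from the closed-form binomial formulas, mean = n·p and sd = sqrt(n·p·(1 − p)). As a quick cross-check outside R, here is a short Python sketch; the function name `binomial_mean_sd` is made up for this illustration.

```python
import math


def binomial_mean_sd(n, p):
    """Theoretical mean and standard deviation of a Binomial(n, p)."""
    return n * p, math.sqrt(n * p * (1 - p))


# Same three cases as the plots: n = 60 trials, p = 0.3, 0.5, 0.7.
summary = {p: binomial_mean_sd(60, p) for p in (0.3, 0.5, 0.7)}

for p, (mean, sd) in summary.items():
    print("p=%.1f mean=%.0f sd=%.2f" % (p, mean, sd))
```

Note that p = 0.3 and p = 0.7 share the same standard deviation, since p·(1 − p) is symmetric around 0.5.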

Below is a graph I made using NetworkX, with data from govtrack.us. The layout was done using GraphViz. A Force Atlas model in Gephi was used as well as edge weight filtering to produce the graph. The node size is determined based on each senator's Google PageRank score. The graph depicts congruency of voting patterns for all bills through November 5th, 2013 for the 113th Congress.
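To give a feel for the two steps mentioned above, edge weight filtering and PageRank-based node sizing, here is a minimal standard-library sketch. It is not the code used to make the graph and does not use NetworkX or Gephi: the senators "A" to "D", their agreement weights, the 0.5 cutoff and the function names are all hypothetical, and the PageRank is a plain power iteration with the commonly used damping factor of 0.85.

```python
def filter_edges(edges, min_weight):
    """Keep only edges whose weight is at least min_weight."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on an undirected graph.

    Nodes left without any edge after filtering are not part of the result.
    Weights are assumed to be positive.
    """
    neighbours = {}
    for (a, b), w in edges.items():
        neighbours.setdefault(a, {})[b] = w
        neighbours.setdefault(b, {})[a] = w
    nodes = sorted(neighbours)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            total = sum(neighbours[n].values())
            # Each node passes its rank to neighbours in proportion to weight.
            for m, w in neighbours[n].items():
                new_rank[m] += damping * rank[n] * w / total
        rank = new_rank
    return rank


# Hypothetical voting-agreement fractions between four made-up senators.
votes = {
    ("A", "B"): 0.90,
    ("A", "C"): 0.80,
    ("B", "C"): 0.85,
    ("C", "D"): 0.70,
    ("A", "D"): 0.30,
}

kept = filter_edges(votes, min_weight=0.5)  # drops the weak A-D edge
scores = pagerank(kept)

# Node sizes proportional to PageRank; the 3000 scale factor is arbitrary.
sizes = {node: 3000 * score for node, score in scores.items()}
print(sizes)
```

In this toy data, "C" ends up with the largest node because it keeps the most strongly weighted connections after filtering.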
