I recently answered the following question on Stack Overflow: "How do I plot the decision boundary of a regression using matplotlib?"

I am just going to link to the post here and include the picture below.

LR-boundary
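For readers who want the idea without following the link, here is a minimal sketch. It assumes a logistic regression with two features, and the coefficients b0, b1 and b2 are made-up values, not taken from the Stack Overflow answer. The decision boundary is the set of points where the predicted probability is 0.5, which for this model is a straight line.

```python
import math

def sigmoid(z):
    """Logistic function used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))

def boundary_x2(x1, b0, b1, b2):
    """Return x2 on the p = 0.5 boundary, where b0 + b1*x1 + b2*x2 = 0."""
    if b2 == 0:
        raise ValueError("b2 must be non-zero to solve for x2")
    return -(b0 + b1 * x1) / b2

# Hypothetical fitted coefficients (assumption, for illustration only).
b0, b1, b2 = -3.0, 1.0, 2.0

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [boundary_x2(x, b0, b1, b2) for x in xs]

# Every point on the line should have predicted probability 0.5.
for x, y in zip(xs, ys):
    print(x, y, sigmoid(b0 + b1 * x + b2 * y))
```

The xs and ys lists can then be handed to matplotlib's plt.plot(xs, ys) and drawn on top of a scatter plot of the data.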

If you're trying to install packages using pip and getting errors that complain about ‘egg_info’, it is likely because distribute was merged into setuptools as of version 0.7, so you need to upgrade setuptools.
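A quick way to tell whether an installed setuptools predates the merge is to compare its version string against 0.7. The sketch below is only an illustration: the helper names are hypothetical, and the parsing looks at just the leading major.minor digits so that old-style strings such as 0.6c11 still work.

```python
import re

def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '0.6c11' or '1.4'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2) or 0)

def predates_merge(version):
    """True if the version is older than 0.7, when distribute was merged in."""
    return parse_major_minor(version) < (0, 7)

print(predates_merge("0.6c11"))  # True: needs upgrading
print(predates_merge("1.4"))     # False: already past the merge
```

In an interpreter you could pass in setuptools.__version__ instead of a literal string.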

Here is what I was getting when trying to install gensim:

nettles:Project bfeeny$ pip install gensim
Downloading/unpacking gensim
  Downloading gensim-0.8.8.tar.gz (2.8MB): 2.8MB downloaded
  Running setup.py egg_info for package gensim
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'extras_require'
      warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'include_package_data'
      warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'zip_safe'
      warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'
      warnings.warn(msg)
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'test_suite'
      warnings.warn(msg)
    usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
       or: -c --help [cmd1 cmd2 ...]
       or: -c --help-commands
       or: -c cmd --help

    error: invalid command 'egg_info'
    Complete output from command python setup.py egg_info:
    /Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'extras_require'

  warnings.warn(msg)

/Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'include_package_data'

  warnings.warn(msg)

/Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'zip_safe'

  warnings.warn(msg)

/Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'

  warnings.warn(msg)

/Users/bfeeny/anaconda/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'test_suite'

  warnings.warn(msg)

usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]

   or: -c --help [cmd1 cmd2 ...]

   or: -c --help-commands

   or: -c cmd --help



error: invalid command 'egg_info'

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /private/var/folders/0j/2kbjg_ys7m35z57lw83rdh0w0000gn/T/pip_build_bfeeny/gensim
Storing complete log in /Users/bfeeny/.pip/pip.log

 

Here is how you can fix it by upgrading or installing setuptools:

nettles:Project bfeeny$ pip install --upgrade setuptools
Downloading/unpacking setuptools from https://pypi.python.org/packages/source/s/setuptools/setuptools-1.4.tar.gz#md5=5710464bc5a61d75f5087f15ce63cfe0
  Downloading setuptools-1.4.tar.gz (793kB): 793kB downloaded
  Running setup.py egg_info for package setuptools

Installing collected packages: setuptools
  Found existing installation: setuptools 0.6c11
    Uninstalling setuptools:
      Successfully uninstalled setuptools
  Running setup.py install for setuptools

    Installing easy_install script to /Users/bfeeny/anaconda/bin
    Installing easy_install-2.7 script to /Users/bfeeny/anaconda/bin
Successfully installed setuptools
Cleaning up...

Now we can cleanly install gensim:

nettles:Project bfeeny$ pip install gensim
Downloading/unpacking gensim
  Downloading gensim-0.8.8.tar.gz (2.8MB): 2.8MB downloaded
  Running setup.py egg_info for package gensim

    warning: no files found matching '*.sh' under directory '.'
    no previously-included directories found matching 'docs/src*'
Requirement already satisfied (use --upgrade to upgrade): scipy>=0.7.0 in /Users/bfeeny/anaconda/lib/python2.7/site-packages (from gensim)
Installing collected packages: gensim
  Running setup.py install for gensim

    warning: no files found matching '*.sh' under directory '.'
    no previously-included directories found matching 'docs/src*'
Successfully installed gensim
Cleaning up...

One of the greatest contributions to a healthier society has been the creation of new medical tests that help clinicians detect and diagnose conditions.  As with any type of test, there is error.  In a medical test, particularly one that screens for a serious condition, it is very important that the test errs on the side of a false positive rather than a false negative; that is, it should be designed to minimize type II errors.  For this reason, it is quite possible to go to the doctor, take a test, and test positive when you do not in fact have the condition.  Just how likely is that? It depends on the test, of course, but here is an example.

In Africa, there are many areas where the prevalence of HIV is 0.5%.  In fact, there are many areas in Africa and other parts of the world where it’s much worse than that.  But let’s say you are in this general population in Africa where the prevalence of HIV is 0.5%.

Let’s also assume we have a test that detects HIV 95% of the time when someone actually has HIV, and that gives a positive result 5% of the time when someone does not. Bayes’ formula shows us just how likely it is that a person who tests positive actually has HIV:

p(hiv) = 1/200 – the probability of someone having HIV

p(pos | hiv) = 0.95 – the probability the system will give the positive result if someone has HIV
p(pos | ¬hiv) = 0.05 – the probability the system will give the positive result if someone does not have HIV

p(hiv | pos) = the probability that a person has HIV if the system gives the positive result

 

bayes
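The figure is not reproduced in this text, so here is the same calculation written out with Bayes' formula. It uses only the values defined above, with p(¬hiv) = 1 − p(hiv) = 0.995.

```latex
\begin{aligned}
p(\mathrm{hiv} \mid \mathrm{pos})
  &= \frac{p(\mathrm{pos} \mid \mathrm{hiv})\, p(\mathrm{hiv})}
          {p(\mathrm{pos} \mid \mathrm{hiv})\, p(\mathrm{hiv})
           + p(\mathrm{pos} \mid \neg\mathrm{hiv})\, p(\neg\mathrm{hiv})} \\
  &= \frac{0.95 \times 0.005}{0.95 \times 0.005 + 0.05 \times 0.995}
   = \frac{0.00475}{0.0545}
   \approx 0.0872
\end{aligned}
```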

 

 

 

 

 

Of the people who test positive, the fraction we actually expect to have HIV is 0.0872 (about 9%).  Please note, this is just a toy example of a test that will show positive 95% of the time if someone has a condition and 5% of the time if they do not.

This is why additional testing can be so important.
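To make that concrete, here is a small sketch that applies Bayes' formula twice. The second step assumes a second test with the same 95% and 5% rates whose result is independent of the first given the person's true status. That independence is my assumption for illustration and will not hold for every real pair of tests.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """p(condition | positive test) by Bayes' formula."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

# Values from the toy example above.
prevalence = 0.005          # p(hiv) = 1/200
sensitivity = 0.95          # p(pos | hiv)
false_positive_rate = 0.05  # p(pos | not hiv)

after_one = posterior(prevalence, sensitivity, false_positive_rate)

# Assumption: an independent second test with the same error rates,
# using the first posterior as the new prior.
after_two = posterior(after_one, sensitivity, false_positive_rate)

print(round(after_one, 4))  # 0.0872
print(round(after_two, 4))
```

Under these assumptions a second positive result raises the probability from roughly 9% to roughly 64%.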

Boxplots in R can be a bit tricky (ugly, actually), but here is an example to help.  The plots below show samples drawn from the binomial distribution for p = 0.3, p = 0.5 and p = 0.7, each with n = 60 trials; the vertical axis is the number of successful trials k.  Each plot is labeled with the statistics returned by boxplot() (the whisker ends, 1st quartile, median and 3rd quartile), the theoretical mean is marked with an asterisk, and the probability, mean and standard deviation are given in the horizontal-axis label.

 

# Set up the window to show three graphs side by side
par(mfrow = c(1, 3))

# Create a boxplot using rbinom(), displaying values for 1st Quartile,
# mean, median, standard deviation, 3rd Quartile.  The mean is denoted
# using pch=8 (*) on the plot.  Standard deviation is noted in the xlab.
# Each rbinom() call draws 60 samples from a binomial with n = 60 trials.

bp1 <- boxplot(rbinom(60, 60, 0.3), col = "red",
               main = "Binomial Distribution of n=60 p=.3",
               xlab = c(paste("p=0.3, mean", 60 * 0.3, sep = "="),
                        paste("sd", sprintf("%0.2f", sqrt(60 * 0.3 * 0.7)), sep = "=")),
               ylab = "Value")
points(x = 1, y = 60 * 0.3, pch = 8)
text(1, 0.5 + 60 * 0.3, labels = 60 * 0.3)
text(1.3, bp1$stats, labels = bp1$stats)

bp2 <- boxplot(rbinom(60, 60, 0.5), col = "blue",
               main = "Binomial Distribution of n=60 p=.5",
               xlab = c(paste("p=0.5, mean", 60 * 0.5, sep = "="),
                        paste("sd", sprintf("%0.2f", sqrt(60 * 0.5 * 0.5)), sep = "=")),
               ylab = "Value")
points(x = 1, y = 60 * 0.5, pch = 8)
text(1, 0.5 + 60 * 0.5, labels = 60 * 0.5)
text(1.3, bp2$stats, labels = bp2$stats)

bp3 <- boxplot(rbinom(60, 60, 0.7), col = "green",
               main = "Binomial Distribution of n=60 p=.7",
               xlab = c(paste("p=0.7, mean", 60 * 0.7, sep = "="),
                        paste("sd", sprintf("%0.2f", sqrt(60 * 0.7 * 0.3)), sep = "=")),
               ylab = "Value")
points(x = 1, y = 60 * 0.7, pch = 8)
text(1, 0.5 + 60 * 0.7, labels = 60 * 0.7)
text(1.3, bp3$stats, labels = bp3$stats)

 

boxplots
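The mean and standard deviation printed in each axis label come from the standard binomial formulas, mean = n·p and sd = sqrt(n·p·(1 − p)). As a cross-check outside R, here is a short Python sketch of the same arithmetic. The function name is my own, not part of the R code.

```python
import math

def binomial_mean_sd(n, p):
    """Theoretical mean and standard deviation of a Binomial(n, p)."""
    return n * p, math.sqrt(n * p * (1.0 - p))

n = 60
results = {}
for p in (0.3, 0.5, 0.7):
    mean, sd = binomial_mean_sd(n, p)
    results[p] = (mean, sd)
    print("p=%.1f mean=%.0f sd=%.2f" % (p, mean, sd))
```

Note that p = 0.3 and p = 0.7 give the same standard deviation, since p(1 − p) is symmetric around 0.5.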

Below is a graph I made using NetworkX, with data from govtrack.us.  The layout was done using GraphViz.  A Force Atlas model in Gephi was used as well, along with edge weight filtering, to produce the graph.  The node size is based on each senator’s PageRank score.  The graph depicts congruency of voting patterns for all bills through November 5th, 2013 for the 113th Congress.

p10
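To give a feel for the two processing steps mentioned above (edge weight filtering and PageRank-based node sizes), here is a minimal pure-Python sketch. It is not the code used for the graph. The senators, weights, threshold and size scale are all made up, and the PageRank is a plain power iteration rather than the NetworkX or Gephi implementation.

```python
def filter_edges(edges, min_weight):
    """Keep only edges whose weight is at least min_weight."""
    return [(u, v, w) for u, v, w in edges if w >= min_weight]

def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank on an undirected graph via power iteration."""
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    neighbors = {n: [] for n in nodes}
    strength = {n: 0.0 for n in nodes}  # total weight attached to each node
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))
        strength[u] += w
        strength[v] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for u in nodes:
            for v, w in neighbors[u]:
                # u passes its rank to neighbors in proportion to edge weight
                new_rank[v] += damping * rank[u] * w / strength[u]
        rank = new_rank
    return rank

# Hypothetical voting-agreement weights between four made-up senators.
edges = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
         ("C", "D", 0.6), ("A", "D", 0.2)]

kept = filter_edges(edges, min_weight=0.5)  # drops the weak A-D edge
ranks = pagerank(kept)
node_sizes = {n: 3000 * r for n, r in ranks.items()}  # arbitrary scale
print(ranks)
```

With real data, NetworkX's own pagerank function would be the natural choice; the loop here just shows what the score measures.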