Learning R Online with Professor Andrew Conway of Princeton
Online courses are everywhere now: iTunes U, Coursera, Khan Academy, MITx, and many more. Most of these classes are taught by full university professors and come complete with homework, quizzes, midterms, finals, and books that you either purchase or, in some cases, can get free online. I was recently drawn to one such course, Statistics One, taught by Professor Andrew Conway of Princeton and offered through Coursera. This was its inaugural semester.
Statistics is the new language of our time. If you don't understand it, you can hardly make sense of the mix of propaganda, "facts", and information overload being twisted by the media and others. More importantly, if you don't understand it, you can't make use of all the data being gathered and made available.
The approach of Statistics One was very non-traditional as far as statistics courses go. Professor Conway is a psychology professor, and his approach to teaching statistics is very much an applied one. He does take the time to explain the key ideas, such as the Central Limit Theorem, but in a way that is much more practical than mathematical. This leaves more time for the "why" behind the many methods presented, as opposed to the "how". In today's world there is very little reason to work out a full regression analysis by hand. Microsoft Excel, Minitab, SAS, R, and other tools will do most of this for you.
In Statistics One the weapon of choice was R, a full-featured, free, open-source statistical package, and the class made heavy use of it. Many people who took the class without a programming background found themselves in unfamiliar territory. Fortunately for me, in addition to having a substantial programming background, I had used R before for elementary data analysis. I chose to use RStudio, a more feature-rich IDE for the R programming language.
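As a small illustration of the practical-rather-than-mathematical approach described above, the Central Limit Theorem can be checked by simulation in a few lines of base R. This is only a sketch and is not taken from the course materials: the exponential distribution, the sample size of 40, the 5000 repetitions, and the seed are all arbitrary choices made for illustration.

```r
# Central Limit Theorem by simulation: means of skewed samples look roughly normal.
set.seed(42)                      # arbitrary seed, for reproducibility only

n_samples   <- 5000               # number of repeated samples (illustrative choice)
sample_size <- 40                 # observations per sample (illustrative choice)

# Draw from an exponential distribution (rate = 1), which is strongly right-skewed.
# Its true mean is 1 and its true standard deviation is 1.
sample_means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# The sample means should centre near 1, with spread near 1 / sqrt(sample_size).
mean(sample_means)
sd(sample_means)
1 / sqrt(sample_size)

# Compare the raw skewed data with the distribution of the sample means.
par(mfrow = c(1, 2))
hist(rexp(n_samples, rate = 1), main = "Raw exponential data", xlab = "Value")
hist(sample_means, main = "Means of 40 observations", xlab = "Sample mean")
```

The left histogram should be clearly lopsided, while the right one should look close to a bell curve, even though every sample came from the same skewed distribution.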
Here is an example of an R program:
# Statistics One, Lecture 6, example script
# Read data, plot histograms, get descriptives, examine scatterplots, run correlations
library(psych)
# Read the data into a dataframe called impact
impact <- read.table("Desktop/STATS1.EX.02.TXT", header = TRUE)
# What type of object is impact?
class(impact)
# List the names of the variables in the dataframe called impact
names(impact)
# Change default settings for graphics
par(cex = 2, lwd = 2, col.axis = 200, col.lab = 200, col.main = 200, col.sub = 200, fg = 200)
# Plot histograms
hist(impact$memory.verbal, xlab = "Verbal memory", main = "Histogram", col = "red")
hist(impact$memory.visual, xlab = "Visual memory", main = "Histogram", col = "red")
hist(impact$speed.vismotor, xlab = "Visual-motor speed", main = "Histogram", col = "red")
hist(impact$speed.general, xlab = "General speed", main = "Histogram", col = "red")
hist(impact$memory.control, xlab = "Impulse control", main = "Histogram", col = "red")
# Descriptive statistics for the variables in the dataframe called impact
describe(impact)
# Scatterplots (one pair at a time)
plot(impact$memory.verbal ~ impact$memory.visual, main = "Scatterplot", ylab = "Verbal memory", xlab = "Visual memory")
abline(lm(impact$memory.verbal ~ impact$memory.visual), col = "blue")
cor(impact$memory.verbal, impact$memory.visual)
# Correlations (one pair at a time)
cor.test(impact$memory.verbal, impact$memory.visual)
# Correlations (all in a matrix)
cor(impact)
# Scatterplot matrix
library(gclus)
impact.r <- abs(cor(impact))
impact.col <- dmat.color(impact.r)
impact.o <- order.single(impact.r)
cpairs(impact, impact.o, panel.colors = impact.col, gap = .5,
       main = "Variables Ordered and Colored by Correlation")
This program gives you insight into the kinds of programs you will learn to write in Statistics One.
I took this course myself because I have a passion for Data Analysis. Next semester I will be in the Big Data Analytics course at Harvard, which makes use of R, amongst other tools such as Mahout, Hadoop, Pig and Hive.
The amount of activity in an online course such as Statistics One is mind-boggling. We had interactive chat rooms and forums with many participants, and the experience of getting answers to questions and discussing the material with others was among the best I have seen in online classes.
Some of the concepts taught in Statistics One included:
Correlation and Measurement
Mediation and Moderation
Regression and Hypothesis Testing
Student's t-test and Analysis of Variance (ANOVA)
Factorial ANOVA and Model Comparisons
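Several of the concepts listed above map onto single function calls in base R. The following is a minimal sketch on simulated data, not an excerpt from the course: the data frame `dat`, its columns `group`, `x`, and `y`, and the effect sizes are all made up for illustration.

```r
# Simulated data only: variable names and effect sizes are invented for illustration.
set.seed(1)
n <- 90
group <- factor(rep(c("a", "b", "c"), each = n / 3))
x <- rnorm(n, mean = 50, sd = 10)
# The outcome depends linearly on x, plus a shift for group "c", plus noise.
y <- 5 + 0.6 * x + ifelse(group == "c", 8, 0) + rnorm(n, sd = 5)
dat <- data.frame(group, x, y)

# Correlation, with a significance test
cor.test(dat$x, dat$y)

# Simple linear regression of y on x
fit <- lm(y ~ x, data = dat)
summary(fit)

# Independent-samples t-test comparing groups "a" and "c"
two_groups <- droplevels(subset(dat, group %in% c("a", "c")))
t.test(y ~ group, data = two_groups)

# One-way ANOVA across all three groups
summary(aov(y ~ group, data = dat))

# Model comparison: does adding group improve on x alone?
fit_full <- lm(y ~ x + group, data = dat)
anova(fit, fit_full)
```

Each call prints its own summary table, so the effort goes into reading the output and deciding which test fits the question rather than into the arithmetic.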
I would highly recommend this class to just about anyone wanting to learn about statistics. I think it is also a good idea to take it if you plan to take statistics in college. Why not take this class first, for free? In fact, I may even take it a second time, although probably not any time soon. It is very fast paced, clocking in at about 7 weeks. I would have preferred a more traditional 15-week format, but they are still working out the kinks in this class, and it's possible they will adjust some of the material or even drop some of it to focus on what matters most. Feedback is solicited by the professor and his staff, and it's great to take part in shaping a class such as this.
Coursera is literally a treasure trove for the data scientist. Other classes available for free, once again from prominent university professors, include Computing for Data Analysis, Social Network Analysis, Neural Networks for Machine Learning, Data Analysis, Introduction to Data Science, and many more.