February 10, 2017
Even though Donald Trump won the election, has been sworn in and has been causing complete chaos for weeks now, his tweets keep getting more negative. Perhaps being president isn't that much fun.
r statistics text tidytext

January 23, 2017
The `tidyverse` is so dedicated to tidy data that even the error output is a tidy data frame.
r statistics quick

November 17, 2016
Using the interocular trauma test (if it is there, it will hit you between the eyes) to quickly assess claims with a large signal-to-noise ratio.
r statistics quick

April 23, 2015
One of the major problems in observational research is estimating the true treatment effect. This is not hard when the selection and outcome processes are uncorrelated and all relevant variables are observed and properly controlled for. However, when selection and outcome are correlated and it is not possible to remove this correlation on the basis of the observables, the estimates are biased. The Heckman selection model affords one way of dealing with and minimizing this bias. A parallel R-based simulation of a Heckman-style estimator compared to least squares and propensity scores highlights the potential utility of this framework.
statistics r methods heckman

April 15, 2015
Propensity scores are increasingly in vogue as a way to adjust for differences between populations in estimating treatment effects. Some view propensity scores as an almost mythical way of dealing with confounding. However, they are limited to adjustment for the observables, just like standard regression. This raises the question: "How do propensity scores compare as an estimator relative to linear regression?" The answer is short: "not well."
statistics r regression propensityscores

March 21, 2014
When you go to war, it can be useful to know how many tanks the other side has. However, they often refuse to tell you. Even worse, they will often vastly inflate production numbers. They are at war, after all. If only there were a way to convert that pesky sequential serial number to an estimate of the total number of tanks...
statistics ggplot r frequentist
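
As a rough illustration of the tank problem above, here is a minimal Python sketch of the standard textbook frequentist estimator, not necessarily the approach taken in the post itself. It assumes tanks are numbered 1 through N and that the captured serial numbers are a uniform random sample without replacement; the function name `estimate_total` is hypothetical.

```python
def estimate_total(serials):
    """Estimate the total number of tanks from observed serial numbers.

    Assumes serials run 1..N and were sampled uniformly without replacement.
    Uses the classic estimator: m + m/k - 1, where m is the largest observed
    serial number and k is the number of serials observed.
    """
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


# Four captured tanks with these serial numbers
estimate = estimate_total([19, 40, 42, 60])
print(estimate)
```

The intuition is that the largest serial seen is always a lower bound, so the estimator pads it by roughly the average gap between observed serials.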

March 20, 2014
You need to come up with a regression model for some response. You have tons of predictor variables that you might want to consider. How do you decide which variables to include in your model? If you started with bivariate correlations of the response and each predictor, you may be in for some trouble.
statistics r modelSelection regression

March 14, 2014
Whether you have lost a sub, a hydrogen bomb, or just your keys, Bayesian search theory can help find it!
statistics ggplot r bayesian
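
The core of Bayesian search theory is a single update: after searching a cell and finding nothing, shift probability away from that cell and toward the others. The sketch below is a hedged Python illustration of that update under simple assumptions (a discrete grid of cells, a known per-cell detection probability). The function and argument names are hypothetical and are not taken from the post.

```python
def update_after_failed_search(prior, detect, searched):
    """Posterior location probabilities after an unsuccessful search.

    prior    -- list of prior probabilities that the object is in each cell
    detect   -- list of probabilities of finding the object in each cell,
                given that it is there and that cell is searched
    searched -- index of the cell that was searched without success
    """
    # Probability that the search fails at all
    miss = 1.0 - prior[searched] * detect[searched]
    # Unsearched cells gain probability after renormalizing
    posterior = [p / miss for p in prior]
    # The searched cell keeps only the chance that the object was overlooked
    posterior[searched] = prior[searched] * (1.0 - detect[searched]) / miss
    return posterior


posterior = update_after_failed_search([0.5, 0.3, 0.2], [0.8, 0.8, 0.8], 0)
print(posterior)
```

Repeating this update while always searching the cell with the best chance of a find gives a simple greedy search plan.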

January 10, 2014
Instrumental variables provide a powerful method for getting around unobserved heterogeneity and are increasingly popular in observational research. By exploiting a third variable, known as an instrumental variable, this method breaks the correlation between an independent variable and the omitted or unobserved variables. However, the definitions are mind-boggling and the process is often unclear, even when advertised as "Mostly Harmless." In cases like this, a simulation is often handy, especially one written in R.
statistics ggplot r shiny
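
To make the instrumental variables idea above concrete, here is a small simulation sketch. It is written in Python with only the standard library rather than the R used in the post, and the data-generating process is an assumption made for illustration: an unobserved confounder `u` affects both `x` and `y`, while the instrument `z` affects `y` only through `x`. All names and parameter values are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, beta=2.0, seed=1):
    """Return (OLS slope, IV slope) for a confounded linear model."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    # x depends on the instrument and on the confounder
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    # y depends on x (true effect beta) and on the confounder
    y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    ols = cov(x, y) / cov(x, x)  # biased: x is correlated with u
    iv = cov(z, y) / cov(z, x)   # simple IV (Wald) ratio estimator
    return ols, iv


ols_estimate, iv_estimate = simulate()
print(ols_estimate, iv_estimate)
```

Under these assumptions the least squares slope should land above the true effect of 2, because `x` carries part of `u`, while the instrument-based ratio should sit close to 2.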

November 20, 2013
If I told you I saw bigfoot, would you believe me? Could I present any evidence that would change your mind? Probably (hopefully) not. The likelihood of bigfoot being real is so small that the only people who report seeing one are also the same people who "believe" Ancient Aliens belongs on the History Channel. But if you think that way and penalize an extraordinary claim, why do we use statistical inference that doesn't do the same thing? Could this be the cause of the rash of high-profile failures to reproduce studies and the "decline effect"?
statistics ggplot r shiny

August 29, 2013
Everyone loves TV and hates it when their favorite show gets canceled. That is why they refuse to watch shows on Fox or watch anything that is airing Friday evening. But do these commonly held beliefs hold true? Does Fox hate TV and is Friday night a graveyard for scripted TV?
statistics regression ggplot r tv fun

February 21, 2013
Over the summer I had some problems with my Internet connection, specifically very high latency. It was causing problems whenever I tried to do something particularly sensitive to latency. I am already bad enough at online games; having massive lag was not helping. But, like nearly all connection issues, it wasn't happening all the time or even to all the packets at the times when it was acting up. I called my ISP but kept "passing" the ping tests on their end, and they said nothing was wrong and that my 100-300 ms pings to multiple sites were something I was dreaming up. To get the problem fixed, I wrote a little Python script to grab the pings and a second R script to check out what was going on.
internet Python R applied statistics

February 1, 2013
Shortly after making yesterday's post, I saw a visualization of apartment rental prices in Boston. As is commonly known, the three most important things in real estate are location, location, location. But location can go either way for prices: there is living in a good area, and there is living on Yucca Mountain. How can we figure out which locations are good or bad without knowing anything else about Boston? We use the same method discussed in yesterday's simulation but on good old-fashioned real data.
dataMining EM R statistics regression applied

January 31, 2013
Observational studies have the same problem as poker: you have to play the cards you are dealt. This can be a problem when you expect people to respond differently to some variable according to some number of unobserved variables. While expectation-maximization probably won't help you in your weekly Texas Hold'em game, it can be an ace up your sleeve in data analysis.
dataMining EM R statistics regression simulation
