Posts Tagged “r”

Poor Donald - his tweets keep getting more negative

February 10, 2017
Even though Donald Trump won the election, has been sworn in and has been causing complete chaos for weeks now, his tweets keep getting more negative. Perhaps being president isn't that much fun.
r statistics text tidytext

readr::problems() returns tidy data!

January 23, 2017
The `tidyverse` is so dedicated to tidy data that even the error output is a tidy data frame.
r statistics quick

Inter-ocular trauma test

November 17, 2016
Using the inter-ocular trauma test (if it is there, it will hit you between the eyes) to quickly assess claims with a large signal-to-noise ratio.
r statistics quick

Using tidytext to make sentiment analysis easy

November 15, 2016
I recently discovered the R package `tidytext` and fell in love with it. It combines the "tidy" ecosystem, which I'm very familiar with and comfortable in, with natural language processing (something that has been more challenging for me, not least because it rarely is tidy). I loved playing with the package and modeling how my language and sentiments varied in my thesis. The heavy use of Jane Austen in the `tidytext` examples certainly didn't hurt either.
r tidytext tidyverse text sentiment

Easy Cross Validation in R with `modelr`

November 11, 2016
Cross-validation is a useful approach for estimating out-of-sample error. The `modelr` package has made estimating models with cross-validation much easier, and the `resample` object type is vital for cross-validation with large data in R.
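As a rough illustration of the idea (written in Python with only the standard library, not the post's `modelr` code), here is a minimal k-fold cross-validation sketch. The data, the straight-line model, and the helper names `k_fold_indices`, `fit_line`, and `cv_rmse` are all assumptions made up for this example. Each fold is stored as a list of row indices rather than a copy of the data, which is, as I understand it, the same reason `resample` objects help with large data.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k folds (indices only, no data copies)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def cv_rmse(xs, ys, k=5, seed=0):
    """Average held-out RMSE across k folds: an estimate of out-of-sample error."""
    fold_rmse = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out]
        fold_rmse.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(fold_rmse) / k

# Hypothetical data: y = 1 + 2x plus Gaussian noise with standard deviation 0.5.
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [1 + 2 * x + rng.gauss(0, 0.5) for x in xs]

folds = k_fold_indices(len(xs), 5)
error_estimate = cv_rmse(xs, ys, k=5)
```

With this setup the cross-validated RMSE should land near the noise level used to generate the data, since the fitted model matches the true one.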
r modelr tidyverse cross_validation ml

Parallel Simulation of Heckman Selection Model

April 23, 2015
One of the major problems in observational research is estimating the true treatment effect. This is not hard when the selection and outcome processes are uncorrelated and all relevant variables are observed and properly controlled for. However, when the selection and outcome processes are correlated and the observables cannot remove this correlation, the estimates are biased. The Heckman selection model affords one way of dealing with and minimizing this bias. A parallel R-based simulation of a Heckman-style estimator compared to least squares and propensity scores highlights the potential utility of this framework.
statistics r methods heckman

The Problem with Propensity Scores

April 15, 2015
Propensity scores are increasingly in vogue as a way to adjust for differences between populations in estimating treatment effects. Some view propensity scores as an almost mythical way of dealing with confounding. However, they are limited to adjustment for the observables, just like standard regression. This raises the question: "How do propensity scores compare as an estimator relative to linear regression?" The short answer: "not well."
statistics r regression propensity-scores

Frequentist German Tank Problem

March 21, 2014
When you go to war, it can be useful to know how many tanks the other side has. However, they often refuse to tell you. Even worse, they will often vastly inflate production numbers. They are at war, after all. If only there were a way to convert that pesky sequential serial number to an estimate of the total number of tanks...
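For a flavor of how serial numbers turn into an estimate, here is a minimal Python sketch of the standard frequentist estimator, m + m/k - 1, where m is the largest observed serial number and k is the number of tanks observed. It assumes serial numbers run 1, 2, ..., N and that the observed tanks are a random sample without replacement. The function name and the sample serial numbers are invented for illustration and are not taken from the post.

```python
def estimate_total_tanks(serials):
    """Frequentist point estimate of the total count N from observed serial numbers.

    Assumes serials are numbered 1..N and sampled uniformly without replacement.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    # The largest serial seen, plus the average gap between observed serials.
    return m + m / k - 1

# Hypothetical captured serial numbers.
observed = [19, 40, 42, 60]
estimate = estimate_total_tanks(observed)  # 60 + 60/4 - 1
```

The intuition is that the true maximum probably sits about one average gap above the largest serial number actually seen.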
statistics ggplot r frequentist

Stop using bivariate correlations for variable selection

March 20, 2014
You need to come up with a regression model for some response. You have tons of predictor variables that you might want to consider. How do you decide which variables to include in your model? If you started with bivariate correlations of the response and each predictor, you may be in for some trouble.
statistics r modelSelection regression

Bayesian Search Models

March 14, 2014
Whether you have lost a sub, hydrogen bomb or just your keys, Bayesian search theory can help find it!
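The core step of Bayesian search theory is the update after looking somewhere and finding nothing. A minimal Python sketch follows, assuming a grid of cells with a prior probability that the object is in each cell and a probability of detecting it if you search the right cell. The function name, the three-cell grid, and all numbers are hypothetical.

```python
def update_after_failed_search(prior, detect_prob, searched):
    """Posterior cell probabilities after searching one cell and finding nothing (Bayes' rule)."""
    p = prior[searched]        # prior probability the object is in the searched cell
    q = detect_prob[searched]  # chance of finding it there, given that it is there
    miss = 1 - p * q           # overall probability that the search comes up empty
    posterior = [pj / miss for pj in prior]   # unsearched cells become more likely
    posterior[searched] = p * (1 - q) / miss  # the searched cell becomes less likely
    return posterior

# Hypothetical three-cell search area.
prior = [0.5, 0.3, 0.2]
detect = [0.8, 0.5, 0.9]
posterior = update_after_failed_search(prior, detect, searched=0)
```

In this example the searched cell falls from 0.5 to about 0.17, while the other two cells rise to 0.5 and about 0.33, so the next search would shift to the second cell.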
statistics ggplot r bayesian

Instrumental Variables Simulation

January 10, 2014
Instrumental variables provide a powerful method for getting around unobserved heterogeneity and are increasingly popular in observational research. By exploiting a third variable, known as an instrumental variable, this method breaks the correlation between an independent variable and the omitted or unobserved variables. However, the definitions are mind-boggling and the process is often unclear, even when advertised as "Mostly Harmless." In cases like this, a simulation is often handy, especially one written in R.
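To make the idea concrete, here is a small simulation sketch in standard-library Python (not the post's R or Shiny code). It assumes a single instrument `z`, an unobserved confounder `u` that affects both `x` and `y`, and a true effect of `x` on `y` of 2. All coefficients, variable names, and the seed are made up for illustration. The IV estimate is the simple ratio cov(z, y) / cov(z, x).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, beta=2.0, seed=42):
    """Simulate confounded data and return (OLS slope, IV estimate) for the effect of x on y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: moves x, unrelated to u
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + 3 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    ols = cov(x, y) / cov(x, x)  # biased: x is correlated with the omitted u
    iv = cov(z, y) / cov(z, x)   # uses only the variation in x driven by z
    return ols, iv

ols_estimate, iv_estimate = simulate()
```

Under these assumptions the least-squares slope should drift toward 3 because it absorbs part of the confounder's effect, while the IV ratio should stay close to the true value of 2.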
statistics ggplot r shiny

Penalizing P Values

November 20, 2013
If I told you I saw Bigfoot, would you believe me? Could I present any evidence that would change your mind? Probably (hopefully) not. The likelihood of Bigfoot being real is so small that the only people who report seeing one are also the same people who "believe" Ancient Aliens belongs on the History Channel. But if you think that way and penalize an extraordinary claim, why do we use statistical inference that doesn't do the same thing? Could this be the cause of the rash of high-profile failures to reproduce studies and the "decline effect"?
statistics ggplot r shiny

TV Ratings Myths

August 29, 2013
Everyone loves TV and hates it when their favorite show gets canceled. That is why they refuse to watch shows on Fox or watch anything that is airing Friday evening. But do these commonly held beliefs hold true? Does Fox hate TV and is Friday night a graveyard for scripted TV?
statistics regression ggplot r tv fun