One of my favorite data sources is the NYC OpenData site. A while back I noticed a really interesting data set there, the 2015 Street Tree Census, and I’ve been mulling over what exactly to do with it ever since. I knew there was some interesting question that this data could answer; I just had to think of it (or ask my girlfriend to think of it for me).

In this blog post I hope to introduce you to the powerful and simple Metropolis-Hastings algorithm. This is a common algorithm for generating samples from a complicated distribution using Markov chain Monte Carlo, or MCMC.
By way of motivation, remember that Bayes' theorem says that given a prior \(\pi(\theta)\) and a likelihood that depends on the data, \(f(\theta | x)\), we can calculate
$$
\pi(\theta | x) = \frac{f(\theta | x) \pi(\theta)}{\int f(\theta | x) \pi(\theta) \; \mathrm{d}\theta}.
$$
We call \(\pi(\theta | x)\) the posterior distribution. It's this distribution that we use to make estimates and inferences about the parameter \(\theta\). Often, however, this distribution is intractable: it can't be calculated directly, usually because the integral in the denominator has no closed-form solution. In cases like this, the usual remedy is to approximate the posterior using a Monte Carlo method.
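To make that concrete, here is a minimal sketch of a random-walk sampler in Python. The model is my own assumption for illustration, not something derived above: the data are treated as Normal with unknown mean \(\theta\) and unit variance, and the prior on \(\theta\) is Normal with mean 0 and standard deviation 10. The function names, step size, and toy data are all hypothetical. Because the proposal is symmetric, this is the plain Metropolis special case of Metropolis-Hastings. The point to notice is that the acceptance step only ever uses \(f(\theta | x)\pi(\theta)\), so the integral in the denominator never has to be computed.

```python
import math
import random


def log_unnormalized_posterior(theta, data):
    """Log of f(theta | x) * pi(theta), up to an additive constant.

    Assumed model (illustrative only): x_i ~ Normal(theta, 1),
    with prior theta ~ Normal(0, 10^2).
    """
    log_prior = -0.5 * (theta / 10.0) ** 2
    log_likelihood = -0.5 * sum((x - theta) ** 2 for x in data)
    return log_prior + log_likelihood


def metropolis_hastings(data, n_samples=5000, step=0.5, start=0.0, seed=0):
    """Draw n_samples from the posterior with a Gaussian random-walk proposal."""
    rng = random.Random(seed)
    theta = start
    current = log_unnormalized_posterior(theta, data)
    samples = []
    for _ in range(n_samples):
        # Symmetric proposal: a small Gaussian step from the current value.
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_unnormalized_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio). The normalizing
        # integral appears in both numerator and denominator, so it cancels.
        # 1 - random() lies in (0, 1], which keeps log() well defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


# Toy data, made up for this example.
toy_data = [1.8, 2.1, 2.4, 1.9, 2.3]
draws = metropolis_hastings(toy_data, n_samples=20000)
kept = draws[2000:]  # discard early draws as burn-in
print("posterior mean estimate:", sum(kept) / len(kept))
```

For this particular model the posterior is also available in closed form, which makes it a handy sanity check: the sampled mean should land close to the sample average of the data, pulled very slightly toward the prior mean of 0.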

(This is a companion to my post on Paul Gronke’s earlyvoting.net)
One of the first assignments we had in my Election Sciences course was to take a look at registration data from the Oregon Motor Voter program and try to find interesting patterns. For those who don’t know, Oregon Motor Voter is an automatic voter registration program in Oregon. Whenever someone interacts with the Oregon DMV, their voter eligibility is automatically checked, and if they are eligible to vote but not registered, they are automatically added to the rolls.

Looking at the spread of a measles epidemic using social networks.