Thursday, November 8, 2012

Type I and Type II errors

A recent news item (see here for the BBC report) covered research into the UK breast cancer screening programme.  The problem with the screening is that it is not perfect: it can miss some genuine tumours but it can also 'over-diagnose', i.e. signal a tumour which is actually harmless.

These should be familiar as Type I and II errors.  The null hypothesis is the absence of a tumour, so a Type I error is that of incorrectly diagnosing a tumour, when there isn't one.  A Type II error is missing a genuine tumour.  (One could look at this the other way round, with the null being the presence of a tumour, etc.)

According to the report, for every life saved, three women had unnecessary treatment for cancer.  This seems quite a high ratio but partly reflects the fact that the incidence of cancer is actually quite low.  The probability of a Type I error is given as 1% in the article.  This would be consistent with something like the following: for every thousand women tested, 10 are incorrectly diagnosed and treated, while three are correctly diagnosed and treated (hence approximately three times as many false positives as genuine ones).
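The arithmetic here is worth making explicit.  A rough sketch in Python, using the illustrative figures above (the 1% rate is from the article; the assumption of roughly 3 genuine cancers per thousand screened is mine, chosen to match the reported 3:1 ratio):

```python
# Illustrative numbers only: 1% Type I error rate (from the article)
# and an assumed 3 genuine cancers per 1,000 women screened.
n_screened = 1000
type_i_rate = 0.01        # P(positive | no tumour)
true_positives = 3        # correctly diagnosed per 1,000 (assumed)

false_positives = type_i_rate * n_screened   # 10 women over-diagnosed
ratio = false_positives / true_positives     # roughly 3 false per genuine
print(false_positives, round(ratio, 1))      # 10.0 3.3
```

So even a small Type I error rate produces more false positives than true ones when the underlying condition is rare.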

As well as the probabilities, the costs of the errors should also be taken into account.  The cost of missing a diagnosis is apparent to us, which is why there is a national system of screening.  The costs of over-diagnosis are less obvious but can be substantial.  The treatment is unpleasant, to say the least.  The costs of over-diagnosis might also be masked because it is concluded that the treatment has worked, rather than that there never was a cancer.

Election polls and odds

The recent US election provides some interesting opportunities to look at the opinion polls.  One of the most accurate turned out to be that of Nate Silver of the New York Times, who gathered up all the opinion poll data and turned it into a prediction of victory.

Many journalists and opinion-formers in the US were saying the election was 'too close to call' even on the eve of the election itself.  But this seems to confuse two quite different possibilities:

1. Strong evidence of a narrow win for Obama
2. No evidence of a strong win for either side

Many commentators went with 2 above, but 1 is the correct interpretation.  Let's see how this works.

Silver gives evidence for Colorado, one of the 'tipping point states' that could be decisive in the election.  Based on the various polls, Silver projected vote shares of 50.8% for Obama, 48.3% for Romney.  On this basis it looks fairly close, and this is probably how the commentators viewed it (especially as there is a margin of error of +/-3 percentage points on the polls).  However, Silver also gives the projected probabilities of winning, which are 80% Obama, 20% Romney.  This looks much more decisive.  How do we get from the poll figures to the odds of 80:20?

If we take the margin of error as representing two standard errors, as is usual, then the standard error is 1.5 percentage points.  Disregarding the 0.9% of voters not supporting either candidate, we have p = 50.8/99.1 = 51.26% for Obama and hence 48.74% for Romney.

We then ask: how likely is it that Obama's true share of the vote is less than 50%?  This is a question about a sample proportion, so, working in percentage points, we calculate the z-score as:

z = (51.26 - 50)/1.5 = 0.84

This cuts off 20.05% in the tail of the distribution.  This tells us that there is a 20% chance of getting such evidence (sample proportion of 51.26% or more) if Obama's true vote share is 50% or less.  Hence there is a 20% chance of a Romney victory, 80% for Obama.
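This back-of-envelope calculation can be sketched in a few lines of Python.  The helper function below is my own construction, not Silver's actual method; it bakes in the assumptions used above (two-party share, margin of error equal to two standard errors, normal approximation):

```python
from math import erf, sqrt

def win_probability(share_a, share_b, margin_of_error):
    """P(candidate A's true share exceeds 50%), treating the margin
    of error as two standard errors, as in the text above."""
    se = margin_of_error / 2                 # e.g. 3 points -> 1.5
    p = 100 * share_a / (share_a + share_b)  # two-party share, in %
    z = (p - 50) / se                        # Colorado: ~0.84
    return 0.5 * (1 + erf(z / sqrt(2)))      # standard normal CDF

# Colorado: Obama 50.8, Romney 48.3, margin of error +/-3 points
prob = win_probability(50.8, 48.3, 3.0)
print(round(prob, 2))                        # 0.8
```

The result reproduces Silver's 80:20 odds from the poll shares alone.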

This is my own take on the evidence; Silver's procedure is probably more sophisticated, but our approximation seems to work.  (You could try it out on other states to see if you too can replicate it.  Here's Virginia, another tipping point state:  Obama 50.7, Romney 48.7, margin of error +/-2.5.  Silver's odds for this are 79:21 for Obama.)
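Working the Virginia numbers through the same steps, again under my two-standard-error and normal-approximation assumptions:

```python
from math import erf, sqrt

# Virginia: Obama 50.7, Romney 48.7, margin of error +/-2.5 points
se = 2.5 / 2                               # 1.25 points
p = 100 * 50.7 / (50.7 + 48.7)             # ~51.0% two-party share
z = (p - 50) / se                          # ~0.80
prob_obama = 0.5 * (1 + erf(z / sqrt(2)))  # ~0.79
print(round(z, 2), round(prob_obama, 2))   # 0.8 0.79
```

This gives roughly 79:21 for Obama, matching Silver's published odds.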

Note also that +/-3 percentage points is a typical margin of error for polls.  Recall that the standard error of a proportion is the square root of p(1-p)/n.  If p is approximately 50% and n is about 1000 (a typical poll) then the formula gives a standard error of 1.58 percentage points.  Doubling this gives a margin of error of about 3 points.
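This last calculation is a one-liner, using the standard-error formula just quoted:

```python
from math import sqrt

p, n = 0.5, 1000                    # ~50% support, typical poll size
se = 100 * sqrt(p * (1 - p) / n)    # standard error in percentage points
margin = 2 * se                     # two standard errors
print(round(se, 2), round(margin, 1))   # 1.58 3.2
```

The margin comes out at about 3.2 points, which polls usually round to the familiar +/-3.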