Saturday, May 10, 2008

My Challenge to College Math Professors results in a Failing Grade.

If you are a professor of Math at a college, I have a math problem for you to solve. What are the odds that the top 10 and top 11 highest winning percentages for Barack Obama would all come from caucus states? What are the odds that the top 11 caucus wins and zero primary wins are just a coincidence, and how much do the odds reduce if Barack Obama happened to have better organization in all of those caucus states than Hillary Clinton?

The critical data to work with is the following. Through May 10th of 2008 there were 32 primaries and approximately 17 caucuses. What are the odds that Barack Obama's top 11 highest percentages were all from caucus states? I believe the odds against the 11 highest winning percentages all being from caucus states to be in the millions to one, and possibly even the billions to one. I believe that at a minimum 3 primaries should be mixed in among Barack Obama's top 10 or 11 winning percentages for the results to be considered mathematically legitimate.
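
As a rough sketch of the simplest version of this calculation: if all 49 contests were equally likely to land anywhere in the ranking (an assumption made here purely for illustration, and one that ignores which states favored which candidate going in), the chance that the top 11 slots are all caucuses is a hypergeometric count, C(17, 11) / C(49, 11). The Python below uses only the 32 and 17 figures given above; the function name is hypothetical.

```python
from math import comb

# Counts taken from the post: 32 primaries and roughly 17 caucuses.
PRIMARIES = 32
CAUCUSES = 17
TOP_N = 11


def prob_top_n_all_caucus(caucuses: int, primaries: int, top_n: int) -> float:
    """Chance that the top_n ranked contests are all caucuses,
    assuming every ordering of the contests is equally likely."""
    total = caucuses + primaries
    # Ways to fill the top slots with caucuses only, over all ways to fill them.
    return comb(caucuses, top_n) / comb(total, top_n)


p = prob_top_n_all_caucus(CAUCUSES, PRIMARIES, TOP_N)
print(f"probability = {p:.3e}, about 1 in {round(1 / p):,}")
```

Under that equal-likelihood assumption the result comes out to roughly 1 in 2.35 million. Any real difference between caucus states and primary states going in would change that figure, so it is better read as a baseline than as a final answer.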

If you read the comments section you will find additional information that may help you with your calculations. Otherwise you can keep it simple and just go with the data provided above.

I would like a college professor to give me their calculation results, or they can make it a class project and have their students do it.

Thank you in advance. It's been over a week since I made my challenge, and after an initial attempt seemed to stall, nobody else seems able to provide a mathematical probability.


Anonymous said...

I voted for Hillary, but - what difference does it make? Neither Hillary nor Obama invented the Democratic nomination rules... the only thing they (and the other candidates) could do is to (a) know what the rules are, and (b) come up with a campaign strategy that works best for them within the context of those rules.

Anonymous said...

It doesn't make any difference. Regardless of the outcome to the math problem, it won't change the results.

Anonymous said...

I'm sorry if you find the following offensive, but it's better that it be said:

Your questions are flawed but you may not be well educated enough to understand why. For example:

"What are the odds that the top 10 and top 11 highest..."

Are you asking

"Given Obama's winning percentages in all of the states he has won, both caucuses and standard, what are the odds that the top 10-11 of them would be in caucus states?"

or are you asking

"Does the fact that Obama's top percentage victories have come from caucuses indicate that he is the better organized candidate?"

The latter cannot be answered by statistics as it is a causal question; to put it simply (albeit a bit incorrectly) statistics are good for showing correlations and associations, but not for proving causality. The former could be answered very easily if you made the incorrect assumption that the states which held caucuses were (within their general population) equally as likely to have voted for Obama as the other states. Looking at the list of states Obama has won this is clearly untrue.

I'll leave you with a thought about what to do from here:

HRC has the chance, before the end of voting, to redeem herself. If she were to recognize the futility of her current campaign, gracefully state that she had been beaten and rally her troops behind Obama, she'd have a chance at being remembered not as a pariah of history, but as a good person.

Good luck with the rest of the campaign.

Alessandro Machi said...

I am asking both questions specifically to give the benefit of the doubt to Barack Obama's side. For instance, if we discover that the odds of Obama's top 10 and top 11 winning percentage states all being caucus states is in the millions, in the interest of fairness we should try and determine how much lower the odds would be if in fact Obama was the favorite going in to all of those contests. To bend over backwards being fair, we can also factor in the premise that Obama ran a "better" campaign in all of those caucus states as well.

But did the better campaigning also render a true representation of the will of the voters in Barack Obama's high margin caucus state victories, or did the low voter turnout cause the caucus results to give out inaccurate final results? I believe the lower caucus voter turnouts created volatile and inaccurate results.

My estimate is the odds are in the millions that Obama's 11 highest winning percentages would all be from caucus states IF WE ARE TO BELIEVE THE CAUCUS STATES WERE FAIRLY RUN and everybody had fair access to the voting locations. Even if Barack Obama was the favorite in all 11 caucus states that represent his highest winning margins, remember that the key point is not that Barack Obama won so many caucuses; it is the large margin of victory, and the fact that his top 11 winning percentages all came from caucus states, that is our talking point.

I do think Obama ran the more assertive campaign in the caucus states, and perhaps he was the favorite in most of the caucus states as well. Therefore one can reduce the odds of his 11 highest margin wins all coming from caucus states from the millions, to around 1 in 10,000. The problem is the odds of Barack's top 10 or 11 margins of victory wins coming from caucus states should be less than 10-1 to be statistically sensible and reliable.

Who ran the better campaign in the caucus states must also be balanced with the final results being somewhere in the vestiges of representing both candidates' legitimate popularity in those states. For instance, in the primary state of North Carolina, Hillary Clinton was losing by 20-25 points a month or two before the election. Clinton battled hard and cut that lead almost in half. Barack did the same thing in the Pennsylvania Primary and cut Hillary's lead in half. Both situations demonstrate that in general, very popular candidates will have around 60% popularity and the losing candidate will have around 40% in most worst case scenarios, with the gap potentially closing if they "work hard".

In the caucus states, if one wants to argue that Hillary Clinton did not battle hard, let's accept that premise whether it's true or not. But does that justify victory margins of 30%, 40%, and 50% and more when I can cite several examples of primary victory states where the margins generally topped out at 61%-39%, with the exception of Obama's three top primary state wins? lol, and even Barack's three top primary wins still DO NOT crack the top ten in terms of largest winning percentages.

The actual margin of victory is generally going to fall anywhere from 1 percent to 20 percent, with a couple of exceptions to that rule. 75-80% of all of Barack's caucus state victories were over that 22% margin of victory threshold, whereas only approximately 20% of Barack's primary wins broke that 22% margin of victory threshold barrier.

Anonymous said...

I guess I can see what you're getting at, but I keep going back to the idea that the whole process (some states have caucuses, some states have primaries, etc.) is known well in advance, and they are what they are - if we don't like the process, then the time to change them is well before the election cycle, not during it. Did the caucuses favor Obama for one or more reasons? It sure seems that way - but there's nothing that can be done about it right now.

I personally hate the current process - it seems unnecessarily complex, unfair, etc. A much simpler system would apportion delegates based on the statewide vote percentages only of a real/traditional primary election - so if you get 55% of the total vote, then you get 55% of the delegates. Simple enough, huh?

Independent063 said...

Hi. I noticed that the replies to your post gave arguments that were not really based on statistics. I am not a math professor, but I am a researcher at the Ph.D. level, so I have had extensive training in statistical methods. Consequently, I feel qualified to take your challenge. If I understood your question correctly, you wanted to confirm your belief that Barack was more likely to have high percentage spread wins from caucus states than from primary states, and you wanted a probability level assigned to this statistic. To answer your question, I used data that is available at the website url:
Specifically, I used the point spreads (representing popular vote totals), the list of winners for each state, and which states were caucus states. I then recoded the actual point spreads into "1" for "high spread" and "0" for "low spread", irrespective of who got these point spreads. Then, for all the states that Obama won, I ran what is called a Pearson Chi Squared test (for differences in group membership for two discrete variables - where one variable was the recoded point spread, and the other variable was whether the state was a caucus state or not). Overall, Obama had high point spreads in 20 of the races that he won, and low point spreads in 7 of them. Please note that these numbers do not include ones where there was a tie, nor the Florida and Michigan primaries. So, the relevant question is: out of these 27 races, is there a bias towards winning with a high point spread in a state that had a caucus? The raw data is as follows: for non-caucus states (18 of them), Obama won 5 with a low point spread and 13 with a high point spread; and for caucus states (9 of them), he won 2 with a low point spread and 7 with a high point spread. Running the SPSS crosstabs program on these raw data, which utilizes the chi square test, the results were nonsignificant (p = .756). Please note that for conventional levels of statistical significance to be reached, the probability level would need to have been less than .05 in the best case scenario. CONCLUSION: There is no difference statistically in the number of caucus versus non-caucus states that Barack won with either high or low point spreads. So, basically, all that you have been hearing over the last months about Obama being stronger in the caucus states is just a bunch of hot air coming from the so-called expert political pundits and news commentators.
Just one more example, I suppose, of how our minds can trick us into thinking that there are significant differences because of one number being larger than another one when, in fact, the apparent differences disappear when the numbers are properly aggregated.
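
For readers without SPSS, the crosstab described above can be checked by hand. The following is a minimal Python sketch of an uncorrected Pearson chi-squared test on a 2x2 table, using the counts quoted in this comment (5 low and 13 high for non-caucus, 2 low and 7 high for caucus). The function name is hypothetical, and the p-value shortcut shown applies only to 1 degree of freedom.

```python
from math import erfc, sqrt


def pearson_chi2_2x2(table):
    """Uncorrected Pearson chi-squared statistic and p-value
    (1 degree of freedom) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Rows: non-caucus, caucus. Columns: low spread, high spread.
obama_wins = [[5, 13], [2, 7]]
chi2, p_value = pearson_chi2_2x2(obama_wins)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

This reproduces a p-value of about .756 for the counts as quoted. Two of the expected cell counts here are below 5, a situation where the chi-squared approximation is commonly considered unreliable and an exact test is often preferred, so the sketch is a check on the arithmetic and not an endorsement of the method.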

Not being satisfied with this one particular analysis, however, I decided to delve a little bit deeper. Using the same test, I looked at Hillary's numbers (no caucus, 8 low and 1 high; and caucus 4 low and 1 high), and again found no statistically significant differences (p = .649).

Since looking at data that has been recoded into binary format (i.e., 1 and 0) can decrease the power of a statistical test, I decided next to examine the differences in point spread overall for caucus versus non-caucus states. Again, I found no statistically significant difference in these numbers (non-caucus mean spread was 24.1 and mean spread for caucus was 21.9; p = .706).

Disregarding whether the state was caucus or non-caucus, I then looked at the mean spread for states Clinton won (13.7) versus those Barack won (30.8). Here, the difference was highly significant (p = .002), indicating that the mean spread for all the contests Barack won was more than twice the mean spread for the ones that Hillary won. Since there was no mean difference between caucus and non caucus states in mean point spreads (previous analysis), it is clear that the 2 to 1 point spread difference between Obama and Clinton existed for both caucus and non-caucus states. This clearly refutes the claims from all the so-called experts who blab away on TV.

Then, using the whole sample, I looked at whether there was even a difference in the number of caucus versus non-caucus states that Hillary versus Barack won (non caucus, Hillary 9, Obama 20; caucus, Hillary 5, Obama 10). Again there was no statistically significant difference in this distribution either (p = .877), lending more support to the notion that the TV jabber jaws know not of what they speak.

Finally, I ran two tests that looked at whether there was a meaningful difference in point spreads for caucus versus non-caucus states separately for Hillary and Barack. For Hillary, the mean point spreads were 14.8 for non-caucus and 11.8 for caucus. For Barack, the mean point spreads were 31.2 for non-caucus and 30.0 for caucus. Neither of these tests showed a statistically significant difference (p = .630 and p = .872, respectively). Again, this is strong evidence that whether the contest was in a caucus or a non-caucus state has no bearing on the point spread, whether you are Hillary or Barack.

So, what are the lessons to be gleaned from these analyses? Some of them might be:

1) looking at raw data, without proper statistical analyses, can be misleading

2) adding to that the wild opinions of TV jabber jaws, can lead a person even further astray

3) paying selective attention to the raw data because of one's own personal biases can even further skew one's perception of the raw data

For all these reasons, it appears that the popular perception is that Barack has won by larger margins in the caucus states, even though statistical analyses clearly demonstrate that he has not.

Please feel free to email me if you have any questions about the analyses I did. Also, if anyone wants, I can provide the raw data I used as well as the output from the analyses.

I hope that this helps. Rob Anderson, Ph.D.

Alessandro Machi said...

Hi Rob, I think you may have slightly misunderstood the challenge and you may have some of the data incorrect as well. The caucus wins were not 10 for Obama and 5 for Hillary, it was more like 14 for Obama and 2 for Hillary with a tie.

I used a threshold margin of victory percentage of 61% to 39%. Any voting win margins over 61% were considered to be out of the norm when compared to all of the winning margins of all of the contests, both primary and caucus.

Barack Obama has 75-80% of his caucus state wins over this 61% threshold, but in the primary states Barack Obama only had 20% of his wins above that 61% threshold.

If one "adjusts" my threshold margin from above 61% then the numbers will begin to "normalize". However, the fact that I can find a threshold percentage number in which a clear bias occurs between the primary elections and the caucus elections should not be taken lightly. The mathematical bias is a disenfranchisement of voters who are not being represented; these voters typically vote for Hillary Clinton.

Independent063 said...

Hi. Thanks for getting back to me so quickly. I don't think I made a mistake in counting the caucus and primary wins, but let's check it over again just to make sure. I got the data for the wins and whether the states were caucus states from this web page:

I got the point spreads from this web page:

Using what is called a 'median split', which is the conventional way of determining thresholds, I got a difference of 21 points (i.e., 60.5 to 39.5) as the median split. This is consistent with your figure of 61% to 39%, in fact almost identical!

Could you please go over the data from these two pages and make sure that I didn't make any mistakes. If I did, please let me know so that I can correct the numbers and re-run the statistics. If I did not, but you believe the data on these two pages to be incorrect, please let me know where I can find data for caucus/primary win counts, and for point spreads, that you believe to be more accurate so that I can re-run the analyses using your preferred data set.

Once we have ruled out, by accounting for, these two possible sources of error, if you still believe that analysing the data in a different manner might lead to different results, then we can discuss alternative ways of grouping the data and running statistics.

Once we have satisfied all these concerns, then we will know for sure what underlying reality the statistics are pointing to.


Rob Anderson, Ph.D.

Alessandro Machi said...

I'll share my data here. It lists a total of 29 Obama victories so I guess it's missing a couple of his victories. I see a whole bunch of Caucus wins at the top of the chart, a mid range around 61% where there is some merging between caucuses and primaries, and then a whole bunch of Primary state victories on the lower end.

To me it looks like the Caucuses run amok at the top of the chart and the primary wins overall have a significantly lower average winning percentage.

Lower vote totals from the caucus states along with the possibility of improprieties by overenthusiastic and "aggressive" caucus goers may have resulted in skewed numbers that don't accurately reflect the will of the caucus state populations.

Virgin Islands - 90% for Barack (Caucus vote).
Idaho - 80% Caucus vote
Hawaii - 76% Caucus vote
D.C. - 75% Caucus vote
Alaska - 75% Caucus vote
Kansas - 74% Caucus vote
Washington - 68% Caucus vote
Nebraska - 68% Caucus vote
Colorado - 67% Caucus vote
Democrats Abroad - 67% Caucus vote
Minnesota - 66% Caucus vote
Georgia - 66% Primary Vote
Illinois - 65% Primary Vote
Virginia - 64% Primary Vote
North Dakota - 61% Caucus Vote
Wyoming - 61% Caucus Vote
Mississippi - 61% Primary Vote
Maryland - 61% Primary Vote
Maine - 59% Caucus Vote
Vermont - 59% Primary Vote
Wisconsin - 58% Primary Vote
Louisiana - 57% Primary Vote
Utah - 57% Primary Vote
Alabama - 56% Primary Vote
North Carolina - 56% Primary Vote
South Carolina - 55% Primary Vote
Delaware - 53% Primary Vote
Connecticut - 51% Primary Vote
Guam - 50% Caucus Vote
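
The list above can be tallied directly. The Python sketch below takes the 29 entries exactly as posted (rounded percentages and caucus/primary labels included, without checking them against official results) and counts how many wins of each type fall strictly above the 61% cutoff discussed in this thread. It also computes, as an illustration only, the chance that the 11 highest wins would all be caucuses if every ordering of these 29 wins were equally likely. The variable names are hypothetical.

```python
from math import comb
from statistics import median

# (contest, Obama's rounded percentage, contest type), copied from the list above.
results = [
    ("Virgin Islands", 90, "caucus"),
    ("Idaho", 80, "caucus"),
    ("Hawaii", 76, "caucus"),
    ("D.C.", 75, "caucus"),
    ("Alaska", 75, "caucus"),
    ("Kansas", 74, "caucus"),
    ("Washington", 68, "caucus"),
    ("Nebraska", 68, "caucus"),
    ("Colorado", 67, "caucus"),
    ("Democrats Abroad", 67, "caucus"),
    ("Minnesota", 66, "caucus"),
    ("Georgia", 66, "primary"),
    ("Illinois", 65, "primary"),
    ("Virginia", 64, "primary"),
    ("North Dakota", 61, "caucus"),
    ("Wyoming", 61, "caucus"),
    ("Mississippi", 61, "primary"),
    ("Maryland", 61, "primary"),
    ("Maine", 59, "caucus"),
    ("Vermont", 59, "primary"),
    ("Wisconsin", 58, "primary"),
    ("Louisiana", 57, "primary"),
    ("Utah", 57, "primary"),
    ("Alabama", 56, "primary"),
    ("North Carolina", 56, "primary"),
    ("South Carolina", 55, "primary"),
    ("Delaware", 53, "primary"),
    ("Connecticut", 51, "primary"),
    ("Guam", 50, "caucus"),
]

THRESHOLD = 61  # the 61% cutoff discussed in this thread

caucus = [pct for _, pct, kind in results if kind == "caucus"]
primary = [pct for _, pct, kind in results if kind == "primary"]

# Count wins strictly above the cutoff in each group.
caucus_above = sum(pct > THRESHOLD for pct in caucus)
primary_above = sum(pct > THRESHOLD for pct in primary)

print(f"caucus wins above {THRESHOLD}%: {caucus_above} of {len(caucus)}")
print(f"primary wins above {THRESHOLD}%: {primary_above} of {len(primary)}")
print(f"median winning percentage: {median(pct for _, pct, _ in results)}")

# Chance that the 11 highest wins are all caucuses if every ordering of these
# 29 wins were equally likely (an illustrative assumption, not a claim about 2008).
p_top11 = comb(len(caucus), 11) / comb(len(results), 11)
print(f"P(top 11 all caucus) = {p_top11:.2e}")
```

With the list as given, 11 of 15 caucus wins and 3 of 14 primary wins sit strictly above 61%; counting the 61% results themselves would change both shares. Minnesota and Georgia are tied at 66% after rounding, so the 11th slot is shared between a caucus and a primary.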

Anonymous said...

Are people allowed to vote in both a caucus and a primary in the same state/cycle? If the answer is yes then that may account for some of the oddities. BTW I never believed he won all those in the right way.

Alessandro Machi said...

There were four states that allowed both caucuses and primaries. In each instance, Hillary made massive, massive gains in the primary versus the caucus result within the same state.

Those four states were Texas, Washington, Idaho, and Nebraska. We're talking gains in which Hillary Clinton's vote total goes up by 20 or 30 percent while Barack Obama's drops an equal amount, just unreal stuff.
