Tuesday, December 15, 2009

Scientifically Speaking, the 100-point Scale Isn't

Before I get into rating the raters, I need to address two fundamental concepts in testing: validity and reliability. I have a master's degree in school administration, so I have had a lot of classes on statistical analysis, and trust me: wine raters do not score high on either count.

Validity refers to whether a test measures what it purports to measure. Wine quality is such an individual value that no rater can validly measure whether you would like a wine. I refer to an experience in the Bonair Winery tasting room where a man walked in and asked me to try "our best wine." I proudly poured him our Morrison Vineyard Cabernet Sauvignon. His face scrunched up in pain and he announced, "Yuck, that is really sour. Don't you have anything sweeter?" Ah yes, a nice sweet late harvest Riesling. So, to understand the validity of a rater, you have to know what that rater likes. For example, Parker likes boatloads of new French oak, high alcohol, and micro-oxygenation to the point of burning the varietal character out of the wine. So if this is your palate, Parker might be considered valid for you, but not for real wine drinkers who drink wine on a daily basis with meals. If you like food-friendly wine, you should probably find a more 'valid' rater.

Reliability refers to the ability to give the same score to the same wine again and again. This is easy for most raters, because they are looking at the bottle and their notes. "Hmm, this tastes like dog piss. What did I give this wine last time? Oops, I gave it a 94. (Note to self: don't smoke too much marijuana prior to rating wines.) Well, to be reliable, I'll give it a 92 and hope no one calls me on it." Or, "This wine is from Walla, so I have to give it at least 90 points, otherwise people would think I don't know what I am doing. (Note to self: What am I doing?)"

Psychology, an inexact science just like wine rating, uses test-retest reliability to verify that a test can be repeated and give the same result. Using a complex formula, the test-retest reliability must be 95% or better for the test to be considered reliable. If there were a true 100-point scale, this might be easier: a score of 90 would be statistically the same as 94 and 86. (This is probably truer than you think.) But in practice we are only dealing with a 20-point scale, since published scores rarely dip below 80, so test-retest reliability is harder to achieve. It works out that a rater must nail the score within one point each time to be reliable, i.e., if the first blind test is a 90, the second cannot be more than 91 or less than 89. Also, for a score to be truly reliable, not only must Gregutt nail the score blindfolded in this manner, the Wine Advocate and the Wine Advisor must also come within the same range of 89-91.
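For anyone who wants to try this at home, here is a rough sketch in Python of the two checks described above. Everything in it is an assumption for illustration: the scores are made up, the function names are hypothetical, and the plain Pearson correlation stands in for the "complex formula." The 0.95 cutoff is simply the figure quoted above, not a universal standard.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def share_within(first, second, tolerance=1):
    """Fraction of wines whose retest score is within `tolerance` points."""
    hits = sum(1 for a, b in zip(first, second) if abs(a - b) <= tolerance)
    return hits / len(first)


# Made-up scores: the same five wines, tasted blind on two occasions.
first_tasting = [90, 88, 93, 85, 91]
second_tasting = [91, 84, 92, 89, 90]

r = pearson(first_tasting, second_tasting)
share = share_within(first_tasting, second_tasting)

# Assumed cutoff, taken from the 95% figure quoted in the text.
is_reliable = r >= 0.95

print(f"test-retest correlation: {r:.2f}")
print(f"share of wines within one point: {share:.0%}")
print(f"reliable by the 0.95 rule: {is_reliable}")
```

The same two functions work across raters as well as across tastings: pass one rater's scores as the first list and another's as the second.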

So, you can see, wine ratings are highly unreliable and invalid, statistically speaking, both internally and externally. If you are unsure about what wine to buy, buy only wine your friends will think is good. They will be impressed and you should be able to choke the stuff down while everyone is smiling and commenting on how good hog piss tastes.

What is the difference between a connoisseur and a city sewer? Not much, both are full of crap.
