Lies, Damned Lies, and Statistics

I was fortunate enough never to take statistics in college. Though I had a math minor in support of my engineering degree, I missed the statistics side of the curriculum. Thus I never had the intense pleasure of trying to calculate the standard deviation of a data set by hand. Instead, I began to learn statistics in industry, working in the environmental group of a chemical company, where I was trying to make sense of water sample data that monitored a pollutant's concentration. That concentration varied over several orders of magnitude, so a simple average of the data was meaningless.

I studied the available tools and discovered probability paper. This graph paper was created before easy computer calculation tools existed, and it provided a convenient way to plot the data. A curve on standard graph paper, which couldn't be used for prediction, became a straight line on probability paper. One axis used a logarithmic scale to handle several orders of magnitude on a single graph, while the other was scaled to cumulative probability. Because the data fit a straight line, I could extend that line and predict that there was a significant chance of a really large release of a certain pollutant. I was learning about the predictive power of statistics.
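Log-probability paper can be imitated in a few lines of code. The sketch below uses only the Python standard library: it plots log10 of each sorted value against the standard normal quantile of its plotting position (the Hazen formula (i - 0.5)/n, one of several conventions), fits a straight line by least squares, and reads an extreme percentile off that line. The data are synthetic lognormal values, not the original water samples, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

def probability_plot_points(values):
    """Return (z, log10 value) pairs: the 'x' and 'y' of log-probability paper."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Hazen plotting position (i - 0.5)/n keeps every probability strictly between 0 and 1
    return [(std_normal.inv_cdf((i - 0.5) / n), math.log10(v))
            for i, v in enumerate(ordered, start=1)]

def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    xs, ys = zip(*points)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def predicted_percentile(intercept, slope, q):
    """Extend the straight line to cumulative probability q and undo the log scale."""
    return 10 ** (intercept + slope * NormalDist().inv_cdf(q))

# Synthetic "concentrations" spanning several orders of magnitude (illustrative only)
random.seed(1)
samples = [random.lognormvariate(0.0, 1.5) for _ in range(40)]

intercept, slope = fit_line(probability_plot_points(samples))
print(f"Largest observed value: {max(samples):.1f}")
print(f"Predicted 1-in-1000 value: {predicted_percentile(intercept, slope, 0.999):.1f}")
```

The straight-line fit is what makes extrapolation possible: the 1-in-1000 value is read from a part of the line that no sample has reached yet, which is exactly the kind of prediction the paper allowed.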

I was hooked. My employer offered extensive tools for product quality measurement and training in designed experiments. I became an avid user of these methods, and once computer tools were available to do the grunt calculation work, I became known for applying them. Later, I brought this knowledge to groups using W. Edwards Deming's statistical and quality improvement methods. To this day, I am convinced that if my company had adopted Deming as its statistical guru instead of borrowing GE's truncated version called Six Sigma, it would have prospered and grown over the last two decades of my career instead of stagnating.

Supposedly it was Disraeli who first said "Lies, damned lies, and statistics." Whoever coined it, the phrase remains the most memorable meme associated with statistics, and my guess is that not one person in ten in the US has a working understanding of statistics. That is a shame, since a knowledge of statistics unlocks many other concepts that are important to living in today's society. During this past election season, we were exposed to statistics daily, with polling data driving the discussion. How many times did we hear about the margin of error of a poll? How many times did we accept the data at face value, treating the poll results as the call of a horse race? "At the mile pole, Clinton is in the lead by 3 lengths."

In reality, understanding measures of uncertainty requires a good bit of statistical knowledge. At its heart, a two-candidate poll is a comparison of two means, or averages. You are trying to determine whether the average level of support for candidate A differs by a statistically significant amount from that of candidate B. If the difference between the averages is larger than the margin of error, the poll result is said to be outside the margin of error, and there is a clear leader.
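Because each respondent either supports a candidate or does not, a candidate's poll share is simply the mean of 0/1 answers. As a sketch, assuming a simple random sample of n respondents with observed shares \hat{p}_A and \hat{p}_B from the same poll and the usual normal approximation, the interval for the lead is:

$$(\hat{p}_A - \hat{p}_B) \pm z\,\sqrt{\frac{\hat{p}_A + \hat{p}_B - (\hat{p}_A - \hat{p}_B)^2}{n}}$$

Here z is about 1.96 for 95% confidence. The lead is outside the margin of error only when |\hat{p}_A - \hat{p}_B| exceeds that half-width, which is noticeably wider than the margin of error quoted for either candidate alone.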

That term, margin of error, deserves a closer look. When you sample a population of people, it is always possible that your sample is not representative of the population as a whole. Therefore, your estimate of the mean comes with a built-in error factor. The old bell curve you may have been graded on describes that error factor: most estimates land close to the true value, and fewer land farther away. Candidate A may have a polling result of 45% with a margin of error of 3 points. That means A's true level of support is probably somewhere between 42% and 48%, with roughly a 5% chance that it lies outside that range. Now if candidate B's polling result is 48%, the horse race call would say that candidate B is ahead by 3 lengths. But the truth is, you don't know for sure that candidate B is ahead of candidate A. The result is within the margin of error.
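The arithmetic behind that example can be checked with a short Python sketch using the standard normal-approximation formulas. The sample size of 1,067 is a hypothetical choice that happens to give candidate A a margin of error of about 3 points; the function names are also illustrative.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the ~95% confidence interval for a single candidate's share."""
    return z * math.sqrt(p * (1 - p) / n)

def lead_margin_of_error(p_a, p_b, n, z=1.96):
    """Half-width of the ~95% interval for the lead, when both shares come from one poll."""
    return z * math.sqrt((p_a + p_b - (p_a - p_b) ** 2) / n)

n = 1067                 # hypothetical number of respondents
p_a, p_b = 0.45, 0.48    # the shares from the example above

moe_a = margin_of_error(p_a, n)
lead = p_b - p_a
lead_moe = lead_margin_of_error(p_a, p_b, n)

print(f"Candidate A: {p_a:.0%} +/- {moe_a:.1%}")
print(f"B's lead: {lead:.1%} +/- {lead_moe:.1%}")
print("Clear leader" if abs(lead) > lead_moe else "Within the margin of error")
```

Each candidate's share carries roughly 3 points of uncertainty, but the 3-point lead itself carries nearly 6, so the "3 lengths" call is well within the margin of error.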

All of this assumes that the sample is representative of the total population being measured. Back in the days when everyone had a landline phone, there was no caller ID to screen calls, and people were not reluctant to respond to polls, a telephone survey worked remarkably well as a sample of the population. As time and progress would have it, a landline-only survey can no longer be taken as representative. People who are stubborn, too busy, or ideologically inclined to hide their views from the world can bias polls by declining to respond, because the people who do answer no longer resemble the people who don't.
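A small simulation shows why nonresponse is so damaging. This Python sketch assumes a hypothetical electorate split 50/50, in which candidate A's supporters answer the pollster 6% of the time and candidate B's supporters 9% of the time; those rates are illustrative, not measured. A's share among respondents settles near 40% instead of 50%, and polling more people only tightens the estimate around the wrong answer.

```python
import random

def simulate_poll(true_share_a, rate_a, rate_b, completed, rng):
    """Call random voters until `completed` of them answer; return A's share among respondents."""
    answered = answered_a = 0
    while answered < completed:
        supports_a = rng.random() < true_share_a            # draw a voter from the electorate
        if rng.random() < (rate_a if supports_a else rate_b):
            answered += 1                                   # this voter picks up and responds
            answered_a += supports_a                        # True counts as 1, False as 0
    return answered_a / answered

def respondent_share(true_share_a, rate_a, rate_b):
    """Share of A supporters expected among the people who actually respond."""
    responders_a = true_share_a * rate_a
    return responders_a / (responders_a + (1 - true_share_a) * rate_b)

rng = random.Random(42)
for n in (400, 4000, 40000):
    print(f"{n:>6} responses: A at {simulate_poll(0.50, 0.06, 0.09, n, rng):.1%}")
print(f"Expected respondent share: {respondent_share(0.50, 0.06, 0.09):.1%} (true share 50.0%)")
```

The margin-of-error formula only accounts for random sampling error; it says nothing about this kind of systematic bias, which is why a poll can be precise and still wrong.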

So polling companies are now struggling to determine which form of polling is most representative of the population at large, and in particular of the population most likely to vote. What this means for all of us is a large increase in the margin of error of any polling data in future elections. Caveat emptor.

By the way, I am attempting to take advantage of the mistaken practice of treating the mean of a range of values as an exact value, using it to predict stock market behavior and to select candidate stocks for short-term investment. When a company reports quarterly results, they are presented as not meeting, meeting, or exceeding expectations. The expectation is given as an exact figure. However, that number is merely the average of the estimates of all the analysts covering the stock. Often, if the company's earnings fall a penny or two short of the expected value, the stock takes a short-term dip. I am looking for candidate stocks where the reported value is clearly within the margin of error of the estimate, but the stock loses 10% or more of its value within a day. In my experience, such a stock often recovers a significant part of that loss within a week or two, so I am placing my bets accordingly. This technique will not work when the result is genuinely bad and falls outside the margin of error, because then there is more wrong with the stock than just a missed estimate. So far my results have been good, but that may just be luck. If I'm really lucky, then I'm reading the statistics right.
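The screen described above can be sketched in Python. The post doesn't say how the margin of error of an earnings estimate is measured, so this sketch assumes it is k standard deviations of the individual analysts' estimates (k = 2, a hypothetical choice); the class, field names, tickers, and figures are all made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass
class EarningsReport:
    ticker: str                     # hypothetical ticker symbol
    analyst_estimates: list[float]  # individual analysts' EPS estimates
    reported_eps: float             # actual reported earnings per share
    close_before: float             # closing price the day before the report
    close_after: float              # closing price the day after the report

def within_margin_of_error(report: EarningsReport, k: float = 2.0) -> bool:
    """Assumed test: reported EPS within k standard deviations of the analysts' mean."""
    consensus = mean(report.analyst_estimates)   # the published "expectation"
    spread = stdev(report.analyst_estimates)     # how much the analysts disagree
    return abs(report.reported_eps - consensus) <= k * spread

def one_day_change(report: EarningsReport) -> float:
    """Fractional price change across the report (e.g. -0.12 for a 12% drop)."""
    return report.close_after / report.close_before - 1

def is_candidate(report: EarningsReport, k: float = 2.0, drop_threshold: float = -0.10) -> bool:
    """A statistically insignificant miss that was still punished with a drop of 10% or more."""
    return within_margin_of_error(report, k) and one_day_change(report) <= drop_threshold

# Made-up examples: a near miss that sold off hard, and a genuine miss
xyz = EarningsReport("XYZ", [1.02, 1.05, 1.08, 1.00, 1.10], 1.03, 50.0, 44.0)
abc = EarningsReport("ABC", [1.02, 1.05, 1.08, 1.00, 1.10], 0.85, 50.0, 42.0)

for report in (xyz, abc):
    print(report.ticker, is_candidate(report))
```

XYZ qualifies: its 2-cent miss sits well inside the analysts' spread, yet the stock fell 12%. ABC is screened out because its miss lies far outside that spread, the "totally bad result" case the strategy deliberately avoids.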
