…I already live in an Idiocracy. I don’t watch much TV, regrettably not even my beloved ‘IT Crowd’, ‘Discovery Channel’, ‘History Channel’, CNN (Spanish version), other news, and some stand-up (‘monólogos’ in Spanish), so I had to find out about "Idiocracy" (the movie) thanks to ‘Soy Geek’. It seems I needn’t regret too much not having heard of the movie, since the box office was so low that it was pulled from the screens immediately (that is, there was not much advertising for this masterpiece); this is what I call ‘death by success’: yes, it was a success, it anticipated the future so well that we should categorise the movie as a ‘reality show’ rather than ‘sci-fi’ (that’s why everyone hated the movie!!!).

From an OR perspective, what I really liked was seeing the Gaussian curve that the doctor preparing the experiment uses to show the military that the guy he’d chosen to be frozen was ‘AVERAGE’. He damn well was (that point where average equals median equals mode)… until he awakes… the distribution has changed… and he’s now in the upper tail!!! ‘King of the World’!!!
Moral: No matter who you are, you’ll always be who you are in comparison to others.
Hey! No one told us Statistics was a good subject for ‘morals’! I personally don’t like this moral.
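
For the curious, here is a minimal sketch of that moral in Python. The numbers are pure assumptions of mine for illustration (a "before" population with mean 100 and standard deviation 15, and a "future" population whose mean has dropped to 60); the movie gives no such figures. It only shows that the very same score sits at the centre of one Gaussian and in the upper tail of another.

```python
from statistics import NormalDist

# Hypothetical populations: these parameters are illustrative assumptions,
# not figures taken from the movie.
before = NormalDist(mu=100, sigma=15)   # population when the guy is frozen
after = NormalDist(mu=60, sigma=15)     # population when he wakes up

# In a Gaussian, mean, median and mode all coincide.
same_centre = before.mean == before.median == before.mode

our_guy = 100  # his score does not change while he is frozen

# Fraction of each population scoring below him (his percentile rank).
rank_before = before.cdf(our_guy)
rank_after = after.cdf(our_guy)

print(f"mean == median == mode: {same_centre}")
print(f"percentile before: {rank_before:.1%}")
print(f"percentile after:  {rank_after:.1%}")
```

He hasn’t changed at all; only the curve around him has.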

Recently I had my private nightmare trying to explain…, well, rather to defend our poor ‘average’; it is most probably the most used and abused statistical measure, and it is not always well employed. For starters, there is the big confusion between it and the ‘mean’ in probabilistic contexts; then, some misunderstandings about the use of the ‘median’, a cousin of the ‘average’ and the ‘mean’. Following that, the lack of sampling information (type of sampling, sampling error, confidence, variance); and finishing with this post, which for sure hasn’t closed the gap between previous knowledge and current knowledge on the matter…, but that’s another story.
Yesterday I was talking to my friend and colleague Pau Rausell-Köster, from the Research Unit in Cultural Economics (Universitat de València), about the Netflix Prize. We were discussing the foundations of taste and preferences, and how difficult it is to create, by means of a devilish reductionism, a mathematical model that can predict how you’re going to rate a movie. The point was: it works!
This conversation, though, led to another mathematical model, one that has been used for a while by a company called Polyphonic HMI S.L. to predict whether a song will be successful (aka “a HIT”). They use a methodology they have named “Hit Song Science”, which basically uses “Spectral Decomposition” to extract different musical attributes from all the songs they have analysed (3.5 million to date). Then they apply clustering techniques to the songs that have been a success (aka “a HIT”) in the last 5 years (I imagine the time frame is there to take out passing trends and account for changes/evolution in people’s preferences). With that, they are able to predict whether a new song will succeed in the market, and they assign it a rating (controlling type-I error).
There’s only one downside: would the record companies invest in promoting songs with a low rating? If not, the lack of promotion would itself keep the song from becoming a hit, so, again, our beloved maths would be changing the course of events and distorting the model by feeding flawed data back into it (the reverse, type-II error, could happen as well: bad songs evaluated as possible hits being highly promoted and succeeding). Moreover, if this happens on a large scale, innovation in music creation is aborted…, unless… you’re brave and forget the model!
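
A toy simulation may make the feedback loop easier to see. Everything in it is a made-up assumption of mine (the hidden "quality", the noisy rating, the size of the promotion boost); it has nothing to do with how Polyphonic HMI actually works. It just compares two worlds: one where promotion is handed out by coin flip, and one where only high-rated songs get promoted.

```python
import random

def hit_rate_gap(promote_by_rating, n_songs=20000, seed=1):
    """Return hit rate of high-rated songs minus hit rate of low-rated songs.

    All numbers below are illustrative assumptions, not real market data.
    """
    rng = random.Random(seed)
    hits = {True: 0, False: 0}
    count = {True: 0, False: 0}
    for _ in range(n_songs):
        quality = rng.random()                  # hidden "true" appeal, 0..1
        rating = quality + rng.gauss(0, 0.2)    # the model's noisy rating
        high_rated = rating > 0.5
        if promote_by_rating:
            promoted = high_rated               # labels follow the model
        else:
            promoted = rng.random() < 0.5       # promotion by coin flip
        # Chance of becoming a hit: part quality, part promotion.
        p_hit = 0.4 * quality + (0.4 if promoted else 0.0)
        count[high_rated] += 1
        hits[high_rated] += rng.random() < p_hit
    return hits[True] / count[True] - hits[False] / count[False]

gap_random = hit_rate_gap(promote_by_rating=False)
gap_feedback = hit_rate_gap(promote_by_rating=True)

print(f"gap with coin-flip promotion:     {gap_random:.2f}")
print(f"gap with rating-driven promotion: {gap_feedback:.2f}")
```

In the second world the model looks far more accurate than it really is, because part of what it "predicts" is the promotion it caused. Data collected there and fed back into the model would carry that distortion with it.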
PS For the Netflix Prize Teams: food for thought.
Time and again I’m thinking of the errors people commit; yes, commit, not make, and let me explain the reason for my assertion.
In statistics, when we have a declared statement (the “null hypothesis”, we say) we have two options, haven’t we?: to accept it, or to reject it (actually, we could rephrase and say that rejecting is accepting the “alternative hypothesis”). So, now comes the shocking news for those alien to statistics: when we make such a decision we are not just exposed to one error, that of being wrong; we are exposed to TWO ERRORS (brilliant, as if life wasn’t hard enough!).
The first error (the “common one”, also known as type I) is the one we commit if we decide to reject the declared statement when it is in fact right (bollocks!). Its probability is what we represent with an α and call the “level of significance” (the probability of cocking up by failing to accept the right statement); statisticians usually play with levels of 5% or 1%, which means the “confidence level” (1-α) is 95% or 99% (remember, that’s just probability!).
The second error (the “not-so-well-known one”, or type II) is called a “false negative” (its probability is β, for friends and colleagues), and is committed when we accept the statement and what was right was the alternative one (again, damn!).
And now here comes the trick. With the same data, trying to avoid one (or reducing the probability of committing it) increases your chances of committing the other (so, another Catch-22 in life). So, the only thing left to us is to use the best of our knowledge and try to control one without exposing ourselves too much to the other (that’s what statistics is for, mate!). Because, my dear colleagues, avoiding the decision IS NOT AN OPTION!!!
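
If you want to see the Catch-22 in numbers, here is a rough sketch. I am assuming the simplest textbook setting, a one-sided z-test with known standard deviation, and the effect size, sigma and sample size are arbitrary values I picked for illustration. For a fixed sample, it computes β for several choices of α.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, effect=0.5, sigma=1.0, n=20):
    """Beta for a one-sided z-test of H0: mu = 0 against H1: mu = effect.

    effect, sigma and n are illustrative assumptions, not values from any study.
    """
    std_normal = NormalDist()                # standard normal, mean 0, sd 1
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when z > z_crit
    shift = effect * sqrt(n) / sigma         # mean of z when H1 is true
    # Beta is the chance that z stays below the cut-off although H1 is true.
    return std_normal.cdf(z_crit - shift)

alphas = (0.10, 0.05, 0.01)
betas = [type_ii_error(a) for a in alphas]

for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f}  ->  beta = {b:.3f}")
```

Tightening α pushes β up, as promised. The only way to shrink both at once is to bring more data (a larger n), which is a different kind of price.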