Netflix Prize for Dummies [III]

Posted in Netflix, databases, spreadsheets by Francisco Marco-Serrano @ Jul 30, 2007

Now that we’ve got the data into a MySQL database, the next step is accessing it from our preferred application (sometimes that means MS Excel, for example). So, let’s go:

1) Make sure your system is up to date (especially the Jet Engine).

2) Download and install the latest MySQL ODBC Connector.

3) Go to Start > Settings > Control Panel > 32bit ODBC and create a new DSN (choose the MySQL driver).

Once finished, you’ll be able to access the data from MS Excel. However, bear in mind that you can’t view the raw data in the spreadsheet, since the number of rows is huge. The good thing is that you’ll be able to calculate statistics from there, or pull the data into a pivot table, etc.
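One way to keep the volume manageable is to let MySQL do the aggregation and send only the summary rows through the ODBC connection. What follows is just a sketch, assuming the `netflix`.`training_set` table and the column names used in part II of this series; the aliases `n_ratings` and `avg_rating` are made up for illustration.

```sql
-- Number of ratings per star value: only a handful of rows come back.
SELECT rating,
       COUNT(*) AS n_ratings
FROM netflix.training_set
GROUP BY rating
ORDER BY rating;

-- Number of ratings and average rating per movie: one row per movie,
-- instead of one row per individual rating.
SELECT idmovie,
       COUNT(*)    AS n_ratings,
       AVG(rating) AS avg_rating
FROM netflix.training_set
GROUP BY idmovie
ORDER BY idmovie;
```

Queries like these could be pasted into whatever SQL window your client offers; the spreadsheet then receives the already summarised result rather than the full table.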

Hit Predictor

Posted in Netflix, maths, music, preferences, statistics by Francisco Marco-Serrano @ Jul 27, 2007

Yesterday I was talking to my friend and colleague Pau Rausell-Köster, from the Research Unit in Cultural Economics (Universitat de València), about the Netflix Prize. We were discussing the foundations of taste and preferences, and how difficult it seems to create, by means of a devilish reductionism, a mathematical model that can predict how you’re going to rate a movie. The thing is, it works!

This conversation, though, led to another mathematical model, one that has been used for a while by a company called Polyphonic HMI S.L. to predict whether a song will be successful (aka “a HIT”). They use a methodology they have named “Hit Song Science”, which basically uses “Spectral Decomposition” to get different musical attributes for all the songs they have analysed (3.5 million to date). Then they apply clustering techniques to the songs that have been a success in the last 5 years (I imagine the time frame is there just to take out the trends and account for changes/evolution in people’s preferences). With that, they are able to predict whether a new song will succeed in the market, and they assign it a rating (controlling type-I error).

There’s only one downside: would the record companies invest in promoting songs with a low rating? Not promoting them would keep those songs from becoming hits, so once again our beloved maths would be changing the course of events and distorting the model by feeding flawed data back into it (the reverse, type-II error, could happen as well: bad songs evaluated as possible hits being highly promoted and succeeding). Moreover, if this happens on a big scale, innovation in music creation is aborted… unless… you’re brave and forget the model!

PS For the Netflix Prize Teams: food for thought.

vosnap.COM

Posted in business, decision theory by Francisco Marco-Serrano @ Jul 24, 2007

Get a group of people into an office for a 3-day weekend and they would come up with a great idea for creating a company. Well, sorry, they’ll create a company. Which is the latest company created by means of this methodology? VOSNAP.

They define this web application (of course, it is a dotcom) as a "social quick voting tool that reduces group think and time wasted on decision-making by giving you the ability to receive a quick vote from a group of friends on any decision. Users can create a poll, send it to their friends to quickly vote on via text message or email, and receive the results instantly". So, a dictatorship of the masses? Is this preferable to the dictatorship of the leaders? Or is a technocracy better? Actually, I didn’t want to turn this post into a political one, but to discuss the ways a decision can be taken, whether the application (VOSNAP) is really a breakthrough, and whether the Startup Weekend was really effective. However, as always, I’ve ended up changing mood after quite a few words and just leaving you to finish the quarrel (in Spain we say: "tirar la piedra y esconder la mano", that is, "throw the stone and hide the hand", ringing the bell and running like hell!).

Netflix Prize for Dummies [II]

Posted in Netflix, databases, operations research by Francisco Marco-Serrano @ Jul 18, 2007

Next is the database. In this example I’m going to use MySQL, although you could use PostgreSQL or MS SQL, for example. I’m on Windows.

2) Creating the database and dumping the data into it.

a. Create the database:

CREATE DATABASE netflix;

b. Create the tables:

“training_set” is the table where I’m going to dump the training_set data, made up of the movie ids, user ids, the rating, and the date.

CREATE TABLE `netflix`.`training_set` (
`idmovie` INTEGER UNSIGNED NOT NULL,
`iduser` INTEGER UNSIGNED NOT NULL,
`rating` INTEGER UNSIGNED NOT NULL,
`date` VARCHAR(10) NOT NULL,
PRIMARY KEY USING BTREE (`idmovie`, `iduser`)
)
ENGINE = MyISAM
COMMENT = 'User ratings';

“movies” is the table where I’ll dump the information from the movies file, made up of movie ids, release date, and title.

CREATE TABLE `netflix`.`movies` (
`idmovie` INTEGER UNSIGNED NOT NULL,
`release` INTEGER UNSIGNED NOT NULL,
`title` VARCHAR(150) NOT NULL,
PRIMARY KEY (`idmovie`)
)
ENGINE = MyISAM
COMMENT = 'Movies List';

Try doing the same for “probe” and “qualifying”.
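As a possible starting point, here is a sketch of the two remaining tables. The column layout is an assumption on my part: “probe” is taken to hold only movie id and user id pairs, and “qualifying” to hold movie id, user id, and date, with no rating in either. The table comments are made up; adjust the columns if your files look different.

```sql
-- Assumed layout: one (movie, user) pair per row, no rating and no date.
CREATE TABLE `netflix`.`probe` (
`idmovie` INTEGER UNSIGNED NOT NULL,
`iduser` INTEGER UNSIGNED NOT NULL,
PRIMARY KEY USING BTREE (`idmovie`, `iduser`)
)
ENGINE = MyISAM
COMMENT = 'Probe set';

-- Assumed layout: (movie, user, date) per row; the rating is what has to be predicted.
CREATE TABLE `netflix`.`qualifying` (
`idmovie` INTEGER UNSIGNED NOT NULL,
`iduser` INTEGER UNSIGNED NOT NULL,
`date` VARCHAR(10) NOT NULL,
PRIMARY KEY USING BTREE (`idmovie`, `iduser`)
)
ENGINE = MyISAM
COMMENT = 'Qualifying set';
```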

c. Dump the data into the tables:

LOAD DATA LOCAL INFILE 'C:/Netflix/training_set.txt'
REPLACE INTO TABLE netflix.training_set
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' STARTING BY ''
(idmovie, iduser, rating, date);

LOAD DATA LOCAL INFILE 'C:/Netflix/movie_titles.txt'
REPLACE INTO TABLE netflix.movies
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' STARTING BY '';

Try doing the same for “probe” and “qualifying”.
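A sketch of what those two loads might look like, under several assumptions: that `netflix.probe` has the columns (idmovie, iduser) and `netflix.qualifying` has (idmovie, iduser, date), that both files have already been flattened to one comma-separated row per pair, and that they sit at the hypothetical paths shown below.

```sql
-- Hypothetical path; assumes one "idmovie,iduser" row per line.
LOAD DATA LOCAL INFILE 'C:/Netflix/probe.txt'
REPLACE INTO TABLE netflix.probe
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(idmovie, iduser);

-- Hypothetical path; assumes one "idmovie,iduser,date" row per line.
LOAD DATA LOCAL INFILE 'C:/Netflix/qualifying.txt'
REPLACE INTO TABLE netflix.qualifying
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(idmovie, iduser, date);
```

A quick `SELECT COUNT(*)` on each table afterwards is an easy way to check that the number of rows loaded matches the number of lines in the file.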

Any problems?

Testosteronomics

Posted in economics, game theory by Francisco Marco-Serrano @ Jul 17, 2007

I think that a long time ago I tried to explain to you the relationship between economics and psychology (beyond me, an economist, marrying a psychologist). I recall the main argument was that the rationality premise attached to orthodox economics was assumed to be wrong (who’s ever seen a homo oeconomicus?). As I said, “orthodox”; that’s right, some people would say “bounded rationality”… but no! Even when you (apparently) aren’t a rational being, you are being one.

So, please let me introduce you to the latest work coming from game theory, one of the areas where economists and psychologists match best. From an article in The Economist (Money isn’t everything), we can get an idea of why we would “spite” (see this post at Punk Rock OR) a miserable offer. The probable answer: too much testosterone, mates! (it’s the animal sense).