Netflix Prize for Dummies [ I ]
The Netflix Prize is, in the company’s own words, the “quest” to “substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences”.
I read about the prize last February on Michael Trick’s blog, and the first thing I saw was the $1 million for the winner. However, although we’re in it for the money (YES!), we don’t think we’re going to get it. So, let’s mess about with it!
For all of you who are, like me, amateur OR-ers, I’m starting a series of posts showing where the heck I am.
……………………………………………………….
1) The data: the training set (the data you have to use to create the model) is made up of more than 17 thousand text files. Although some experts on Netflix’s forums advise against grouping them, I will.
Given my own weaknesses and economist-like mind, I’m going to group the data into a single file in order to dump it into a database (PostgreSQL, probably). What’s more, as I don’t have time to learn any other language, I’ll be using VBA for Excel.
Here we go…
Sub AgrupaDatos()
    Dim N As Long
    Dim TextoArchivo As String

    ' Single output file that collects every per-movie file
    Open "C:\training_set.txt" For Output As #1
    For N = 1 To 17770
        ' Builds the names mv_0000001.txt to mv_0017770.txt
        Open "C:\training_set\mv_00" & Format(N, "00000") & ".txt" For Input As #2
        ' Copy the current file line by line
        Do While Not EOF(2)
            Line Input #2, TextoArchivo
            Print #1, TextoArchivo
        Loop
        Close #2
    Next N
    Close #1
End Sub
The module above takes about 30 minutes (Pentium 1.73 GHz, 1 GB RAM) to process the data into a single 1.92 GB file.
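One possible variation, sketched here only as an illustration: a plain concatenation keeps each movie's header line on its own row, so the ratings that follow carry no movie ID of their own. The sketch below assumes each file begins with a header such as "1:" followed by "CustomerID,Rating,Date" lines, and it writes the movie ID in front of every rating. The name AgrupaDatosCsv and the output path C:\training_set.csv are hypothetical, not part of the module above.

```vba
' Sketch only. Assumes each mv_*.txt file starts with a header such as "1:"
' followed by "CustomerID,Rating,Date" lines. The name AgrupaDatosCsv and the
' output path C:\training_set.csv are hypothetical.
Sub AgrupaDatosCsv()
    Dim N As Long, I As Long
    Dim MovieId As String, TextoArchivo As String, Fila As String
    Dim Lineas() As String

    Open "C:\training_set.csv" For Output As #1
    For N = 1 To 17770
        Open "C:\training_set\mv_00" & Format(N, "00000") & ".txt" For Input As #2
        MovieId = ""
        Do While Not EOF(2)
            Line Input #2, TextoArchivo
            ' Line Input breaks on CR or CRLF; split on LF as well in case
            ' the files use Unix line endings
            Lineas = Split(TextoArchivo, vbLf)
            For I = LBound(Lineas) To UBound(Lineas)
                Fila = Trim$(Lineas(I))
                If Right$(Fila, 1) = ":" Then
                    ' Header line: remember the movie ID without the colon
                    MovieId = Left$(Fila, Len(Fila) - 1)
                ElseIf Len(Fila) > 0 Then
                    ' Rating line: write it with the movie ID in front
                    Print #1, MovieId & "," & Fila
                End If
            Next I
        Loop
        Close #2
    Next N
    Close #1
End Sub
```

Each output row then reads movie ID, customer ID, rating, date, which should be easier to bulk-load into a single database table (for instance with PostgreSQL's COPY command) than the concatenated file.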
Next, the database.


I’m not quite happy with the code, so I’m redoing it to allow for a better transfer to a database.
Comment by Francisco Marco-Serrano — July 16, 2007 @ 12:05 am