Netflix Prize for Dummies [ I.b ]
Yes, I wasn’t happy at all with the previous code, so I changed it. Processing time improved: it now takes 13 minutes and 59 seconds to aggregate all the files into a single one (though the output size grew to 2.62GB). I have also changed the structure so that it’ll be easier to load the data into a database. The new file has four comma-separated (CSV) columns: movieid, userid, rating, date.
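To illustrate why the four-column layout helps with a database, here is a minimal Python sketch that loads such a file into SQLite. The choice of SQLite, the table name `ratings`, the helper `load_ratings` and the sample rows are all assumptions made for this example; the post itself does not say which database is used.

```python
import csv
import io
import sqlite3


def load_ratings(conn, csv_file):
    """Create a ratings table and bulk-insert rows from a 4-column CSV.

    Hypothetical helper: the table and column names mirror the columns
    described in the post (movieid, userid, rating, date).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings ("
        "movieid INTEGER, userid INTEGER, rating INTEGER, date TEXT)"
    )
    # Skip empty rows so a trailing blank line does not break the insert.
    rows = (row for row in csv.reader(csv_file) if row)
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Made-up sample rows in the aggregated format, loaded into an in-memory DB.
sample = io.StringIO("1,101,3,2005-09-06\n1,102,5,2005-05-13\n")
conn = sqlite3.connect(":memory:")
load_ratings(conn, sample)
print(conn.execute("SELECT COUNT(*) FROM ratings").fetchone())
```

For the real 2.62GB file you would pass an open file handle instead of the `io.StringIO` sample and connect to an on-disk database.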
Here’s the VBA code:
Sub GroupData()
    Dim T As Date
    T = Now
    Dim N As Long
    Dim Text1 As String
    Dim Text2 As String
    Dim Text3 As String

    Open "C:\Netflix\training_set.txt" For Output Access Write As #1
    For N = 1 To 17770
        Open "C:\Netflix\training_set\mv_00" & Format(N, "00000") & ".txt" For Input Access Read As #2

        'For the first line.
        'The offsets assume Text1 holds "N:", one line-break character and the first userid,
        'and Text3 holds the 10-character date, one line-break character and the next userid.
        Input #2, Text1, Text2, Text3
        Print #1, N & "," & Right(Text1, Len(Text1) - (Len(CStr(N)) + 2)) & "," & Text2 & "," & Left(Text3, 10)

        'For the rest of lines.
        Do While Not EOF(2)
            Input #2, Text1, Text2
            Print #1, N & "," & Right(Text3, Len(Text3) - 11) & "," & Text1 & "," & Left(Text2, 10)
            'Keep the date field: it also carries the userid of the next record.
            Text3 = Text2
        Loop

        Close #2
    Next N
    Close #1
    MsgBox Format(Now - T, "hh:mm:ss")
End Sub
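
For readers without VBA at hand, the same aggregation can be sketched in Python. This is not the code used for the timing above, just a rough equivalent. It assumes each per-movie file starts with a header line such as `1:` followed by `userid,rating,date` lines, as the VBA routine implies. The function names and the sample values in the usage line are hypothetical.

```python
import os


def convert_movie_lines(movie_id, lines):
    """Yield 'movieid,userid,rating,date' rows for one per-movie file."""
    lines = iter(lines)
    next(lines, None)  # skip the "N:" header line
    for line in lines:
        line = line.strip()
        if line:  # ignore blank trailing lines
            yield f"{movie_id},{line}"


def group_data(src_dir, dest_path, n_movies=17770):
    """Concatenate mv_0000001.txt .. mv_0017770.txt into one CSV file."""
    with open(dest_path, "w", newline="\n") as out:
        for n in range(1, n_movies + 1):
            src_path = os.path.join(src_dir, f"mv_{n:07d}.txt")
            with open(src_path) as src:
                for row in convert_movie_lines(n, src):
                    out.write(row + "\n")


# Usage on made-up sample lines (not real data from the training set).
print(list(convert_movie_lines(7, ["7:\n", "101,3,2005-09-06\n"])))
```

Reading line by line avoids the offset arithmetic the VBA version needs, because Python splits the file on line breaks before any comma handling takes place.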


