Netflix Prize for Dummies [ I.b ]

Posted in Netflix, VBA, operations research by Francisco Marco-Serrano @ Jul 16, 2007

Yes, I wasn’t happy at all with the previous code, so I changed it. Processing time improved, coming down to 13 minutes and 59 seconds to aggregate all the files into a single one (though the size went up to 2.62GB). I have also modified the structure so it’ll be easier to load the data into a database. The new file now has 4 comma-separated (CSV) columns: movieid, userid, rating, date.

Here’s the VBA code:

Sub GroupData()

    ' Start the timer.
    Dim T As Date
    T = Now

    Dim N As Long
    Dim Text1 As String
    Dim Text2 As String
    Dim Text3 As String

    Open "C:\Netflix\training_set.txt" For Output Access Write As #1

    For N = 1 To 17770
        Open "C:\Netflix\training_set\mv_00" & Format(N, "00000") & ".txt" For Input Access Read As #2

        ' For the first line: Text1 holds the "N:" header, one separator character and the first user ID.
        ' Text3 holds the 10-character date followed by the next user ID.
        Input #2, Text1, Text2, Text3
        Print #1, N & "," & Right(Text1, Len(Text1) - (Len(CStr(N)) + 2)) & "," & Text2 & "," & Left(Text3, 10)

        ' For the rest of the lines: the user ID is carried over from the tail of the previous date field.
        Do While Not EOF(2)
            Input #2, Text1, Text2
            Print #1, N & "," & Right(Text3, Len(Text3) - 11) & "," & Text1 & "," & Left(Text2, 10)
            Text3 = Text2
        Loop

        Close #2

    Next N

    Close #1

    ' Show the elapsed time.
    MsgBox Format(Now - T, "hh:mm:ss")

End Sub
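
The Left/Right arithmetic above suggests that Input # is not treating the line breaks in the source files as field separators, so each date arrives glued to the next user ID. Here is a minimal sketch of an alternative that splits the text explicitly. It assumes each file starts with a "movieid:" line followed by "userid,rating,date" lines, and that one file fits comfortably in memory. The function name MovieTextToCsv is hypothetical and not part of the code above.

```vba
' Hypothetical helper: converts the raw text of one mv_*.txt file into
' "movieid,userid,rating,date" rows separated by CR+LF.
' Assumes the first line is "<movieid>:" and at least one rating line follows.
Function MovieTextToCsv(ByVal Raw As String) As String
    Dim MovieId As String
    Dim Body As String

    ' Normalise CR+LF to LF so either line-ending style is handled.
    Raw = Replace(Raw, vbCrLf, vbLf)

    ' Drop trailing line breaks so no empty row is produced at the end.
    Do While Right$(Raw, 1) = vbLf
        Raw = Left$(Raw, Len(Raw) - 1)
    Loop

    ' Everything before the colon is the movie ID.
    MovieId = Left$(Raw, InStr(Raw, ":") - 1)

    ' Everything after the first line break is the list of ratings.
    Body = Mid$(Raw, InStr(Raw, vbLf) + 1)

    ' Prefix the first row, then turn each remaining line break into
    ' "line break + movie ID + comma" to prefix the rows that follow.
    MovieTextToCsv = MovieId & "," & Replace(Body, vbLf, vbCrLf & MovieId & ",")
End Function
```

The result has no trailing line break, so it can be written to the output file with a single Print # statement. Whether this is faster than the Input # loop has not been measured here.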


2 Comments »

  1. Remember, this is just for dummies. Don’t start on me with “it could be optimised!”, “what crappy code!”, blah, blah, blah…

    Of course, I would accept suggestions! ; )

    Comment by Francisco Marco-Serrano — July 16, 2007 @ 6:03 pm

  2. Also, consider adapting the above code to transform the other files: “probe.txt” and “qualifying.txt”.

    Comment by Francisco Marco-Serrano — July 17, 2007 @ 12:00 am
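
A rough sketch of how that adaptation might look. It assumes that “probe.txt” and “qualifying.txt” hold several movies in one file, with each group of rows introduced by a "movieid:" line. The post does not describe their layout, so check the files first. The function name FlattenGroupedText is hypothetical.

```vba
' Hypothetical helper: prefixes every data row with the ID taken from the
' most recent "<movieid>:" header line and drops the header lines themselves.
' Assumes the whole file has already been read into Raw.
Function FlattenGroupedText(ByVal Raw As String) As String
    Dim Parts() As String
    Dim MovieId As String
    Dim I As Long
    Dim Kept As Long

    ' Normalise CR+LF to LF, then split into one entry per line.
    Parts = Split(Replace(Raw, vbCrLf, vbLf), vbLf)

    For I = 0 To UBound(Parts)
        If Right$(Parts(I), 1) = ":" Then
            ' Header line: remember the movie ID for the rows that follow.
            MovieId = Left$(Parts(I), Len(Parts(I)) - 1)
        ElseIf Len(Parts(I)) > 0 Then
            ' Data row: store it, prefixed, in the next free slot (Kept <= I).
            Parts(Kept) = MovieId & "," & Parts(I)
            Kept = Kept + 1
        End If
    Next I

    ' No data rows at all: return an empty string.
    If Kept = 0 Then Exit Function

    ' Trim the array to the rows that were kept and join them.
    ReDim Preserve Parts(0 To Kept - 1)
    FlattenGroupedText = Join(Parts, vbCrLf)
End Function
```

Rows are passed through unchanged after the prefix, so a file with one value per row yields two columns and a file with two values per row yields three.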
