Netflix Prize for Dummies [II]
Next is the database. In this example I’m going to use MySQL, although you could use PostgreSQL, or MS SQL, for example. I’m in a Windows OS.
2) Creating the database and dumping the data into it.
a. Create the database:
CREATE DATABASE netflix;
b. Create the tables:
“training_ser” is the table where I’m going to dump the training_set data, made up of the movie ids, user ids, the rating, and the date.
CREATE TABLE `netflix`.`training_set` (
`idmovie` INTEGER UNSIGNED NOT NULL,
`iduser` INTEGER UNSIGNED NOT NULL,
`rating` INTEGER UNSIGNED NOT NULL,
`date` VARCHAR(10) NOT NULL,
PRIMARY KEY USING BTREE(`idmovie`, `iduser`);
)
ENGINE = MyISAM
COMMENT = ‘User ratings’;
“movies” is the table where I’ll dump the information from the movies file, made up of movie ids, release date, and title.
CREATE TABLE `netflix`.`movies` (
`idmovie` INTEGER UNSIGNED NOT NULL,
`release` INTEGER UNSIGNED NOT NULL,
`title` VARCHAR(150) NOT NULL,
PRIMARY KEY (`idmovie`)
)
ENGINE = MyISAM
COMMENT = ‘Movies List’;
Try doing the same for “probe” and “qualifying”.
c. Dump the data into the tables:
LOAD DATA LOCAL INFILE “C:/Netflix/training_set.txt”
REPLACE INTO TABLE netflix.training_set
FIELDS TERMINATED BY ‘,’
LINES TERMINATED BY ‘\n’ STARTING BY ”
(idmovie, iduser, rating, date);
LOAD DATA LOCAL INFILE “C:/Netflix/movie_titles.txt”
REPLACE INTO TABLE netflix.movies
FIELDS TERMINATED BY ‘,’
LINES TERMINATED BY ‘\n’ STARTING BY ”;
Try doing the same for “probe” and “qualifying”.
Any problems?.
Oops! It seems we have found nothing related.



Install the ODBC connector (http://dev.mysql.com/downloads/connector/odbc/3.51.html) as well, so you will be able to access the data from external applications, such as spreadsheets.
Comment by Administrator — July 20, 2007 @ 2:02 pm