Using the IMDb dataset for testing MySQL performance

I wanted a reasonably large database for testing MySQL performance optimization techniques. The full IMDb dataset can be downloaded as plain text files and loaded into MySQL with imdbpy2sql.py, a script that ships with the IMDbPY Python package.

Simple install on Debian / Ubuntu:
sudo apt-get install python-imdbpy

Download the imdbpy2sql.py script from here:
http://sourceforge.net/p/imdbpy/code/ci/default/tree/bin/imdbpy2sql.py

Download plain text data files from:
ftp://ftp.fu-berlin.de/pub/misc/movies/database/

Additional mirrors are listed at:
http://www.imdb.com/interfaces/

To fetch everything with wget, create a new directory and run this from inside it:
wget -r ftp://ftp.fu-berlin.de/pub/misc/movies/database/
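
The recursive download above recreates the full server path (ftp.fu-berlin.de/pub/misc/movies/database/) under the current directory. Here is a minimal sketch of a flatter download. The directory name imdb-data, the function name fetch_imdb_files and the assumption that the data files end in .list.gz are mine, so adjust them to what the mirror actually serves. With DRY_RUN=1 the function only prints the wget command instead of running it.

```shell
#!/bin/sh
# Sketch: fetch the plain text data files into one flat directory.
# DATA_DIR and the '*.list.gz' pattern are assumptions, not from the IMDbPY docs.
DATA_DIR="${DATA_DIR:-imdb-data}"
URL="ftp://ftp.fu-berlin.de/pub/misc/movies/database/"

fetch_imdb_files() {
    # -r   recurse through the remote directory
    # -np  never ascend to the parent directory
    # -nd  do not recreate the remote directory tree locally
    # -A   keep only files matching the pattern
    # -P   save everything under DATA_DIR
    set -- wget -r -np -nd -A '*.list.gz' -P "$DATA_DIR" "$URL"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command without touching the network
        echo "$@"
    else
        mkdir -p "$DATA_DIR" && "$@"
    fi
}
```

Run DRY_RUN=1 fetch_imdb_files first to check the command, then fetch_imdb_files to start the download.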

Create the tables and populate the database (the URI takes the form mysql://user:password@host/database):
imdbpy2sql.py -d /dir/with/plainTextDataFiles/ -u 'mysql://root:root@localhost/imdb'
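
As far as I can tell, imdbpy2sql.py expects the target database to exist before it runs, so it is worth creating it first. The following is only a sketch: the variable names, the helper functions imdb_uri and load_imdb, and the utf8 character set are my own choices, and the defaults simply mirror the root:root@localhost/imdb values used above.

```shell
#!/bin/sh
# Sketch: create the target database, then load the plain text files into it.
# All values below are placeholders; override them in the environment.
DB_USER="${DB_USER:-root}"
DB_PASS="${DB_PASS:-root}"
DB_HOST="${DB_HOST:-localhost}"
DB_NAME="${DB_NAME:-imdb}"
DATA_DIR="${DATA_DIR:-/dir/with/plainTextDataFiles/}"

# Build the connection URI passed to imdbpy2sql.py via -u
imdb_uri() {
    printf 'mysql://%s:%s@%s/%s\n' "$DB_USER" "$DB_PASS" "$DB_HOST" "$DB_NAME"
}

load_imdb() {
    # Create the database if it is missing, then run the import
    mysql -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" \
        -e "CREATE DATABASE IF NOT EXISTS \`$DB_NAME\` CHARACTER SET utf8" &&
    imdbpy2sql.py -d "$DATA_DIR" -u "$(imdb_uri)"
}
```

With the defaults, imdb_uri prints the same URI as the command above. Calling load_imdb needs a running MySQL server and imdbpy2sql.py on the PATH. A password containing characters such as @ or / would have to be URL-encoded before it goes into the URI.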

Additional documentation available here:
http://imdbpy.sourceforge.net/docs/README.sqldb.txt
