mnists

downloads and prepares various mnist-compatible datasets.

files are downloaded to ~/.mnist and checked for integrity by SHA-256 hashes.
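the integrity check can be pictured roughly as follows. this is a minimal sketch using only the standard library, not the package's actual code: the helper names `sha256_of` and `verify` are hypothetical, and the expected digest would come from wherever the known-good hashes are stored.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so large archives need not fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file matches `expected_hex`, else raise ValueError.

    Hypothetical helper: the real package may report failures differently.
    """
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"SHA-256 mismatch for {path}: got {actual}")
    return True
```

a mismatch usually means a truncated or corrupted download, so deleting the file and fetching it again is the usual remedy.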

dependencies

python 3.6 (or later), numpy.

install

pip install --upgrade 'https://github.com/notwa/mnists/tarball/master#egg=mnists'

I recommend adding --upgrade-strategy only-if-needed to the command so that you don't accidentally "upgrade" numpy to a version not compiled specifically for your environment. This can happen when using e.g. Anaconda.

usage

import mnists

dataset = "emnist_balanced"
train_images, train_labels, test_images, test_labels = mnists.prepare(dataset)

by default, images have shape (n, 1, 28, 28). pass flatten=True to mnists.prepare to get (n, 784) instead.
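the two layouts hold the same pixels: 1 * 28 * 28 = 784. the sketch below shows the relationship with plain numpy, using an array of zeros as a stand-in for real data; the float32 dtype is an assumption, not something this README states.

```python
import numpy as np

# stand-in for the images returned by mnists.prepare (zeros, not real data)
images = np.zeros((5, 1, 28, 28), dtype=np.float32)

# collapse channel, height and width into a single axis of length 784
flat = images.reshape(len(images), -1)
print(flat.shape)

# the reshape is reversible, so either layout can be recovered from the other
restored = flat.reshape(-1, 1, 28, 28)
print(restored.shape)
```

the flat layout suits fully connected models, while the (n, 1, 28, 28) layout suits convolutional models that expect a channel axis.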

datasets

in alphabetical order, using default mnists.prepare parameters:

| dataset | train images shape | train labels shape | test images shape | test labels shape |
| --- | --- | --- | --- | --- |
| emnist_balanced | (112800, 1, 28, 28) | (112800, 47) | (18800, 1, 28, 28) | (18800, 47) |
| emnist_byclass | (697932, 1, 28, 28) | (697932, 62) | (116323, 1, 28, 28) | (116323, 62) |
| emnist_bymerge | (697932, 1, 28, 28) | (697932, 47) | (116323, 1, 28, 28) | (116323, 47) |
| emnist_digits | (240000, 1, 28, 28) | (240000, 10) | (40000, 1, 28, 28) | (40000, 10) |
| emnist_letters | (124800, 1, 28, 28) | (124800, 26) | (20800, 1, 28, 28) | (20800, 26) |
| emnist_mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
| fashion_mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
| mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
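
the label shapes above, such as (112800, 47), suggest one row per sample with one column per class, i.e. one-hot encoding. that is an inference from the shapes rather than something stated here. assuming it holds, class indices can be recovered as sketched below; `to_class_indices` is a hypothetical helper, not part of mnists.

```python
import numpy as np


def to_class_indices(one_hot_labels):
    """Convert (n, classes) one-hot rows into an (n,) array of class indices."""
    return np.argmax(one_hot_labels, axis=1)


# three fake samples over 47 classes, built from rows of an identity matrix
labels = np.eye(47, dtype=np.float32)[[3, 0, 46]]
indices = to_class_indices(labels)
print(indices)  # class indices 3, 0 and 46
```

integer class indices are what many loss functions and metrics expect, so this conversion is often the first step after loading.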