
mnists

downloads and prepares various mnist-compatible datasets.

files are downloaded to ~/.mnist and checked for integrity by SHA-256 hashes.
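the integrity check can be sketched roughly as follows. this is a minimal illustration using only the standard library, not the package's actual code: `sha256_of` and `verify` are hypothetical names, and the expected digest is assumed to come from a table of known-good hashes.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read until f.read returns an empty bytes object (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Raise ValueError if the file's digest differs from the expected one."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"hash mismatch for {path}: got {actual}")
```

reading in chunks keeps memory use flat even for the larger archives.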

dependencies

python 3.6 (or later), numpy.

install

pip install --upgrade --upgrade-strategy only-if-needed 'https://github.com/notwa/mnists/tarball/master#egg=mnists'

I've added --upgrade-strategy only-if-needed to the command line so that pip doesn't accidentally "upgrade" numpy to a build not compiled specifically for your system. This can happen when using e.g. Anaconda.

usage

import mnists

dataset = "emnist_balanced"
train_images, train_labels, test_images, test_labels = mnists.prepare(dataset)

by default, images have the shape (n, 1, 28, 28). pass flatten=True to mnists.prepare to get (n, 784) instead.
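the two layouts hold the same pixels. the sketch below uses a zero-filled stand-in array rather than a real download, and it assumes (this README doesn't confirm it) that flatten=True is equivalent to a plain row-major reshape.

```python
import numpy as np

# stand-in for the images returned by mnists.prepare(dataset)
images = np.zeros((5, 1, 28, 28), dtype=np.float32)

# collapse channel, height, and width into one axis: 1 * 28 * 28 = 784
flat = images.reshape(len(images), -1)
print(flat.shape)  # (5, 784)

# and back again, e.g. for a convolutional model
restored = flat.reshape(-1, 1, 28, 28)
print(restored.shape)  # (5, 1, 28, 28)
```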

datasets

in alphabetical order:

emnist

  • emnist_balanced
    train images shape: (112800, 1, 28, 28)
    train labels shape: (112800, 47)
    test images shape: (18800, 1, 28, 28)
    test labels shape: (18800, 47)

  • emnist_byclass
    train images shape: (697932, 1, 28, 28)
    train labels shape: (697932, 62)
    test images shape: (116323, 1, 28, 28)
    test labels shape: (116323, 62)

  • emnist_bymerge
    train images shape: (697932, 1, 28, 28)
    train labels shape: (697932, 47)
    test images shape: (116323, 1, 28, 28)
    test labels shape: (116323, 47)

  • emnist_digits
    train images shape: (240000, 1, 28, 28)
    train labels shape: (240000, 10)
    test images shape: (40000, 1, 28, 28)
    test labels shape: (40000, 10)

  • emnist_letters
    train images shape: (124800, 1, 28, 28)
    train labels shape: (124800, 26)
    test images shape: (20800, 1, 28, 28)
    test labels shape: (20800, 26)

  • emnist_mnist
    train images shape: (60000, 1, 28, 28)
    train labels shape: (60000, 10)
    test images shape: (10000, 1, 28, 28)
    test labels shape: (10000, 10)

fashion-mnist

  • fashion_mnist
    train images shape: (60000, 1, 28, 28)
    train labels shape: (60000, 10)
    test images shape: (10000, 1, 28, 28)
    test labels shape: (10000, 10)

mnist

  • mnist
    train images shape: (60000, 1, 28, 28)
    train labels shape: (60000, 10)
    test images shape: (10000, 1, 28, 28)
    test labels shape: (10000, 10)
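
every label array above has one column per class, which suggests one-hot encoding. the README doesn't say so explicitly, so treat that as an assumption. if it holds, integer class indices can be recovered with numpy's argmax. the array below is a hand-made stand-in, not real data.

```python
import numpy as np

# stand-in for an (n, 10) label array: three samples of classes 3, 0, and 9
labels = np.eye(10, dtype=np.float32)[[3, 0, 9]]

# the index of the largest entry in each row is the class
classes = labels.argmax(axis=1)
print(classes)  # [3 0 9]
```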