downloads and prepares various mnist-compatible datasets
Connor Olding 988f481bc0 bump version 1 year ago

mnists

downloads and prepares various mnist-compatible datasets.

files are downloaded to ~/.mnist and checked for integrity against SHA-256 hashes.
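an integrity check of this kind can be sketched with the standard library's hashlib. this is a minimal illustration, assuming the expected digest is known as a hex string; the helpers sha256_of and check_file are hypothetical and not part of mnists.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_file(path, expected_hex):
    """Hypothetical helper: raise ValueError if the file's digest differs."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError("{}: expected SHA-256 {}, got {}".format(path, expected_hex, actual))
```

hashing in fixed-size chunks keeps memory use constant regardless of how large the downloaded archive is.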

dependencies

python 3.5 (or later), numpy.

install

pip install --upgrade 'https://github.com/notwa/mnists/tarball/master#egg=mnists'

I recommend adding --upgrade-strategy only-if-needed to the command so that pip doesn't accidentally “upgrade” numpy to a build that wasn't compiled specifically for your environment, which can happen when using Anaconda, for example.

usage

import mnists

dataset = "emnist_balanced"
train_images, train_labels, test_images, test_labels = mnists.prepare(dataset)

by default, images have shape (n, 1, 28, 28) and are scaled to the range [0, 1]. labels are one-hot encoded.
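the default output format can be sketched with numpy. this assumes the raw data are uint8 images of shape (n, 28, 28) with integer labels, and that the floats are float32 (the README does not state the dtype); to_default_format is a hypothetical helper, not mnists' actual implementation.

```python
import numpy as np


def to_default_format(raw_images, raw_labels, num_classes):
    """Sketch: uint8 (n, 28, 28) images + integer labels -> default layout.

    Assumption: mirrors the README's description of the defaults,
    not the library's source code.
    """
    n = raw_images.shape[0]
    # Add the size-1 channel axis and scale [0, 255] -> [0, 1].
    images = raw_images.reshape(n, 1, 28, 28).astype(np.float32) / 255.0
    # One-hot: row i of the identity matrix encodes class i.
    labels = np.eye(num_classes, dtype=np.float32)[raw_labels]
    return images, labels


# Random stand-in data, not a real dataset.
raw_images = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)
raw_labels = np.array([0, 3, 1, 9, 3])
images, labels = to_default_format(raw_images, raw_labels, num_classes=10)
print(images.shape, labels.shape)  # (5, 1, 28, 28) (5, 10)
```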

prepare arguments

pass flatten=True to get a flattened (n, 784) image shape.

pass return_floats=False to get images as raw integers in the range [0, 255].

pass return_onehot=False to get labels as raw integer class indices in the range [0, M-1], where M is the number of classes.
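the layouts these arguments produce are simple transformations of the default output. a minimal sketch using random stand-in arrays (not real prepare() output) shows how they relate:

```python
import numpy as np

# Stand-ins for default prepare() output (hypothetical random data).
n, num_classes = 4, 10
images = np.random.rand(n, 1, 28, 28).astype(np.float32)
onehot = np.eye(num_classes, dtype=np.float32)[[2, 7, 0, 7]]

# flatten=True layout: one row of 28 * 28 = 784 pixels per image.
flat = images.reshape(n, -1)
print(flat.shape)  # (4, 784)

# return_onehot=False layout: class indices in [0, M-1].
indices = onehot.argmax(axis=1)
print(indices)  # [2 7 0 7]

# Undoing the [0, 1] scaling to get integers in [0, 255] (approximate
# for arbitrary floats, exact for values that were x / 255).
raw = np.round(images * 255).astype(np.uint8)
```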

why the extra dimension?

you will notice that, by default, images have a size-1 axis in their shape: (n, 1, 28, 28). this axis exists for compatibility with programs that expect a color-channel dimension in that position (the channels-first, or NCHW, layout). since mnist-like datasets are (as of writing) all grayscale, there is only one color channel, so this dimension has size 1.
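programs that expect a different layout can drop or move this axis with plain numpy. a minimal sketch, using a zero-filled stand-in array rather than real prepare() output:

```python
import numpy as np

images = np.zeros((3, 1, 28, 28), dtype=np.float32)  # channels-first (n, 1, 28, 28)

# Drop the size-1 channel axis for code expecting plain (n, 28, 28) images.
squeezed = images.squeeze(axis=1)
print(squeezed.shape)  # (3, 28, 28)

# Move the channel axis last for channels-last (n, 28, 28, 1) frameworks.
channels_last = np.moveaxis(images, 1, -1)
print(channels_last.shape)  # (3, 28, 28, 1)
```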

datasets

in alphabetical order, using default mnists.prepare arguments:

| subdirectory | dataset | train images shape | train labels shape | test images shape | test labels shape |
|---|---|---|---|---|---|
| emnist | emnist_balanced | (112800, 1, 28, 28) | (112800, 47) | (18800, 1, 28, 28) | (18800, 47) |
| emnist | emnist_byclass | (697932, 1, 28, 28) | (697932, 62) | (116323, 1, 28, 28) | (116323, 62) |
| emnist | emnist_bymerge | (697932, 1, 28, 28) | (697932, 47) | (116323, 1, 28, 28) | (116323, 47) |
| emnist | emnist_digits | (240000, 1, 28, 28) | (240000, 10) | (40000, 1, 28, 28) | (40000, 10) |
| emnist | emnist_letters | (124800, 1, 28, 28) | (124800, 26) | (20800, 1, 28, 28) | (20800, 26) |
| emnist | emnist_mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
| fashion-mnist | fashion_mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
| mnist | mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
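the shapes in the table also give a rough idea of memory requirements. the sketch below multiplies the element count by the item size. the README does not say which float dtype prepare() returns, so uint8, float32 and float64 are all shown as assumptions; footprint_mib is a hypothetical helper.

```python
import numpy as np

# A few train-image shapes copied from the table (default arguments).
TRAIN_SHAPES = {
    "emnist_byclass": (697932, 1, 28, 28),
    "emnist_balanced": (112800, 1, 28, 28),
    "mnist": (60000, 1, 28, 28),
}


def footprint_mib(shape, dtype):
    """Bytes needed to hold an array of this shape and dtype, in MiB."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 2**20


for name, shape in TRAIN_SHAPES.items():
    for dtype in (np.uint8, np.float32, np.float64):
        print("{:16} {:8} {:9.1f} MiB".format(
            name, np.dtype(dtype).name, footprint_mib(shape, dtype)))
```

this counts only the image array; labels, the test split, and any temporary copies made during conversion add to it.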