downloads and prepares various mnist-compatible datasets
Connor Olding 03b9c28554 bump version 7 months ago

mnists

downloads and prepares various mnist-compatible datasets.

files are downloaded to ~/.mnist and checked for integrity by SHA-256 hashes.
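the integrity check mentioned above can be sketched with the standard library alone. this is a minimal illustration, not the package's actual code: the helper name `sha256_matches` and the chunk size are hypothetical, and the demo hashes a throwaway temporary file rather than a real download.

```python
import hashlib
import os
import tempfile


def sha256_matches(path, expected_hex, chunk_size=1 << 16):
    """Return True if the file at `path` has the given SHA-256 hex digest.

    Hypothetical helper for illustration; not part of the mnists API.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read in chunks so large dataset files are never held in memory whole
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# demonstrate on a small temporary file instead of a downloaded dataset
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name

expected = hashlib.sha256(b"hello").hexdigest()
ok = sha256_matches(tmp_path, expected)   # digest agrees with the file
bad = sha256_matches(tmp_path, "0" * 64)  # deliberately wrong digest
os.remove(tmp_path)
print(ok, bad)
```

a file whose digest does not match would typically be treated as corrupt and downloaded again.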

dependencies

python 3.5 (or later), numpy.

install

pip install --upgrade 'https://github.com/notwa/mnists/tarball/master#egg=mnists'

I recommend adding --upgrade-strategy only-if-needed to the command so that you don’t accidentally “upgrade” numpy to a version not compiled specifically for your environment. This can happen when using e.g. Anaconda.

usage

import mnists

dataset = "emnist_balanced"
train_images, train_labels, test_images, test_labels = mnists.prepare(dataset)

the default image shape is (n, 1, 28, 28), with pixel values scaled to the range [0, 1]. labels are output in one-hot encoding.

prepare arguments

pass flatten=True to get a flattened (n, 784) image shape.

pass return_floats=False to get the raw [0, 255] integer range of images.

pass return_onehot=False to get the raw [0, M-1] integer encoding of labels, where M is the number of classes.
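to show how these output forms relate to one another, here is a small numpy sketch on synthetic data. it does not call mnists at all. the division by 255 and the float32 dtype are assumptions about how the scaling might be done, not a statement of what the package does internally.

```python
import numpy as np

n, num_classes = 4, 10

# synthetic stand-ins for raw data: uint8 pixels in [0, 255], integer labels in [0, M-1]
raw_images = (np.arange(n * 28 * 28) % 256).astype(np.uint8).reshape(n, 1, 28, 28)
raw_labels = np.array([3, 0, 9, 3])

# flatten=True corresponds to collapsing the (1, 28, 28) part into 784 values
flat = raw_images.reshape(n, -1)

# assumed scaling from [0, 255] integers to [0, 1] floats
floats = raw_images.astype(np.float32) / 255

# one-hot encoding: row i has a single 1 in column raw_labels[i]
onehot = np.eye(num_classes, dtype=np.float32)[raw_labels]

# argmax undoes the one-hot encoding
recovered = onehot.argmax(axis=1)

print(flat.shape, floats.dtype, onehot.shape, recovered)
```

the last step is useful when a loss function or metric expects integer class indices but the labels were already loaded as one-hot.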

why the extra dimension?

you will notice that, by default, there is a dimension of size 1 in the shape of images: (n, 1, 28, 28). this exists for compatibility with programs that expect the number of color channels in that position. since mnist-like datasets are (as of writing) all grayscale, there is only one color channel, and thus the size of this dimension is 1.
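if a program expects the channel dimension last, or no channel dimension at all, the array can be rearranged with plain numpy. a minimal sketch, using a zero-filled array as a stand-in for the images returned by mnists.prepare:

```python
import numpy as np

# stand-in for images in the default (n, 1, 28, 28) layout
images = np.zeros((5, 1, 28, 28), dtype=np.float32)

# move the channel axis to the end: (n, 1, 28, 28) -> (n, 28, 28, 1)
channels_last = np.moveaxis(images, 1, -1)

# drop the channel axis entirely: (n, 1, 28, 28) -> (n, 28, 28)
no_channel = images.squeeze(axis=1)

print(channels_last.shape, no_channel.shape)
```

both operations return views where possible, so they are cheap even for the larger datasets.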

datasets

in alphabetical order, using default mnists.prepare arguments:

| subdirectory | dataset | train images shape | train labels shape | test images shape | test labels shape |
|---|---|---|---|---|---|
| emnist | emnist_balanced | (112800, 1, 28, 28) | (112800, 47) | (18800, 1, 28, 28) | (18800, 47) |
| emnist | emnist_byclass | (697932, 1, 28, 28) | (697932, 62) | (116323, 1, 28, 28) | (116323, 62) |
| emnist | emnist_bymerge | (697932, 1, 28, 28) | (697932, 47) | (116323, 1, 28, 28) | (116323, 47) |
| emnist | emnist_digits | (240000, 1, 28, 28) | (240000, 10) | (40000, 1, 28, 28) | (40000, 10) |
| emnist | emnist_letters | (124800, 1, 28, 28) | (124800, 26) | (20800, 1, 28, 28) | (20800, 26) |
| emnist | emnist_mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
| fashion-mnist | fashion_mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
| mnist | mnist | (60000, 1, 28, 28) | (60000, 10) | (10000, 1, 28, 28) | (10000, 10) |
| qmnist | qmnist | (60000, 1, 28, 28) | (60000, 10) | (60000, 1, 28, 28) | (60000, 10) |