MNIST
MNIST是一个包含60000个训练样本和10000个测试样本的手写数字数据集,每个灰度图为28*28,MNIST中包含4个文件:
train-images-idx3-ubyte
train-labels-idx1-ubyte
t10k-images-idx3-ubyte
t10k-labels-idx1-ubyte
IDX3文件格式

IDX1文件格式

代码
import numpy as np
def read_idx3(filename):
with open(filename, 'rb') as fo:
buf = fo.read()
index = 0
header = np.frombuffer(buf, '>i', 4, index)
index += header.size * header.itemsize
data = np.frombuffer(buf, '>B', header[1] * header[2] * header[3], index).reshape(header[1], -1)
return data
def read_idx1(filename):
with open(filename, 'rb') as fo:
buf = fo.read()
index = 0
header = np.frombuffer(buf, '>i', 2, index)
index += header.size * header.itemsize
data = np.frombuffer(buf, '>B', header[1], index)
return data
结果
