Fundamentals of TinyML: Overfitting


Train your model with the newly augmented data in order to overcome overfitting:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be augmented with the full list of augmentation techniques below
train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=20,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest'
      )

# Flow training images in batches of 128 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
        '/tmp/horse-or-human/',  # This is the source directory for training images
        target_size=(100, 100),  # All images will be resized to 100x100
        batch_size=128,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

validation_datagen = ImageDataGenerator(rescale=1./255)

validation_generator = validation_datagen.flow_from_directory(
        '/tmp/validation-horse-or-human',
        target_size=(100, 100),
        class_mode='binary')
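To get a feel for what these transforms do without downloading the dataset, here is a rough NumPy sketch of two of them, horizontal flipping and width shifting, applied to a tiny synthetic image. The function names are illustrative only; this is not the Keras implementation, which also handles interpolation and the various fill modes.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror the image left-to-right, like horizontal_flip=True."""
    return img[:, ::-1]

def width_shift(img, frac, fill=0):
    """Shift the image sideways by a fraction of its width, padding the
    vacated columns with a constant (ImageDataGenerator's fill_mode
    offers smarter options such as 'nearest')."""
    shift = int(img.shape[1] * frac)
    out = np.full_like(img, fill)
    if shift >= 0:
        out[:, shift:] = img[:, :img.shape[1] - shift]
    else:
        out[:, :shift] = img[:, -shift:]
    return out

img = np.arange(16).reshape(4, 4)
flipped = horizontal_flip(img)          # first row becomes [3, 2, 1, 0]
shifted = width_shift(img, 0.25)        # shifted right by one column
```

Each augmented variant is a plausible new training example, which is why augmentation helps the model generalize instead of memorizing the exact training pixels.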

Dropout Regularization

You’ve been exploring overfitting, where a network may become too specialized in a particular type of input data and fare poorly on others. One technique to help overcome this is dropout regularization.

When a neural network is being trained, each individual neuron will have an effect on neurons in subsequent layers. Over time, particularly in larger networks, some neurons can become overspecialized—and that feeds downstream, potentially causing the network as a whole to become overspecialized and leading to overfitting. Additionally, neighboring neurons can end up with similar weights and biases, and if not monitored this can lead the overall model to become overspecialized to the features activated by those neurons.

For example, consider this neural network, where there are layers of 2, 5, 5, and 2 neurons. The neurons in the middle layers might end up with very similar weights and biases.

[Figure: a network with layers of 2, 5, 5, and 2 neurons]

While training, if you remove a random selection of neurons and their connections and ignore them, their contribution to the neurons in the next layer is temporarily blocked.

[Figure: the same network with randomly dropped neurons and connections]

This reduces the chances of the neurons becoming overspecialized. The network will still learn the same number of parameters, but it should be better at generalization—that is, it should be more resilient to different inputs.
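The mechanism can be sketched in a few lines of NumPy. This is "inverted" dropout, the variant modern frameworks use: zero out a random fraction of activations during training and scale the survivors up so the expected activation is unchanged, making dropout a no-op at inference time. A minimal illustration, not the Keras implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate, training=True):
    """Inverted dropout: zero a random fraction `rate` of activations
    during training and rescale the survivors by 1/(1 - rate), so the
    expected activation stays the same at inference time."""
    if not training or rate == 0.0:
        return activations              # dropout does nothing at inference
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

x = np.ones(1000)
y = dropout(x, rate=0.2)
# Roughly 20% of entries are zeroed; the survivors become 1 / 0.8 = 1.25
```

Because a neuron can't rely on any particular upstream neuron being present, weights spread out across the layer rather than co-adapting.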

The concept of dropouts was proposed by Nitish Srivastava et al. in their 2014 paper “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”.

To implement dropout in TensorFlow, you can just use a simple Keras layer like this:

tf.keras.layers.Dropout(0.2),

This will drop out at random the specified percentage of neurons (here, 20%) in the specified layer. Note that it may take some experimentation to find the correct percentage for your network.

For a simple example that demonstrates this, consider the Fashion MNIST classifier you explored earlier.

If you change the network definition to have a lot more layers, like this:

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

Training this for 20 epochs gave around 94% accuracy on the training set, and about 88.5% on the validation set. This is a sign of potential overfitting.

Introducing dropout layers after each dense layer looks like this:

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
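The earlier claim that the network still learns the same number of parameters is easy to verify by hand: a Dense layer has inputs × units weights plus one bias per unit, while Flatten and Dropout contribute no parameters at all. A quick check of the architecture above:

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out

# (input size, units) for each Dense layer in the model above
layers = [(28 * 28, 256), (256, 128), (128, 64), (64, 10)]
total = sum(dense_params(i, o) for i, o in layers)
# Flatten and Dropout add 0 parameters, so the models with and
# without dropout both train the same 242,762 parameters
```

This matches what `model.summary()` would report for either version of the network.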

When this network was trained for the same period on the same data, the accuracy on the training set dropped to about 89.5%. The accuracy on the validation set stayed about the same, at 88.3%. These values are much closer to each other: the original gap confirmed that overfitting was occurring, and adding dropout helped close it by ensuring that the network wasn’t overspecializing to the training data.
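The signal to watch here is the spread between training and validation accuracy, not the training accuracy alone. Plugging in the figures quoted above:

```python
def gap(train_acc, val_acc):
    # Percentage-point spread between training and validation accuracy;
    # the wider the spread, the stronger the evidence of overfitting
    return round(train_acc - val_acc, 1)

without_dropout = gap(94.0, 88.5)   # 5.5 points
with_dropout = gap(89.5, 88.3)      # 1.2 points
```

Dropout shrank the spread from 5.5 points to 1.2 points while leaving validation accuracy essentially unchanged, which is exactly the outcome you want from a regularizer.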

Keep in mind as you design your neural networks that great results on your training set are not always a good thing; they can be a sign of overfitting. Introducing dropout can help you remove that problem, so that you can optimize your network in other areas without a false sense of security!

Exploring Loss Functions and Optimizers

There are generally two ways that you can declare these functions: by name, using a string literal, or by object, instantiating the class of the function you want to use.

Here’s an example of doing it by name:

optimizer = 'adam'

And here’s one doing it by object:

from tensorflow.keras.optimizers import Adam
opt = Adam(learning_rate=0.001)

optimizer = opt

Using the former method is obviously quicker and easier, and you don’t need any imports, which can be easy to forget, in particular if you’re copying and pasting code from elsewhere! Using the latter has the distinct advantage of letting you set internal hyperparameters, such as the learning rate, giving you more fine-grained control over how your network learns.
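The way an API can accept either form boils down to a name-to-class lookup. Here is a toy version of that pattern; the class and registry names are hypothetical, and Keras’s real resolution (in `tf.keras.optimizers`) is more involved, but the shape of the idea is the same:

```python
class Adam:
    """Stand-in for an optimizer class (hypothetical, not the Keras one)."""
    def __init__(self, learning_rate=0.001):
        self.learning_rate = learning_rate

# Registry mapping string names to optimizer classes
_OPTIMIZERS = {"adam": Adam}

def get_optimizer(spec):
    if isinstance(spec, str):
        return _OPTIMIZERS[spec.lower()]()   # name -> default-configured instance
    return spec                              # already-configured object passes through

default = get_optimizer("adam")                     # default learning rate
tuned = get_optimizer(Adam(learning_rate=0.0005))   # fine-grained control
```

This makes the trade-off concrete: the string path is terse but always yields the defaults, while the object path lets you set any hyperparameter before handing the optimizer over.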

You can learn more about the suite of optimizers in TensorFlow at www.tensorflow.org/api_docs/py… -- to this point you’ve seen SGD, RMSProp and Adam, and I’d recommend you read up on what they do. After that, consider reading into some of the others, in particular the enhancements to the Adam algorithm that are available.

Similarly you can learn about the loss functions in TensorFlow at www.tensorflow.org/api_docs/py… , and to this point you’ve seen Mean Squared Error, Binary CrossEntropy and Categorical CrossEntropy. Read into them to see how they work, and also look into some of the others that are enhancements to these.
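As a starting point for that reading, two of the losses mentioned above are simple enough to write out directly. A NumPy sketch of Mean Squared Error and binary cross-entropy (illustrative only; the TensorFlow versions add reduction options and numerical refinements):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared differences
    return np.mean((y_true - y_pred) ** 2)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Binary cross-entropy; clipping by eps guards against log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) +
                    (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.1, 0.8])
```

Notice how cross-entropy punishes a confident wrong prediction far more harshly than MSE does, which is why it pairs naturally with classification outputs.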

Coding Assignment