The Machine Learning Paradigm
Before you get into the nuts and bolts of using and programming with machine learning, let's explore what machine learning is at a fundamental level.
I personally like to see it as a new paradigm for programming computers: instead of just coding a solution, you teach a computer how to learn a solution. This can be incredibly useful, and it opens up new scenarios for solving problems that can't be solved with explicit programming alone.
Explicit coding is the bread and butter of software developers all over the world. You think about the scenario that you're working on or the problem that you're trying to solve, and you express the solution to that by defining the rules that determine the behavior of a program. For example, in this bricks game, the ball moves along a path, which can get changed when it hits a brick. As a programmer, you have to figure out the behavior. How does the ball bounce? Which brick do I remove? What happens when the ball hits a wall? Or what happens when the ball misses the bat? Everything needs to be figured out in advance. It's predetermined by the programmer, and it's then coded and tested. The more complex the scenario, the more code you'll have to figure out.
Let's consider activity detection. Say you want to write an app that uses sensors on a phone or a watch or something else to determine a person's activity. You could, for example, use the data about their speed and write a rule that determines if the speed is below a certain amount, then they're probably walking. You have data. You have a rule. And then you can get an answer. And then say you extend this to determine if they're running by building on that rule. If, for example, it's below 4 miles an hour, then they're walking. Otherwise, they're running. That still works. And then maybe you can extend that even further to see if they're biking. If the speed is below 4, they're walking. Otherwise, if it's below 12, they're running. Otherwise, we can say they're biking.
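Those speed-based rules might look like this in code. (The thresholds of 4 and 12 miles per hour are the illustrative values from the text, not realistic activity-detection boundaries.)

```python
def classify_activity(speed_mph):
    """Naive rule-based activity detection using only speed."""
    if speed_mph < 4:
        return "walking"
    elif speed_mph < 12:
        return "running"
    else:
        return "biking"

print(classify_activity(3))   # walking
print(classify_activity(8))   # running
print(classify_activity(15))  # biking
```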
But then how would you handle golfing? What rule could you write that determines that they're actually playing golf? Also, by now, you've probably realized that the other rules are a little naive. You can't just go by speed. You might run downhill faster than you bike uphill, for example. So let's go back to our diagram.
Recall that traditional programming is when you create rules in a programming language, and these rules act on data to provide you with answers. With activity detection, we created rules to figure out the user's activity, be it walking, running, or biking, but then we hit a wall with golfing. Not only that, we could also see that our algorithm was a little naive. Our rules don't really work that well: you might run downhill faster than you bike uphill, for example, and an app written with rules like this might fail as a result.
Machine learning can help solve this problem. And it can be represented with a simple rearrangement of the diagram. So instead of you trying to figure out the rules that act on the data to give you an answer, what if you go the other way around and you provide the answers with the data and you have a computer that can figure out the rules that will match them together? Once it's done this, it can then apply those rules to future data in order to figure out the answers about it. So in our activity detection scenario, we could gather a lot of data from sensors and label them with the activity that the user is following. So now, by matching parts of the data with the label that describes the activity, the computer might be able to figure out the rules for what makes an activity walking or running or biking or, yes, even golfing.
Let's see how it works. First, the computer makes a guess as to the relationship between the data and its labels. It does this by randomly initializing a neural network, and you'll delve into that a little later; beyond the specifics, it's simply guessing at the relationship. It then measures how good or how bad that guess is. The terminology often used here is loss: higher loss is roughly analogous to lower accuracy. You can then use the results of that measurement to inform your next guess, optimizing based on what you already know. If you repeat the process, with each subsequent guess getting better than the previous one, your model becomes more and more accurate. It's learning from experience what the best guess might be.

So let's return to the diagram describing machine learning at a high level. I've changed the word "answers" to "labels" here, as that's the term generally used for the known answers attached to your data. But the rest still stands. You start with labeled data, go through the process I just described, and end up with a set of rules that matches that data to those labels. This constitutes a model. You can then give the model new data, and it will figure out how closely that data fits the set of labels, returning a set of inferences: the probabilities that the data matches each specific label.
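To make "a set of inferences" concrete: a common way models produce per-label probabilities is the softmax function, which turns raw scores into values that sum to 1. The scores below are invented purely for illustration; they stand in for what a trained activity model might output for one window of sensor data.

```python
import math

# Hypothetical raw scores for one sample -- invented for illustration
labels = ["walking", "running", "biking", "golfing"]
scores = [2.0, 0.5, 0.1, -1.2]

# Softmax: exponentiate each score, then normalize so they sum to 1
exps = [math.exp(s) for s in scores]
total = sum(exps)
probs = [e / total for e in exps]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.2f}")
```

The highest score maps to the highest probability, so here the model would infer that the user is most likely walking.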
Fitting lines
import math

# Edit these parameters to try different loss measurements.
# Your Y will be calculated as Y=wX+b, so if w=3 and b=-1, then Y=3X-1
w = 3
b = -1
x = [-1, 0, 1, 2, 3, 4]
y = [-3, -1, 1, 3, 5, 7]

myY = []
for thisX in x:
    thisY = (w * thisX) + b
    myY.append(thisY)

print("Real Y is " + str(y))
print("My Y is " + str(myY))

# Let's calculate the loss as the square root of the total squared error
total_square_error = 0
for i in range(0, len(y)):
    square_error = (y[i] - myY[i]) ** 2
    total_square_error += square_error

print("My loss is: " + str(math.sqrt(total_square_error)))
Output:
Real Y is [-3, -1, 1, 3, 5, 7]
My Y is [-4, -1, 2, 5, 8, 11]
My loss is: 5.5677643628300215
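As a sanity check, plugging in the parameters the data was actually generated from, w=2 and b=-1, should drive the loss all the way to zero:

```python
import math

w, b = 2, -1
x = [-1, 0, 1, 2, 3, 4]
y = [-3, -1, 1, 3, 5, 7]

# With the true parameters, every prediction matches exactly
total_square_error = sum((y[i] - (w * x[i] + b)) ** 2 for i in range(len(y)))
print("My loss is: " + str(math.sqrt(total_square_error)))  # My loss is: 0.0
```

Any other (w, b) pair gives a positive loss, which is exactly what gradient descent will exploit in the next section: keep nudging w and b in whatever direction shrinks the loss.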
Gradient Descent: minimize the loss
# First import the functions we will need
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Define our initial guess
INITIAL_W = 10.0
INITIAL_B = 10.0

# Define our loss function
def loss(predicted_y, target_y):
    return tf.reduce_mean(tf.square(predicted_y - target_y))

# Define our training procedure
def train(model, inputs, outputs, learning_rate):
    with tf.GradientTape() as t:
        current_loss = loss(model(inputs), outputs)
    # Here is where you differentiate the model values with respect to the loss function
    dw, db = t.gradient(current_loss, [model.w, model.b])
    # And here is where you update the model values based on the learning rate chosen
    model.w.assign_sub(learning_rate * dw)
    model.b.assign_sub(learning_rate * db)
    return current_loss

# Define our simple linear regression model
class Model(object):
    def __init__(self):
        # Initialize the weights
        self.w = tf.Variable(INITIAL_W)
        self.b = tf.Variable(INITIAL_B)

    def __call__(self, x):
        return self.w * x + self.b

# Train our model
# Define our input data and learning rate
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]
LEARNING_RATE = 0.09

# Instantiate our model
model = Model()

# Collect the history of w-values and b-values to plot later
list_w, list_b = [], []
epochs = range(50)
losses = []
for epoch in epochs:
    list_w.append(model.w.numpy())
    list_b.append(model.b.numpy())
    current_loss = train(model, xs, ys, learning_rate=LEARNING_RATE)
    losses.append(current_loss)
    print('Epoch %2d: w=%1.2f b=%1.2f, loss=%2.5f' %
          (epoch, list_w[-1], list_b[-1], current_loss))

# Plot the w-values and b-values for each training epoch against the true values
TRUE_w = 2.0
TRUE_b = -1.0
plt.plot(epochs, list_w, 'r', epochs, list_b, 'b')
plt.plot([TRUE_w] * len(epochs), 'r--', [TRUE_b] * len(epochs), 'b--')
plt.legend(['w', 'b', 'True w', 'True b'])
plt.show()
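If you want to see the same update rule without TensorFlow, here is a plain-Python sketch of the loop above, using the same data, the same initial guess of 10.0 for both parameters, and the same learning rate. The gradients of the mean squared error are written out by hand instead of coming from GradientTape; after 50 epochs it should converge close to the true values w=2, b=-1.

```python
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]
w, b = 10.0, 10.0
LEARNING_RATE = 0.09

for epoch in range(50):
    # Forward pass: predictions and per-point errors
    preds = [w * x + b for x in xs]
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    # Hand-derived gradients of the mean squared error
    dw = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    db = 2 * sum(errors) / len(xs)
    # Update step, the equivalent of assign_sub in the TensorFlow version
    w -= LEARNING_RATE * dw
    b -= LEARNING_RATE * db

print(f"w={w:.2f} b={b:.2f} loss={loss:.5f}")
```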
Neural Network
import tensorflow as tf
import numpy as np
from tensorflow import keras
# define a neural network with one neuron
# for more information on TF functions see: https://www.tensorflow.org/api_docs
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
# use stochastic gradient descent for optimization and
# the mean squared error loss function
model.compile(optimizer='sgd', loss='mean_squared_error')
# define some training data (xs as inputs and ys as outputs)
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)
# fit the model to the data (aka train the model)
model.fit(xs, ys, epochs=500)
Output:
Epoch 500/500
1/1 [==============================] - 0s 5ms/step - loss: 2.5740e-05

# predict expects an array of inputs, so wrap the value in np.array
print(model.predict(np.array([10.0])))

Output:
1/1 [==============================] - 0s 97ms/step
[[18.985199]]