Perceptrons

Basic Perceptron

This week's assignment is to code a Perceptron in Python and train it to learn the basic AND, OR, and XOR logic operations.

I created a Perceptron function with parameters that let me study how this algorithm operates.

In [1]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

def perceptron(X, y, n=100, learning_rate=0.01, activation=np.sign):
    # pick the initial weights
    weights = np.random.normal(scale=1/np.sqrt(X.shape[1]), size=X.shape[1])

    # maintain a history of weights for plotting purposes
    # note the `.copy()` is important!
    weights_hist = [weights.copy()]

    for _ in range(n):
        # calculate the error
        error = y - activation(X @ weights)
        # update the weights based on the error
        weights += learning_rate * error @ X
        # add the new weights to weights history
        weights_hist.append(weights.copy())

        # if the error goes to zero it is time to stop
        if not sum(abs(error)):
            break

    # prepare output
    pred = activation(X @ weights)
    sum_error = sum(abs(error))
    weights_history = np.array(weights_hist)

    return weights, pred, sum_error, weights_history

First I start with the AND function. In my X and y arrays, -1 represents False and 1 represents True. The third column of X is a constant 1 that serves as the bias input.

In [2]:
# input values
X = np.array([[-1, -1, 1],
              [-1,  1, 1],
              [ 1, -1, 1],
              [ 1,  1, 1]])
# output values
y = np.array([-1, -1, -1, 1])
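
As a quick sanity check on this encoding, the target should be True only when both inputs are True; a one-line sketch of that check:

# verify that y is the AND of the first two input columns under the -1/1 encoding
np.array_equal(y == 1, (X[:, 0] == 1) & (X[:, 1] == 1))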

Now I can run the Perceptron on my data.

In [3]:
np.random.seed(32)

weights, final_pred, sum_error, weights_history = perceptron(X, y)

I can see the model's predictions are exactly the same as the y values. Therefore, the Perceptron was able to learn how to do the AND operation.

In [4]:
np.vstack([y, final_pred]).T
Out[4]:
array([[-1., -1.],
       [-1., -1.],
       [-1., -1.],
       [ 1.,  1.]])
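
A quick programmatic version of this check is a one-liner:

# True when every prediction matches the corresponding target value
np.array_equal(final_pred, y)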

As expected, the total error is zero:

In [5]:
sum_error
Out[5]:
0.0

The function also returns the final weights, but on their own they aren't that interesting to look at.

In [6]:
weights
Out[6]:
array([ 0.17856569,  0.18794144, -0.04460405])

It is more interesting to plot how the weights change as the Perceptron fits the data. The weights move at a constant rate toward their final values.

In [7]:
plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
Plot showing how the perceptron's 3 weights changed during the model training process. The x axis is training steps numbered from 0 to 20 and the y axis is weight values from -0.2 to 0.6. During training weight 0 increases from -0.2 to 0.2, weight 1 decreases from 0.6 to 0.2, and weight 2 decreases from 0.35 to -0.05.

If I use a larger learning rate, the weights change faster and the Perceptron converges in fewer iterations.

In [8]:
np.random.seed(32)

weights, final_pred, sum_error, weights_history = perceptron(X, y, learning_rate=0.05)

plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
Plot showing how the perceptron's 3 weights changed during the model training process. The x axis is training steps numbered from 0 to 5 and the y axis is weight values from -0.2 to 0.6. During training weight 0 increases from -0.2 to 0.2, weight 1 decreases from 0.6 to 0.2, and weight 2 decreases from 0.35 to -0.05.

If the learning rate is too large, the weights can oscillate noticeably. In this case the Perceptron still converges, though.

In [9]:
np.random.seed(32)

weights, final_pred, sum_error, weights_history = perceptron(X, y, learning_rate=0.8)

plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
Plot showing how the perceptron's 3 weights changed during the model training process. The x axis is training steps numbered from 0 to 3 and the y axis is weight values from -0.2 to 0.6. During training weight 0 increases from -0.2 to 1.2, weight 1 decreases from 0.6 to -1 and increases to 2.2, and weight 2 decreases from 0.35 to -1.2.
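
To make the effect of the learning rate concrete, a rough sketch like the following compares how many iterations each of these runs takes (starting from the same seed, so the initial weights match):

for lr in [0.01, 0.05, 0.8]:
    np.random.seed(32)  # same initial weights for every run
    *_, hist = perceptron(X, y, learning_rate=lr)
    # hist holds the initial weights plus one row per iteration
    print(f'learning rate {lr}: {len(hist) - 1} iterations')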

Next I will use the Perceptron on the OR operation. Again, -1 represents False and 1 represents True.

In [10]:
X = np.array([[-1, -1, 1],
              [-1,  1, 1],
              [ 1, -1, 1],
              [ 1,  1, 1]])
y = np.array([-1, 1, 1, 1])

np.random.seed(37)

weights, final_pred, sum_error, weights_history = perceptron(X, y)

The Perceptron performed admirably, correctly learning the proper output for each input.

In [11]:
np.vstack([y, final_pred]).T
Out[11]:
array([[-1., -1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

The total error is zero.

In [12]:
sum_error
Out[12]:
0.0

The plot of the weights evolving over time is similar to the one for the AND function.

In [13]:
plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
Plot showing how the perceptron's 3 weights changed during the model training process. The x axis is training steps numbered from 0 to 5 and the y axis is weight values from -0.1 to 0.4. During training weight 0 increases from -0.2 to 0.3, weight 1 decreases from 0.4 to 0.3, and weight 2 increases from 0.2 to 0.3.

Finally, the XOR function.

In [14]:
X = np.array([[-1, -1, 1],
              [-1,  1, 1],
              [ 1, -1, 1],
              [ 1,  1, 1]])
y = np.array([-1, 1, 1, -1])

np.random.seed(42)

weights, final_pred, sum_error, weights_history = perceptron(X, y)

As expected, this did not work: XOR is not linearly separable, so a single Perceptron cannot represent it. The model incorrectly predicts two of the four output values.

In [15]:
np.vstack([y, final_pred]).T
Out[15]:
array([[-1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [-1.,  1.]])

The predictions have errors, so the total error is nonzero; with the -1/1 encoding, each incorrect prediction contributes 2.

In [16]:
sum_error
Out[16]:
4.0

The Perceptron searches for an acceptable set of weights but never converges on weights that work. Instead, the weights oscillate around zero until the iteration limit is reached.

In [17]:
plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
Plot showing how the perceptron's 3 weights changed during the model training process. The x axis is training steps numbered from 0 to 100 and the y axis is weight values from -0.1 to 0.4. During training weight 0 decreases from 0.3 to 0.0, weight 1 increases from -0.1 to 0, and weight 2 decreases from 0.4 to about zero, but oscillates around zero instead of remaining constant.
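
One way to confirm that the iteration limit was hit is to look at the length of the weight history: it stores the initial weights plus one row per iteration, so a run that never converges with the default n=100 has 101 rows.

# the loop ran all 100 iterations without the error ever reaching zero
weights_history.shape[0]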

Sigmoid Activation Function

Let's make this assignment more interesting by changing the activation function from the sign function to the sigmoid function. The sigmoid is commonly used as an activation function in neural networks because, unlike the sign function, it is smooth and differentiable.

The sigmoid function looks like this:

In [18]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

x = np.linspace(-5, 5)

plt.figure(figsize=(8, 5))
plt.plot(x, sigmoid(x), c='r');
Plot showing the sigmoid activation function. The x axis runs from -6 to 6 and the y axis from 0 to 1. The curve starts near 0 for large negative inputs, rises with increasing steepness until it reaches 0.5 at x = 0, then flattens out as it approaches 1 for large positive inputs.

Observe that the function's output is always between 0 and 1. As the input approaches negative infinity the output approaches 0, and as the input approaches positive infinity the output approaches 1.
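
For example, evaluating the sigmoid at a few points shows how quickly it saturates:

# roughly [4.5e-05, 0.5, 0.99995]: close to 0 and 1, but never exactly there
sigmoid(np.array([-10.0, 0.0, 10.0]))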

For this to work I must represent False as 0 instead of -1. Below I show the data inputs for the AND function.

In [19]:
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1])

For this to work correctly I must also increase the learning rate and the maximum number of iterations.

In [20]:
np.random.seed(22)

weights, final_pred, sum_error, weights_history = perceptron(X, y,
                                                             learning_rate=0.1,
                                                             n=1000,
                                                             activation=sigmoid)

The algorithm is still able to learn the AND function, but it can never return exactly 0 or 1 because the sigmoid only reaches those values in the limit as its input goes to negative or positive infinity.

In [21]:
np.vstack([y, final_pred]).T
Out[21]:
array([[  0.00000000e+00,   2.03864476e-04],
       [  0.00000000e+00,   4.96186064e-02],
       [  0.00000000e+00,   4.96199381e-02],
       [  1.00000000e+00,   9.30402233e-01]])

Therefore, for this problem the total error can never be exactly zero.

In [22]:
sum_error
Out[22]:
0.16919953942730676
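
Even so, thresholding the sigmoid outputs at 0.5 reads off the learned truth table, and it matches the AND targets exactly:

# gives array([0, 0, 0, 1]), the same as y
(final_pred > 0.5).astype(int)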

The plot shows nice smooth curves as the weights evolve to fit the data.

In [23]:
plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
Plot showing how the perceptron's 3 weights changed during the model training process. The x axis is training steps numbered from 0 to 1000 and the y axis is weight values from -8 to 6. During training weight 0 increases from 0 to 5, weight 1 increases from 0 to 5, and weight 2 decreases from 0.5 to -8.
