Perceptrons
Basic Perceptron
This week's assignment is to code a Perceptron in Python and train it to learn the basic AND, OR, and XOR logic operations.
I created a Perceptron function with parameters that will let me study the operation of this algorithm.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
def perceptron(X, y, n=100, learning_rate=0.01, activation=np.sign):
    # pick the initial weights
    weights = np.random.normal(scale=1/np.sqrt(X.shape[1]), size=X.shape[1])
    # maintain a history of weights for plotting purposes
    # note the `.copy()` is important!
    weights_hist = [weights.copy()]
    for _ in range(n):
        # calculate the error
        error = y - activation(X @ weights)
        # update the weights based on the error
        weights += learning_rate * error @ X
        # add the new weights to weights history
        weights_hist.append(weights.copy())
        # if the error goes to zero it is time to stop
        if not sum(abs(error)):
            break
    # prepare output
    pred = activation(X @ weights)
    sum_error = sum(abs(error))
    weights_history = np.array(weights_hist)
    return weights, pred, sum_error, weights_history
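To make the update rule concrete, here is a minimal sketch of a single update step on a small hypothetical input (the X_demo, y_demo, and weight values below are purely illustrative and not part of the assignment data):
# hypothetical two-sample input; each row ends with a bias term of 1
X_demo = np.array([[ 1, -1, 1],
                   [-1,  1, 1]])
y_demo = np.array([1, -1])
w = np.zeros(3)
pred = np.sign(X_demo @ w)      # forward pass: array([0., 0.]) since w is all zeros
error = y_demo - pred           # per-sample error: array([ 1., -1.])
# the update nudges the weights toward each sample in proportion to its error,
# scaled by the learning rate
w = w + 0.01 * error @ X_demo   # array([ 0.02, -0.02,  0.  ])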
First I start with the AND function. In my X and y arrays, -1 represents False and 1 represents True.
# input values
X = np.array([[-1, -1, 1],
              [-1,  1, 1],
              [ 1, -1, 1],
              [ 1,  1, 1]])
# output values
y = np.array([-1, -1, -1, 1])
Now I can run the Perceptron on my data.
np.random.seed(32)
weights, final_pred, sum_error, weights_history = perceptron(X, y)
I can see the model's predictions are exactly the same as the y values. Therefore, the Perceptron was able to learn how to do the AND operation.
np.vstack([y, final_pred]).T
As expected, the total error is zero:
sum_error
The function also returns the final weights, but on their own they aren't that interesting to look at.
weights
A more interesting choice is to plot how the weights change as the Perceptron fits the data. We see that the weights move at a constant rate toward the final values.
plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
If I use a larger learning rate, the weights change faster.
np.random.seed(32)
weights, final_pred, sum_error, weights_history = perceptron(X, y, learning_rate=0.05)
plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
If the learning rate is too large, there can be noticeable oscillations in the weights. In this case the Perceptron still converges, though.
np.random.seed(32)
weights, final_pred, sum_error, weights_history = perceptron(X, y, learning_rate=0.8)
plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
Next I will use the Perceptron on the OR operation. Again, -1 represents False and 1 represents True.
X = np.array([[-1, -1, 1],
              [-1,  1, 1],
              [ 1, -1, 1],
              [ 1,  1, 1]])
y = np.array([-1, 1, 1, 1])
np.random.seed(37)
weights, final_pred, sum_error, weights_history = perceptron(X, y)
The Perceptron performed admirably, correctly learning the output for each input.
np.vstack([y, final_pred]).T
The total error is zero.
sum_error
The plot of the weights evolving over time is similar to the AND function.
plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
Finally, the XOR function.
X = np.array([[-1, -1, 1],
              [-1,  1, 1],
              [ 1, -1, 1],
              [ 1,  1, 1]])
y = np.array([-1, 1, 1, -1])
np.random.seed(42)
weights, final_pred, sum_error, weights_history = perceptron(X, y)
As expected, this did not work: XOR is not linearly separable, so a single-layer Perceptron cannot learn it. The Perceptron incorrectly predicts two of the output values.
np.vstack([y, final_pred]).T
The predictions have errors.
sum_error
The Perceptron tries to find an acceptable set of weights but fails to converge on weights that work. Instead, it oscillates around weights of zero until it hits the iteration limit.
plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);
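To make the separability problem visible, here is a quick sketch (not part of the original assignment) that reuses the X and y defined above and plots the four input points with their labels; no single straight line can split the 'o' points from the 'x' points, which is exactly what a single-layer Perceptron would need.
# plot the four XOR inputs; 'o' marks outputs of 1, 'x' marks outputs of -1
plt.figure(figsize=(5, 5))
for xi, yi in zip(X, y):
    plt.scatter(xi[0], xi[1], marker='o' if yi == 1 else 'x', c='k', s=100)
plt.xlabel('input 1')
plt.ylabel('input 2');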
Sigmoid Activation Function
Let's make this assignment more interesting by changing the activation function from the sign function to the sigmoid function. The sigmoid function is commonly used in neural networks because it is smooth and differentiable, which helps when handling more complex data.
The sigmoid function looks like this:
def sigmoid(x):
    return 1/(1+np.exp(-x))
x = np.linspace(-5, 5)
plt.figure(figsize=(8, 5))
plt.plot(x, sigmoid(x), c='r');
Observe that the function output is always between 0 and 1. As the input approaches negative infinity the output approaches 0, and as it approaches positive infinity the output approaches 1.
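A quick check at a few sample inputs (a small sketch using the sigmoid defined above) confirms this:
# large negative, zero, and large positive inputs
sigmoid(np.array([-10, 0, 10]))   # roughly [4.5e-05, 0.5, 1 - 4.5e-05]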
For this to work I must represent False as 0 instead of -1. Below I show the data inputs for the AND function.
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1])
For this to work correctly I must also increase the learning rate and the maximum number of iterations.
np.random.seed(22)
weights, final_pred, sum_error, weights_history = perceptron(
    X, y, learning_rate=0.1, n=1000, activation=sigmoid)
The algorithm is still able to learn the AND function, but its predictions can never be exactly 0 or 1, because the sigmoid function only reaches those values in the limit as its input goes to negative or positive infinity.
np.vstack([y, final_pred]).T
Therefore, for this problem, the total error can never be exactly zero.
sum_error
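If exact 0/1 labels are needed, one option (just a sketch, not part of the original assignment) is to threshold the sigmoid output at 0.5:
# threshold the continuous predictions to recover hard 0/1 labels
(final_pred > 0.5).astype(int)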
The plot shows nice smooth curves as the weights evolve to fit the data.
plt.figure(figsize=(8, 5))
lines = plt.plot(weights_history)
plt.legend(lines, [f'weight {i}' for i in range(3)]);