Multi-Layer Perceptron Study

Our next assignment is to use a Multi-Layer Perceptron to study a dataset.

The dataset I selected is the commonly studied Poker Hand data. Each record contains data for 5 playing cards and a poker hand classification, such as full house or straight.

This dataset proved difficult to work with. It is an example of an imbalanced dataset: common poker hands like One pair are heavily represented while rare hands like Straight and Flush are not.

I found that the MLP classifies some poker hands very well while performing terribly on others. I suspect a very different training methodology is required to properly train an MLP on this dataset.

I start by importing Python packages:

In [1]:
from functools import partial
import pickle
import re

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas import DataFrame, Series

%matplotlib notebook

import mlp

The training and test data are provided in CSV files. The training data has about 25K rows and the test data has 1M rows.

In [2]:
train_poker_hands = pd.read_csv('data/poker-hand-training.csv',
                                header=None,
                                names="S1,C1,S2,C2,S3,C3,S4,C4,S5,C5,CLASS".split(','))
test_poker_hands = pd.read_csv('data/poker-hand-testing.csv',
                                header=None,
                                names="S1,C1,S2,C2,S3,C3,S4,C4,S5,C5,CLASS".split(','))

print(train_poker_hands.shape)
print(test_poker_hands.shape)
(25010, 11)
(1000000, 11)

The data in both files contains ordinals representing the suit and rank of each card. It also contains a classification of the hand type, as shown below.

0: Nothing in hand; not a recognized poker hand 
1: One pair; one pair of equal ranks within five cards 
2: Two pairs; two pairs of equal ranks within five cards 
3: Three of a kind; three equal ranks within five cards 
4: Straight; five cards, sequentially ranked with no gaps 
5: Flush; five cards with the same suit 
6: Full house; pair + different rank three of a kind 
7: Four of a kind; four equal ranks within five cards 
8: Straight flush; straight + flush 
9: Royal flush; {Ace, King, Queen, Jack, Ten} + flush 

The data looks like this:

In [3]:
train_poker_hands.head()
Out[3]:
S1 C1 S2 C2 S3 C3 S4 C4 S5 C5 CLASS
0 1 10 1 11 1 13 1 12 1 1 9
1 2 11 2 13 2 10 2 12 2 1 9
2 3 12 3 11 3 13 3 10 3 1 9
3 4 10 4 11 4 1 4 13 4 12 9
4 4 1 4 13 4 12 4 11 4 10 9

The ordinals are meaningless as numbers and should be encoded using one-hot encoding. I would prefer to use scikit-learn to do this, but for this class I did it myself with some simple code.
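
For reference, the scikit-learn version would look roughly like this (a sketch only; it is not used in this notebook, and the sparse argument is named sparse_output in newer releases):

from sklearn.preprocessing import OneHotEncoder

# one-hot encode the 10 suit/rank columns; CLASS would be handled separately
encoder = OneHotEncoder(sparse=False)
encoded = encoder.fit_transform(train_poker_hands.iloc[:, :-1])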

An important feature of this data is the card ordering. The exact same hand can appear multiple times with the cards shuffled among slots 1-5. I could have removed the ordering entirely by one-hot encoding each hand into 52 columns, one for each card in the deck. Since a hand never contains duplicate cards, this is a valid encoding. I elected not to do this because I wanted to preserve the slot structure of the data. A sketch of that alternative appears below.
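
A minimal sketch of that alternative encoding (encode_as_deck is a hypothetical helper; it assumes the raw 1-based values from the CSV, before any shifting):

def encode_as_deck(poker_hands):
    # one column per card in a 52-card deck; slot order disappears
    n_rows = poker_hands.shape[0]
    deck = np.zeros((n_rows, 52), dtype=int)
    for i in range(1, 6):
        suits = poker_hands['S%d' % i].values   # 1-4
        ranks = poker_hands['C%d' % i].values   # 1-13
        deck[np.arange(n_rows), (suits - 1) * 13 + (ranks - 1)] = 1
    return DataFrame(deck)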

Notice in the code below I also shuffle the data. This is important for the batch optimization.

In [4]:
def encode_poker_hand_data(poker_hands):
    # encode data using one hot encoding
    
    # inner function to expand ordinals into a series of 0's and one 1
    def encode(x, n):
        out = [0] * n
        out[x] = 1

        return out

    # dataset uses numbers 1-4 and 1-13 to identify cards; shift to start at zero
    # (note: this modifies the caller's DataFrame in place)
    poker_hands.iloc[:, :-1] -= 1

    # encode suits (0-3)
    temp = poker_hands['S1,S2,S3,S4,S5'.split(',')].applymap(partial(encode, n=4)).itertuples(index=False)
    encoded_suits = DataFrame([[x for g in r for x in g] for r in temp])

    # encode ranks (0-12)
    temp = poker_hands['C1,C2,C3,C4,C5'.split(',')].applymap(partial(encode, n=13)).itertuples(index=False)
    encoded_ranks = DataFrame([[x for g in r for x in g] for r in temp])

    # put them all together. order is irrelevant
    encoded_data = pd.concat([encoded_suits, encoded_ranks], axis=1)
    encoded_data.columns = range(encoded_data.shape[1])

    # encode hand classifications (0-9)
    encoded_classifications = DataFrame([x for x in poker_hands['CLASS'].apply(partial(encode, n=10))])
    
    # shuffle data
    random_index = np.random.permutation(encoded_data.shape[0])
    encoded_data = encoded_data.iloc[random_index].reset_index(drop=True)
    encoded_classifications = encoded_classifications.iloc[random_index].reset_index(drop=True)
    
    return encoded_data, encoded_classifications

np.random.seed(42)

train_X, train_Y = encode_poker_hand_data(train_poker_hands)
test_X, test_Y = encode_poker_hand_data(test_poker_hands)

The encoded hand data has (4 + 13) * 5 = 85 columns of 1's and 0's. It looks like this:

In [5]:
train_X.head()
Out[5]:
0 1 2 3 4 5 6 7 8 9 ... 75 76 77 78 79 80 81 82 83 84
0 0 0 1 0 1 0 0 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
1 0 1 0 0 0 1 0 0 1 0 ... 0 0 0 0 0 0 1 0 0 0
2 0 0 0 1 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0 1 0 0 0 0 1 0 0 1 ... 1 0 0 0 0 0 0 0 0 0
4 0 1 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 1 0 0

5 rows × 85 columns

And the hand classifications:

In [6]:
train_Y.head()
Out[6]:
0 1 2 3 4 5 6 7 8 9
0 1 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0 0
4 0 1 0 0 0 0 0 0 0 0

Next I set the hyperparameters for the optimization. I will run 75K training iterations (the code calls them epochs) with mini-batches of 1000 samples.

I set the first hidden layer to have a dimension of 52 in the hopes that the model learns to identify the same card in different slots.

In [7]:
# Set hyperparameters:
sample_size = train_X.shape[1]
output_size = train_Y.shape[1]
batch_size = 1000
epoch_cnt = 75000
report_freq = 15000
learn_rate = 0.05

# Construct MLP:
model = mlp.Mlp([sample_size, 52, 25, output_size], "sigmoid")

I can now run the training process and save the trained model to a pickle file. Training takes many hours to complete, so pickling the model lets me stop and resume later, even after restarting my computer.

In [8]:
model.train(train_X.T.values, train_Y.T.values, learn_rate,
            epoch_cnt, batch_size, report_freq)

with open('model.p', 'wb') as f:
    pickle.dump(model, f)
Epoch: 0
Error: 0.43836257761408215

Epoch: 15000
Error: 0.11310313336942192

Epoch: 30000
Error: 0.02897885358140075

Epoch: 45000
Error: 0.01729469675361829

Epoch: 60000
Error: 0.015170719539740422

Epoch: 75000
Error: 0.01420257039062697

The in-sample error started out high but after 75K iterations it was reduced considerably.
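
Plotting the reported errors (values transcribed from the output above) makes the shape of the decline easier to see; a minimal sketch:

iterations = [0, 15000, 30000, 45000, 60000, 75000]
errors = [0.4384, 0.1131, 0.0290, 0.0173, 0.0152, 0.0142]

plt.figure()
plt.plot(iterations, errors, marker='o')
plt.xlabel('iteration')
plt.ylabel('in-sample error')
plt.show()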

Reloading the model from the pickle file is simple.

In [9]:
with open('model.p', 'rb') as f:
    model = pickle.load(f)

I can now use the model to make predictions on my test dataset.

In [10]:
test_predictions = DataFrame(model.predict(test_X.T).T)

The predictions are probabilities between 0 and 1.

In [11]:
test_predictions.head()
Out[11]:
0 1 2 3 4 5 6 7 8 9
0 9.980287e-01 0.000299 1.750294e-08 8.229227e-10 0.002947 0.001297 0.001124 0.000654 0.000528 0.000551
1 5.770312e-04 0.995765 1.206981e-03 8.292053e-06 0.003687 0.001796 0.005324 0.000922 0.000792 0.000779
2 1.290287e-13 1.000000 9.999355e-01 8.510249e-01 0.004679 0.003108 0.046167 0.001397 0.001305 0.001257
3 9.868559e-01 0.002221 1.174163e-07 2.822749e-09 0.003153 0.001404 0.001433 0.000700 0.000559 0.000581
4 9.979368e-01 0.000413 2.420286e-08 8.004877e-10 0.003035 0.001254 0.001102 0.000641 0.000535 0.000515

The simplest way to turn these probabilities into a prediction is to pick the column with the highest value, which the idxmax function does. I can then measure the accuracy of the predictions by comparing them with the true classifications.

In [12]:
predictions = test_predictions.idxmax(axis=1)
# this just un-does the one-hot encoding...
true_classification = test_Y.idxmax(axis=1)

I see that the model is over 92% accurate. This doesn't seem terrible but I will dig deeper.

In [13]:
def accuracy(predictions, true_classification):
    return (predictions == true_classification).sum() / predictions.shape[0]

accuracy(predictions, true_classification)
Out[13]:
0.923394

The shortcomings are apparent when we look at how well it did by category. Did it learn some poker hands and not others? Measuring the accuracy conditional on the poker hand classifications shows that this is in fact the case.

In [14]:
hands_str = """0: Nothing in hand; not a recognized poker hand
1: One pair; one pair of equal ranks within five cards
2: Two pairs; two pairs of equal ranks within five cards
3: Three of a kind; three equal ranks within five cards
4: Straight; five cards, sequentially ranked with no gaps
5: Flush; five cards with the same suit
6: Full house; pair + different rank three of a kind
7: Four of a kind; four equal ranks within five cards
8: Straight flush; straight + flush
9: Royal flush; {Ace, King, Queen, Jack, Ten} + flush"""

hands = {int(a): b.strip() for a, b, _ in [re.split(r"[:;]", s, maxsplit=2) for s in hands_str.split('\n')]}

def accuracy_by_classification(predictions, true_classification, classification):
    indx = true_classification == classification

    return accuracy(predictions[indx], true_classification[indx])

Series({hands[i]: accuracy_by_classification(predictions, true_classification, i) for i in range(10)})
Out[14]:
Flush              0.000000
Four of a kind     0.000000
Full house         0.000000
Nothing in hand    0.999870
One pair           0.999413
Royal flush        0.000000
Straight           0.000000
Straight flush     0.000000
Three of a kind    0.000000
Two pairs          0.000000
dtype: float64

The model identifies One pair almost perfectly. It is also nearly always correct when the hand is worthless, but since Nothing in hand makes up half the test dataset and is the default assumption, that is less of an accomplishment. The model completely fails on all other hands.

In the test dataset about 42% of the records are One pair and about 50% are Nothing in hand.
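
A quick way to confirm those proportions (a sketch; output omitted):

# share of each true classification in the test dataset
true_classification.value_counts(normalize=True).rename(hands)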

Out of curiosity, how well does the model do for the training dataset?

In [15]:
train_predictions = DataFrame(model.predict(train_X.T).T)

predictions_train = train_predictions.idxmax(axis=1)
true_classification_train = train_Y.idxmax(axis=1)

accuracy(predictions_train, true_classification_train)
Out[15]:
0.92331067572970815

92.3%. And conditional on classification?

In [16]:
Series({hands[i]: accuracy_by_classification(predictions_train, true_classification_train, i) for i in range(10)})
Out[16]:
Flush              0.0
Four of a kind     0.0
Full house         0.0
Nothing in hand    1.0
One pair           1.0
Royal flush        0.0
Straight           0.0
Straight flush     0.0
Three of a kind    0.0
Two pairs          0.0
dtype: float64

On the training dataset the model likewise only identifies Nothing in hand and One pair.

There could be a number of reasons for this poorly performing model. I believe it is stuck in a local minimum that identifies Pairs and nothing else. Perhaps I need to train it for more iterations. Or perhaps the model needs more layers to correctly incorporate more complex logic.

Out of curiosity, what are the model's second choices for each hand? Are those correct?

I can find out by repeating the above code after setting the probabilities of the model's first choices to zero. Then the idxmax function will show me the model's second choices.

In [17]:
test_predictions2 = test_predictions.copy()

def set_max_to_zero(row):
    # zero out the most probable class so idxmax returns the runner-up
    row = row.copy()
    row[row.idxmax()] = 0
    return row

test_predictions2 = test_predictions2.apply(set_max_to_zero, axis=1)

predictions_2nd_choice = test_predictions2.idxmax(axis=1)

I see that the second choices are more informative. The runner-up prediction identifies Two pairs almost perfectly and picks up some Straights.

In [18]:
Series({hands[i]: accuracy_by_classification(predictions_2nd_choice, true_classification, i) for i in range(10)})
Out[18]:
Flush              0.000000
Four of a kind     0.000000
Full house         0.000000
Nothing in hand    0.000130
One pair           0.000587
Royal flush        0.000000
Straight           0.095753
Straight flush     0.000000
Three of a kind    0.000000
Two pairs          0.986141
dtype: float64

This suggests to me that there is some hope in the current model.

When I fit this model over the weekend I used the "tanh" activation function with 50K iterations. That model was able to identify some Straights (as first choices) in the test dataset. Its second choices identified Two pairs and Three of a kind perfectly, and it correctly identified two of the three Royal flush records in the dataset.

The code I am using was provided by our instructor, Patrick. I intend to recode this in TensorFlow so it can use my GPU to perform the calculations much faster. I should also experiment with the learning rate and other hyperparameters to understand how they impact the final result.
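
A rough sketch of the same architecture in TensorFlow/Keras (untested; I am assuming a squared-error loss to match the course code):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(85,)),
    tf.keras.layers.Dense(52, activation='sigmoid'),
    tf.keras.layers.Dense(25, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05),
              loss='mean_squared_error',
              metrics=['accuracy'])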

This dataset may benefit from an optimization process that weights prediction errors by the inverse of each classification's frequency. This would force the model to consider the rare poker hands from the beginning instead of making the quickest gains by fitting One pair. I suspect the model structures itself to identify pairs and then cannot move away from that to identify other hands. Essentially, it gets stuck in a large local minimum. A sketch of such weighting appears below.
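
For example, Keras accepts a class_weight dictionary in fit; a sketch of inverse-frequency weights for the hypothetical Keras model above (the course mlp code would need equivalent support):

# rare hands count for more: weight each class by 1 / frequency (normalized)
counts = train_Y.idxmax(axis=1).value_counts()
class_weight = {c: len(train_Y) / (len(counts) * n) for c, n in counts.items()}

model.fit(train_X.values, train_Y.values,
          batch_size=1000, epochs=20, class_weight=class_weight)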

There must also be research papers or other materials that use this dataset for exactly this task; they are worth reading and learning from.

I think there is a lot to be gained by extensively modeling a single dataset with many models and many hyperparameter settings. I can gain intuition into what the parameters mean and how they affect model performance and speed. I intend to do this after the conclusion of this class when I have more time.
