
Posts about python

Tuning Hyperparameters

Our last Learning Machines assignment is to calibrate the hyperparameters for a Multilayer Perceptron. Patrick gave us a working model using the MNIST database of handwritten digits. The model uses a Restricted Boltzmann Machine to reduce the dimensionality of the data and then a Multilayer Perceptron to classify the digits.

I was able to achieve an out-of-sample accuracy of almost 96%. This is in line with the results of other researchers.


Multi-Layer Perceptron Study

Our next assignment is to use a Multi-Layer Perceptron to study a dataset.

The dataset I selected is the commonly studied Poker Hand data. Each record contains data for 5 playing cards and a poker hand classification, such as full house or straight.

This dataset proved to be difficult to work with. It is an example of an imbalanced dataset in that the more common poker hands like two-of-a-kind are heavily represented and the less common hands like straight and flush are not.

I found that the Perceptron was able to correctly classify some poker hands very well while performing terribly for others. I suspect a very different training methodology is required to properly train a Perceptron with this dataset.
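One common way to compensate for this kind of imbalance is to weight each class inversely to its frequency, so that errors on rare hands count for more during training. This is not necessarily the methodology the study ended up needing; it is only a minimal sketch of the idea. The label counts below are made up for illustration and are not the real Poker Hand class frequencies, and `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Made-up toy labels, NOT the real dataset distribution.
labels = (
    ["nothing"] * 50
    + ["two-of-a-kind"] * 42
    + ["two pairs"] * 5
    + ["flush"] * 2
    + ["straight"] * 1
)

weights = balanced_class_weights(labels)
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label:15s} {weight:.2f}")
```

In this toy example the single "straight" record ends up with a weight of 20, while the 50 "nothing" records each get a weight of 0.4, so both classes contribute equally to a weighted loss.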


Modified Pulse Sensing Algorithm

Our Physical Computing final project depends on a Pulse Sensor to detect a user's heartbeat. The people at World Famous Electronics created an Arduino library for their customers to use with their sensor. The library adds a lot of value because it provides users with a well-researched algorithm for using the sensor to properly detect a heartbeat. Pulse Sensor users don't have to reinvent the wheel and code their own algorithms. Writing your own algorithm to do this is difficult, and the one provided by the company is better than the one that I came up with for our midterm.

Still, the provided algorithm isn't perfect. For some people it seems to miss some heartbeats and add extra ones. A fellow ITP student, Ellen, showed me that it would produce odd spikes in the beats-per-minute (BPM) value. It wasn't clear why this was happening. Since I had previously been analyzing the sensor's data in Python, I came up with a plan to figure out why the Arduino code was doing this and whether there was anything I could do about it. After studying the data and making some plots, I was able to make some improvements to the algorithm. It still isn't perfect, but my changes address many of its weaknesses.
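To see how missed or extra beats could show up as odd BPM values, here is a small illustration. It is not the library's calculation and not necessarily the cause found in this study; it simply computes an instantaneous BPM from the time between consecutive detected beats. The timestamps and the `bpm_from_beat_times` helper are hypothetical.

```python
def bpm_from_beat_times(beat_times_ms):
    """Instantaneous BPM from consecutive beat timestamps in milliseconds."""
    intervals = [
        later - earlier
        for earlier, later in zip(beat_times_ms, beat_times_ms[1:])
    ]
    return [60000 / interval for interval in intervals]


# Hypothetical timestamps for a steady pulse with one beat every 800 ms (75 BPM).
clean = [0, 800, 1600, 2400, 3200]
# Same pulse with a spurious extra detection at 1100 ms.
with_extra_beat = [0, 800, 1100, 1600, 2400, 3200]
# Same pulse with the beat at 1600 ms missed.
with_missed_beat = [0, 800, 2400, 3200]

print(bpm_from_beat_times(clean))             # [75.0, 75.0, 75.0, 75.0]
print(bpm_from_beat_times(with_extra_beat))   # [75.0, 200.0, 120.0, 75.0, 75.0]
print(bpm_from_beat_times(with_missed_beat))  # [75.0, 37.5, 75.0]
```

A single extra detection is enough to push the value briefly to 200 BPM, and a single missed beat halves it, which is the kind of spike or dip that is easy to spot once the data is plotted.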

The original Pulse Sensor Arduino code is available online on GitHub. I am sharing this code with my fellow students who are also using the same sensor. After our projects are complete I will submit my modified code to GitHub as a pull request to share with the rest of the community.


Perceptrons

Basic Perceptron

This week's assignment is to code a Perceptron in Python and train it to learn the basic AND, OR, and XOR logic operations.

I created a Perceptron function with parameters that will let me study the operation of this algorithm.
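The function from the full post is not shown in this excerpt. As a rough stand-in, here is a minimal sketch of a single perceptron trained with the classic error-correction rule. The function names, the step activation, the learning rate of 1, and the 100-epoch limit are all assumptions made for illustration.

```python
def step(x):
    """Heaviside step activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def predict(weights, bias, inputs):
    """Output of the perceptron for one input tuple."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, learning_rate=1.0, max_epochs=100):
    """Train on a list of ((x1, x2), target) pairs with 0/1 targets.

    Returns (weights, bias, converged), where converged is True if a full
    pass over the samples produced no errors before max_epochs ran out.
    """
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            delta = target - predict(weights, bias, inputs)
            if delta != 0:
                errors += 1
                weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
                bias += learning_rate * delta
        if errors == 0:
            return weights, bias, True
    return weights, bias, False


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [(x, x[0] & x[1]) for x in INPUTS]
OR = [(x, x[0] | x[1]) for x in INPUTS]
XOR = [(x, x[0] ^ x[1]) for x in INPUTS]

for name, data in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    weights, bias, converged = train_perceptron(data)
    print(name, "converged" if converged else "did not converge")
```

AND and OR converge within a few epochs. XOR never does, because no single straight line separates its two classes, which is the well-known limitation of a single perceptron.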


Clustering and NumPy

K-means clustering

Our second assignment in our Learning Machines class is to implement k-means clustering in Python. I've implemented this in other programming languages but not in Python. Normally I'd use scikit-learn for this but it is a worthwhile exercise to think through how to do this in Python.
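The implementation from the full post is not included in this excerpt. For reference, here is a minimal, dependency-free sketch of the standard k-means loop (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. It uses plain Python tuples rather than NumPy arrays, and the function names, the seeded random initialization, and the toy points are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster points into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean_point(members)
    return centroids, assignments


# Toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```

With NumPy, the distance and mean computations collapse into a few vectorized array operations, but the two alternating steps stay the same.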


Run Length Encoding

Our first assignment in our Learning Machines class is to implement a run length encoder and decoder. This is a simple data compression algorithm that benefits from repeated patterns.

It happens that I previously had an idea for an Arduino project that requires a light-weight data decompression algorithm to decode audio data. I was going to use run length encoding because it is simple to implement and the code itself won't take up much of the Arduino's precious memory. I'll also need to encode the audio files in Python, and I'll use the below code to do it.
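The code from the full post is not included in this excerpt. As an illustration of the technique, here is a minimal sketch of a run length encoder and decoder that stores each run as a (value, count) pair. The function names and the sample data are assumptions, and the format is deliberately simple rather than tuned for an Arduino's memory.

```python
from itertools import groupby


def rle_encode(data):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(data)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical audio-like samples with long runs of identical values.
samples = [0, 0, 0, 5, 5, 9, 0, 0]
encoded = rle_encode(samples)
print(encoded)                         # [(0, 3), (5, 2), (9, 1), (0, 2)]
print(rle_decode(encoded) == samples)  # True
```

The encoding only saves space when runs are long. Data with few repeats, such as `[1, 2, 3]`, grows instead, because every value gains a count.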


Heartbeat Detection Study

Purpose of detecting heartbeat data

Our Midi Meditation project is a physical computing device that will repeatedly play a single note in sync with the user's heartbeat. Fundamental to this is the ability to reliably detect when a user's heart is beating.

We want our device to work effectively for most or all people. This means it should play one note in sync with the user's pulse without extra notes between beats.

We had a pulse sensor suitable for an Arduino to use for this project. One approach for prototyping this is to code a heartbeat detection algorithm on an Arduino after viewing the sensor readings on the Serial monitor for a couple of people. This approach could work but would require a lot of parameter tweaking to get it "just right" with repeated user testing between parameter adjustments.
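To make the parameter-tweaking problem concrete, here is a minimal sketch of the kind of simple detector one might start with: count a beat whenever the signal rises through a threshold, and ignore crossings that come too soon after the previous beat. This is not the algorithm from the study. The threshold, the refractory period, the sample interval, and the synthetic readings are all assumptions made for illustration.

```python
def detect_beats(samples, sample_interval_ms, threshold, refractory_ms=300):
    """Return timestamps (ms) of upward threshold crossings.

    A crossing within refractory_ms of the previously accepted beat is ignored.
    """
    beats = []
    previous = samples[0] if samples else None
    for index, value in enumerate(samples[1:], start=1):
        time_ms = index * sample_interval_ms
        rising = previous < threshold <= value
        if rising and (not beats or time_ms - beats[-1] >= refractory_ms):
            beats.append(time_ms)
        previous = value
    return beats


# Synthetic readings sampled every 100 ms: each 800 ms pulse has a main peak
# (700) followed by a smaller secondary bump (560).
one_pulse = [500, 500, 700, 520, 560, 500, 500, 500]
signal = one_pulse * 3

print(detect_beats(signal, 100, threshold=550))
# [200, 1000, 1800]
print(detect_beats(signal, 100, threshold=550, refractory_ms=0))
# [200, 400, 1000, 1200, 1800, 2000]
```

Without the refractory period, the secondary bump is counted as a second beat and the device would play an extra note between heartbeats. Both the threshold and the refractory period depend on the person and on sensor placement, which is why tuning them by hand on the Arduino takes so many rounds of testing.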


First Jupyter Notebook Post

This is a blog post created in Jupyter notebook.

The goal is to see how well this feature works. I'd like to be able to post Python code to my blog. Happily, Nikola supports that seamlessly.

Normally Nikola preserves the width of each notebook cell. That makes sense, but it doesn't work so well with this template because of the navigation bar on the left side of the screen. That's OK; if I need to, I can override the notebook styling with this CSS:

#notebook-container {
  width: 800px;
}

And here is some Python code:

In [1]:
def square(x):
    return x**2

for i in range(10):
    print(square(i))
0
1
4
9
16
25
36
49
64
81

And a plot:

In [2]:
%matplotlib inline
import matplotlib

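import numpy as np
import pandas as pd
In [3]:
# pandas.util.testing was deprecated in pandas 1.0 and later removed,
# so build the random 20-row frame directly with NumPy instead.
df = pd.DataFrame(np.random.randn(20, 4), columns=list('ABCD'))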
df.index = pd.date_range(start=pd.Timestamp.now().floor('D'), periods=df.shape[0])

df.plot(figsize=(10, 5))
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f57609a2fd0>

Magnificent!

JupyterDay NYC

Yesterday I had the pleasure of attending the first JupyterDay conference in NYC. This was a one-day event discussing the open source project Jupyter, formerly known as IPython Notebook.

I had a wonderful time at the event. All of the speakers were engaging and I got a lot of great ideas for what I want to learn about to strengthen my technology and data science skills.

I took extensive notes and can't compile them all here. Instead, here are a few highlights from the event:

  • Jeremy Singer-Vine, BuzzFeed - Jeremy is a Data Editor at BuzzFeed and does investigative data journalism. BuzzFeed does quantitative analysis for some of its news stories and backs them up with research posted on GitHub that readers can verify. For example, this news story and this notebook. I wish more journalists were this transparent.
  • Doug Blank, Bryn Mawr - Doug talked about how Jupyter is changing education at his college. Everything is a notebook there. Students submit notebooks for their homework assignments. They've built many extensions to Jupyter to support this. The most fascinating is they have kernels for many other languages like BASIC, Assembly, and Pascal. I am going to set these up on my computer very soon.
  • Sylvain Corlay, Bloomberg - Sylvain is a quant at Bloomberg. He showed us a demo of a new plotting library called bqplot that they will share with the community. He used IPython widgets to interact with the charts. And the widget that got a round of applause from the audience? An IPython gamepad widget. I didn't even know that was possible! Glad I have a gamepad already. Can't wait to put that to use analyzing data!

These were just a few of yesterday's speakers. The attendees were supportive and bright as well. I had many thought-provoking conversations about data analysis and now have a list of tools I want to learn about as soon as I can.

All in all, a great day. Very glad I signed up for this.

Presentation at MSFT Research Labs

This week I created a presentation for my research paper on Algorithmic Trading in the Iowa Electronic Markets. I shared it with researchers studying prediction markets at Microsoft Research Labs.

The people there were very smart and interested in what I had to say. They may very well have the largest collection of people studying prediction markets anywhere in the world. It's a somewhat obscure field, as most researchers are interested in either theoretical market models or more developed financial markets. It's a shame because there is a lot to learn from prediction markets, which sit right between the two.

The presentation itself is made with the reveal.js presentation framework. I made it in Jupyter, which now has the ability to output working presentations in reveal.js. I had a lot of fun learning about Jupyter and building a presentation in a notebook. The presentation workflow was so much better than anything I have experienced before, and I can't imagine ever using anything else again.