Posts about python

Data Assembly Complete

Milestone #2: Data Assembly

I'm comfortable saying I've completed this milestone. I've finished all the major features and have a nice interface for interacting with the downloaded data.

There are a few minor issues left, but none will take much time or brainpower to resolve. Most are nice-to-have enhancements, such as better error checking in my code, that I feel compelled to make but that aren't critical right now. I'll complete them as time allows.

The important thing is that I can now begin downloading lots of data without fear that I will need to download everything a second time later.

I made an interactive tool in matplotlib to visualize a spatial map of the locations I've downloaded data for.

Data Assembly

Milestone #2: Data Assembly

The second step of this project is to access the Google Street View data and organize it in a suitable format. My project plan set February 21st (last Wednesday) as the target for this milestone. Although I have accomplished a lot, I have not yet finished everything I planned for it. I expect to reach it by next week at the latest.

Here's what I have achieved.

First, I can download all of the relevant data from Google. This includes all of the panorama image data and metadata. I can also access the panorama IDs for the neighboring locations. All of the metadata is stored in a database.
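As an illustration, here is a minimal sketch of how panorama metadata and neighbor links could be stored with Python's built-in sqlite3 module. The table layout, the column names, and the helpers create_schema, save_panorama, and pending_panoramas are hypothetical, not the schema from this project. The idea is that once neighbor IDs are stored, a single query tells you which panoramas are known but not yet downloaded.

```python
import sqlite3

def create_schema(conn):
    # Hypothetical schema: one row per downloaded panorama, plus neighbor links.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS panoramas (
            pano_id      TEXT PRIMARY KEY,
            lat          REAL NOT NULL,
            lng          REAL NOT NULL,
            capture_date TEXT
        );
        CREATE TABLE IF NOT EXISTS neighbors (
            pano_id     TEXT NOT NULL,
            neighbor_id TEXT NOT NULL,
            PRIMARY KEY (pano_id, neighbor_id)
        );
    """)

def save_panorama(conn, pano_id, lat, lng, capture_date, neighbor_ids):
    # "with conn" commits both inserts together, or rolls back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO panoramas VALUES (?, ?, ?, ?)",
            (pano_id, lat, lng, capture_date),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO neighbors VALUES (?, ?)",
            [(pano_id, n) for n in neighbor_ids],
        )

def pending_panoramas(conn):
    # Neighbor IDs that have no row in panoramas yet, i.e. not downloaded.
    rows = conn.execute("""
        SELECT DISTINCT n.neighbor_id
        FROM neighbors AS n
        LEFT JOIN panoramas AS p ON p.pano_id = n.neighbor_id
        WHERE p.pano_id IS NULL
        ORDER BY n.neighbor_id
    """).fetchall()
    return [row[0] for row in rows]
```

Because the database remembers what has already been fetched, a crawler built this way can be stopped and restarted without downloading anything twice.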

Data Investigation

Milestone #1: Data Investigation

My first step is to investigate my data options for this project. As discussed in my plan, I am considering Google Street View data and LiDAR data. The Street View data is my first choice, but I realized that it might differ from what I expect or have odd complications that make what I have in mind difficult or impossible. I wanted to consider alternatives, and there's a lot that interests me about LiDAR data. Of course, that data might be impossible to work with too. In any case, I needed to find these things out right away, while it is still easy to change course on this project.

In short, the Google Street View data is easy to work with and close to what I expected. Google provides a convenient, well-documented API. Unfortunately, the depth data discussed in this blog post does not come from the API, and it is compressed in a format I have not yet parsed. The author of that post provides C++ code for parsing it; I am optimistic that I can translate that to Python and/or integrate their process into my code.
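A Python port would start by unpacking the raw bytes. The sketch below assumes the depth string is URL-safe base64 (possibly missing its "=" padding) wrapping a zlib-compressed payload. That assumption needs to be checked against the C++ code from the post, and decode_depth_blob is a hypothetical name. Interpreting the decompressed header and depth planes is the part the C++ code handles and is not attempted here.

```python
import base64
import zlib

def decode_depth_blob(blob: str) -> bytes:
    # Assumption: URL-safe base64 whose trailing "=" padding may be stripped.
    padded = blob + "=" * (-len(blob) % 4)
    compressed = base64.urlsafe_b64decode(padded)
    # Assumption: the decoded bytes are a zlib stream.
    return zlib.decompress(compressed)
```

If the assumption holds, the returned bytes are what the C++ parser reads its header and plane data from, so `struct.unpack` could take over from there.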

LiDAR data is also well documented but extremely complex. I've worked with complex data before and am confident I could manage this if I put in the time. My objection is that taking the project in that direction would consume a good portion of the semester, leaving me less time for the topics I want to be learning about.

Additionally, the challenges I would face with the Google Street View data resonate with me in a way that the LiDAR data challenges do not.

My conclusion is that I will use the Google Street View data for this project. Sometime after the semester is over I might spend more time with the LiDAR data to get some experience working with it. It would be a great choice for a future project.

Tuning Hyperparameters

Our last Learning Machines assignment is to tune the hyperparameters for a Multilayer Perceptron. Patrick gave us a working model using the MNIST database of handwritten digits. The model uses a Restricted Boltzmann Machine to reduce the dimensionality of the data and then a Multilayer Perceptron to classify the digits.
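A common way to organize this kind of tuning is an exhaustive grid search. Here is a minimal sketch using only the standard library. The train_and_score callback is hypothetical: in practice it would train the RBM and Perceptron with the given settings and return accuracy on a held-out validation set (not the final test set, so the reported out-of-sample accuracy stays honest). The parameter names in the usage example are also illustrative.

```python
from itertools import product

def grid_search(train_and_score, param_grid):
    """Try every combination in param_grid and keep the best-scoring one.

    train_and_score: hypothetical callback taking keyword arguments and
    returning a score where higher is better (e.g. validation accuracy).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        results.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, results
```

For example, `grid_search(train_and_score, {"learning_rate": [0.001, 0.01, 0.1], "n_components": [50, 100, 200]})` runs nine trainings. The cost grows multiplicatively with each added parameter, which is why coarse grids followed by finer ones are a common compromise.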

I was able to achieve an out-of-sample accuracy of almost 96%. This is in line with the results of other researchers.

Multi-Layer Perceptron Study

Our next assignment is to use a Multi-Layer Perceptron to study a dataset.

The dataset I selected is the commonly studied Poker Hand data. Each record contains data for 5 playing cards and a poker hand classification, such as full house or straight.

This dataset proved to be difficult to work with. It is an imbalanced dataset: common hands like one pair (two of a kind) are heavily represented, while rare hands like straights and flushes are not.
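One quick way to quantify the imbalance is to count each class. The sketch below (class_report is a hypothetical helper) reports each class's share, the ratio between the most and least common classes, and inverse-frequency weights computed as n_samples / (n_classes * count), the same heuristic scikit-learn uses for `class_weight="balanced"`. Such weights are one possible way to make rare hands count more during training; they are not something used in the original study.

```python
from collections import Counter

def class_report(labels):
    counts = Counter(labels)
    total = len(labels)
    # Fraction of all records belonging to each class.
    shares = {c: n / total for c, n in counts.items()}
    # How many times more common the largest class is than the smallest.
    imbalance_ratio = max(counts.values()) / min(counts.values())
    # Inverse-frequency weights: rare classes get proportionally larger weights.
    weights = {c: total / (len(counts) * n) for c, n in counts.items()}
    return shares, imbalance_ratio, weights
```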

I found that the Perceptron was able to correctly classify some poker hands very well while performing terribly for others. I suspect a very different training methodology is required to properly train a Perceptron with this dataset.
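With imbalanced data, overall accuracy can look respectable while rare classes are almost never predicted correctly. A per-class breakdown makes that visible. This minimal sketch computes recall for each class and their unweighted mean (often called balanced accuracy); the function names are illustrative.

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    correct = defaultdict(int)
    total = defaultdict(int)
    for true, pred in zip(y_true, y_pred):
        total[true] += 1
        if true == pred:
            correct[true] += 1
    # Fraction of each true class that was predicted correctly.
    return {c: correct[c] / total[c] for c in sorted(total)}

def balanced_accuracy(y_true, y_pred):
    recalls = per_class_recall(y_true, y_pred)
    # Every class counts equally, no matter how many records it has.
    return sum(recalls.values()) / len(recalls)
```

For instance, a model that predicts class 0 for nearly everything can score 5/7 overall accuracy on `[0, 0, 0, 0, 1, 1, 2]` while its balanced accuracy is only 0.5.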

Modified Pulse Sensing Algorithm

Our Physical Computing final project depends on a Pulse Sensor to detect a user's heartbeat. The people at World Famous Electronics created an Arduino library for their customers to use with their sensor. The library adds a lot of value because it provides users with a well-researched algorithm for using the sensor to properly detect a heartbeat. Pulse Sensor users don't have to reinvent the wheel and code their own algorithms. Writing your own algorithm to do this is difficult, and the one provided by the company is better than the one that I came up with for our midterm.

Still, the provided algorithm isn't perfect. For some people it seems to miss heartbeats or add extra ones. A fellow ITP student, Ellen, showed me that it would produce odd spikes in the beats-per-minute (BPM) value. It wasn't clear why this was happening. Since I had previously been analyzing the sensor's data in Python, I came up with a plan to figure out why the Arduino code was doing this and whether there was anything I could do about it. After studying the data and making some plots, I was able to make some improvements to the algorithm. It still isn't perfect, but my changes address many of its weaknesses.
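To see how a single false beat can cause a BPM spike, it helps to model BPM as a running average of recent inter-beat intervals (IBIs). The sketch below does that in Python, and adds one possible guard: ignoring an interval that differs from the median of recent intervals by more than a set fraction. This is a hypothetical illustration; the window size, the tolerance, and the median test are assumptions, and it is not the library's algorithm or my modified version of it.

```python
from collections import deque
from statistics import median

def bpm_stream(ibis_ms, window=10, tolerance=0.3):
    """Return one BPM estimate per incoming inter-beat interval (ms)."""
    recent = deque(maxlen=window)  # last `window` accepted intervals
    out = []
    for ibi in ibis_ms:
        if len(recent) >= 3:
            ref = median(recent)
            if abs(ibi - ref) > tolerance * ref:
                # Implausible jump (e.g. a missed or extra beat): keep last BPM.
                out.append(out[-1])
                continue
        recent.append(ibi)
        average_ibi = sum(recent) / len(recent)
        out.append(60000 / average_ibi)  # 60,000 ms per minute
    return out
```

Passing `tolerance=float("inf")` disables the guard, so a single 300 ms interval among 800 ms ones pushes the estimate from 75 BPM to about 84 BPM. The trade-off is that a genuine, sudden change in heart rate is also rejected until it is gradual enough to pass the test.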

The original Pulse Sensor Arduino code is available online on GitHub. I am sharing this code with my fellow students who are also using the same sensor. After our projects are complete I will submit my modified code to GitHub as a pull request to share with the rest of the community.

Basic Perceptron

This week's assignment is to code a Perceptron in Python and train it to learn the basic AND, OR, and XOR logic operations.

I created a Perceptron function with parameters that will let me study the operation of this algorithm.
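For reference, here is a minimal sketch of a single Perceptron trained with the classic error-correction rule, with the learning rate and number of epochs exposed as parameters. It is an illustration, not the function from the post. A single Perceptron can only learn linearly separable functions: AND and OR are, but XOR is not, so training on XOR never reaches zero errors.

```python
def step(z):
    # Threshold activation: fire (1) when the weighted sum is non-negative.
    return 1 if z >= 0 else 0

def predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_perceptron(samples, targets, learning_rate=1.0, epochs=100):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)
            if error:
                errors += 1
                # Nudge the weights toward the correct answer.
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if errors == 0:  # a full pass with no mistakes: training has converged
            break
    return weights, bias
```

With inputs `[(0, 0), (0, 1), (1, 0), (1, 1)]`, the AND targets `[0, 0, 0, 1]` and OR targets `[0, 1, 1, 1]` are learned in a few epochs, while the XOR targets `[0, 1, 1, 0]` are never reproduced exactly.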

Clustering and NumPy

K-means clustering

Our second assignment in our Learning Machines class is to implement k-means clustering in Python. I've implemented this in other programming languages but not in Python. Normally I'd use scikit-learn for this, but it is a worthwhile exercise to think through how to implement it myself in Python.
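Here is a minimal NumPy sketch of standard k-means (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. It is an illustration rather than the assignment code. It initializes with k random data points, keeps an empty cluster's previous centroid, and uses broadcasting to compute all point-to-centroid distances at once.

```python
import numpy as np

def assign_clusters(points, centroids):
    # Squared distance from every point to every centroid, shape (n, k).
    diffs = points[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)

def kmeans(points, k, max_iter=100, seed=0):
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        labels = assign_clusters(points, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return centroids, assign_clusters(points, centroids)
```

Because the result depends on the starting centroids, it is common to run k-means several times with different seeds and keep the run with the lowest total within-cluster distance.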

Run Length Encoding

Our first assignment in our Learning Machines class is to implement a run-length encoder and decoder. This is a simple data compression algorithm that works well on data containing long runs of repeated values.
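A minimal sketch of the idea: the encoder replaces each run of identical values with a (value, count) pair, and the decoder expands the pairs back out. The max_run cap is an added assumption, not part of the assignment; limiting counts to 255 means each count fits in one byte, which would suit a memory-constrained decoder such as one on an Arduino.

```python
def rle_encode(data, max_run=255):
    """Encode a sequence as a list of (value, run_length) pairs."""
    pairs = []
    for value in data:
        if pairs and pairs[-1][0] == value and pairs[-1][1] < max_run:
            pairs[-1][1] += 1  # extend the current run
        else:
            pairs.append([value, 1])  # start a new run
    return [tuple(pair) for pair in pairs]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into a flat list."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out
```

For example, `rle_encode([5, 5, 5, 2, 2, 9])` gives `[(5, 3), (2, 2), (9, 1)]`. On data without runs, every value becomes a pair, so the encoded form is larger than the input.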

It happens that I previously had an idea for an Arduino project that requires a lightweight decompression algorithm to decode audio data. I was going to use run-length encoding because it is simple to implement and the code itself won't take up much of the Arduino's precious memory. I'll also need to encode the audio files in Python, and I'll use the code from this assignment to do it.

Heartbeat Detection Algorithm

Purpose of detecting heartbeat data

Our Midi Meditation project is a physical computing device that will repeatedly play a single note in sync with the user's heartbeat. Fundamental to this is the ability to reliably detect when a user's heart is beating.

We want our device to work effectively for most or all people. This means it should play one note in sync with the user's pulse without extra notes between beats.

We had an Arduino-compatible pulse sensor for this project. One way to prototype this is to code a heartbeat detection algorithm on the Arduino after viewing the sensor readings for a couple of people on the Serial Monitor. This approach could work, but it would require a lot of parameter tweaking to get it "just right", with repeated user testing between adjustments.
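Analyzing recorded readings offline in Python makes that tweaking much faster, because the same recording can be re-run with different parameters. Here is a minimal sketch of one simple detector, offered as an illustration rather than the algorithm we settled on. It counts a beat when the signal rises through a threshold, and ignores further crossings during a refractory period so a noisy edge does not count as an extra beat. The default threshold (midpoint of the signal's range) and the 0.3 s refractory period are assumptions; 0.3 s caps detection at about 200 BPM.

```python
def detect_beats(samples, sample_rate_hz, threshold=None, refractory_s=0.3):
    """Return the sample indices where a beat is detected."""
    if threshold is None:
        # Assumed default: halfway between the lowest and highest readings.
        threshold = (max(samples) + min(samples)) / 2
    refractory = round(refractory_s * sample_rate_hz)  # in samples
    beats = []
    for i in range(1, len(samples)):
        rising = samples[i - 1] < threshold <= samples[i]
        # Ignore crossings that come too soon after the previous beat.
        if rising and (not beats or i - beats[-1] >= refractory):
            beats.append(i)
    return beats

def beats_to_bpm(beats, sample_rate_hz):
    # Convert each gap between consecutive beats into beats per minute.
    return [60.0 * sample_rate_hz / (b - a) for a, b in zip(beats, beats[1:])]
```

Sweeping `threshold` and `refractory_s` over recordings from several people, and comparing the detected beats against a plot of the raw signal, is one way to pick values before moving the logic onto the Arduino.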
