A few interesting blog posts of note are:
- These Bored Apes Do Not Exist,
- Fetching Better Beer Recommendations with Collie,
- Better Pandas DataFrame Visualization (with a Taco Bell Example), and
- Rate My Professor: is it always bad?
I’ve done two “Covid” talks (AKA all remote conferences hosted on Zoom, with strange internet connections and all).
- Playing Deal or No Deal Better Than a Human Talk, and
- Detecting, Recognizing, and Analyzing Animated Characters Talk
Interesting Projects of Note
If you’re still reading this page, that likely means you’re really interested in hearing more about some of the projects I’ve worked on. I’ll briefly detail them and link to relevant websites / repos below.
A scalable and flexible deep learning recommendations library I started work on during my ShopRunner internship and continued in my second year as a data scientist. As of today, this is the recommendations algorithm used at ShopRunner for product-product, member-product, and mass appeal recommendations. While Collie and recommendations are something I can talk about for hours, I’ll spare you the words and instead link to a series of introductory blog posts here, the repo here, docs here, and the project on PyPI here.
I want to note that this library was not solely built by me! I was the primary contributor to and creator of Collie, but important contributions were made by teammates Hanna Torrence (who wrote the initial logic for the implicit metrics calculations and partial credit loss) and Nicole Carlson (who helped speed up the data loading logic and reviewed the repo for open-sourcing, catching tons of potential issues), as well as a handful more ShopRunner teammates. Without them, this library would not exist!
Initially built for a hackweek at ShopRunner, this is my homemade attempt at Siamese networks using hard negative example mining via an Annoy index. It’s rough in its current implementation, but shows potential for unsupervised image clustering and classification. The repo can be viewed here.
A project inspired by the work researchers did at the Geena Davis Institute, which was eventually used by the Institute! This is a lightweight parser for film and TV scripts to separate out the characters and their lines, and generate some simple, baseline stats on the script for analysis. There’s no machine learning here, but the code was a neat foray into some simple software engineering / repo management. The repo can be viewed here.
Deal or No Deal Bot
My first successful project involving reinforcement learning, this time on the incredibly popular T.V. show, Deal or No Deal. I actually did an entire talk on this project, which you can view here. The repo can also be viewed here.
Remember when GPT-2 was first released and everyone was freaking out with excitement trying to find any project to use it on? Well, this project was that for me. I collected and cleaned every body of work that William Shakespeare wrote, fed it into an LSTM as a baseline, then a fine-tuned GPT-2 for comparison. As I inspected output from the GPT-2, I had to look up several passages to ensure that Shakespeare didn’t write it, but the model did; the results are seriously that good – props to OpenAI. The repo can be viewed here.
My first try at getting a reinforcement learning agent to play a game I love. It did not go as smoothly as I had hoped, but it introduced me to the basics of deep Q-learning and NEAT, as well as some fun software engineering to build 2048 from the ground up in Python, playable in the terminal. The repo can be viewed here.
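For a flavor of what building 2048 from scratch involves, here is a minimal sketch of the core move: sliding and merging a single row to the left. This is an illustration only, not the code from the repo; the function name `merge_left` and the choice to return a score alongside the row are assumptions made for this example.

```python
def merge_left(row):
    """Slide all tiles in `row` to the left and merge equal neighbors once.

    Hypothetical helper for illustration; 0 represents an empty cell.
    Returns the new row and the points earned from merges.
    """
    tiles = [value for value in row if value != 0]  # drop empty cells
    merged = []
    score = 0
    i = 0
    while i < len(tiles):
        # Merge a tile with its right-hand neighbor if they match.
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            score += tiles[i] * 2
            i += 2  # skip the neighbor so it cannot merge twice
        else:
            merged.append(tiles[i])
            i += 1
    # Pad with empty cells back to the original row length.
    merged.extend([0] * (len(row) - len(merged)))
    return merged, score
```

The other three directions can be handled by reversing or transposing the board and reusing the same function, which keeps the game logic small.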
A Python-Kafka-BERT-Kafka-BigQuery-Trump Project
Originally built for a final project for a Big Data Technologies grad school course, this project takes an extremely over-engineered approach towards analyzing the best Tweets from the worst president this country has ever had. Essentially, the project accepts text input from a user, encodes it via a pre-trained RoBERTa model, finds similar text encodings of Trump tweets, and returns the most similar tweet to the input. This was my first exposure to the GCP suite of tools as well as some basic deep learning NLP models. A stupid idea that was a lot of fun to work on. The repo can be viewed here.
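The "find the most similar tweet" step can be sketched roughly as follows. This is a simplified illustration, not the project's actual code: it assumes cosine similarity as the comparison (the similarity measure used in the project is not stated here), and the tiny hand-written vectors and placeholder texts stand in for real RoBERTa encodings and real tweets. The names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_vec, corpus):
    """Return the text whose encoding is closest to `query_vec`.

    `corpus` is a list of (text, encoding) pairs. In the real project the
    encodings would come from a pre-trained model; here they are toy values.
    """
    best_text, _ = max(corpus, key=lambda item: cosine_similarity(query_vec, item[1]))
    return best_text


# Toy stand-ins for pre-computed tweet encodings (assumed, not real data).
corpus = [
    ("placeholder tweet A", [0.9, 0.1, 0.0]),
    ("placeholder tweet B", [0.0, 0.2, 0.9]),
    ("placeholder tweet C", [0.1, 0.9, 0.1]),
]

# Toy stand-in for the encoding of the user's input text.
query = [1.0, 0.2, 0.0]
result = most_similar(query, corpus)
```

A brute-force scan like this is fine for a small corpus; a larger one would typically use an approximate nearest-neighbor index instead.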
Another grad school project, this was a simple Flask API linking to a FastAI image classification model to note whether an item was recyclable, what category of recyclable it fell into, and how to properly recycle it. I was able to deploy this on EC2 for the demo and have the entire class hit the endpoint successfully. I worked on the machine learning component while another teammate, Noah Quanrud, worked on the front-end. The repo can be viewed here.
Did Someone Get Robbed at IIT Today?
Using the SODA API via the Chicago Data Portal, I analyzed the most recent crimes that occurred on the Illinois Institute of Technology campus using data analysis, visualization, and mining skills. The site was a great place for Illinois Tech students to view past crime locations, times, and statistics to improve their safety, rather than relying on the inconsistent and barebones IIT Alert method previously in place. Most importantly, the site answered the most important question of our generation: Did Someone Get Robbed at IIT Today? The repo can be viewed here. The website is still up (now a subdomain of this site), but the API for IIT Alert has since broken, so the site reports no crime in the last two years (which is very, very incorrect). If that sounds okay to you, the site can still be viewed here.
Collaborative Research: Managing Stress in the Workplace: Unobtrusive Monitoring and Adaptive Interventions
Collected, processed, cleaned, and analyzed experimental physiological data to measure stress across different office workplace tasks, specifically report writing and report presenting. Analysis included data visualization of stress signals and normalized stress level differences in box plots, natural language processing to determine sentiment and token type of subject responses, hypothesis testing and ANOVA to determine significant differences in physiological measures, and mixed linear models to identify which factors explain changes in stress levels. The full GitHub repo is available here for all script and report viewing. The research poster I made on the project can be viewed here. The published paper can be viewed here for free!
Code415E Source Code
My high school CS project. It’s awful in hindsight, but even today as I edit this site, I can’t bring myself to remove it – I love it too much. Anyway, here’s the original description for this project, unedited, that I initially put on the site:
My ultimate pride and glory. This is a text-based, Zork-like game built in Java during my senior year of high school and slowly expanded upon and refined since then. This demonstrates my understanding of Object-oriented Programming and some real fun conditional statements that lead to what the New York times calls a riveting story full of multiple endings, secret paths, cheat codes, and characters that you instantly fall in love with. Okay fine, you caught me – they didn’t really say all of that; just most of it. If anything, this game is just a fun pastime for when you’re bored and hate GUIs. Comes as a .zip file including a platform-specific shell-script, Java installation instructions, and a JAR file armed with the actual code. You can download it here.
Currently Active Websites
In college, I built websites to make money for tuition. If you are looking for a web developer, I shouldn’t be the person you hire (I think I’m better as a data scientist), but since this is the first thing I ever wrote code for, I feel like I can’t remove them from this site quite yet. Maybe I will once each website is dead?
Sadly, this list used to be longer, but many of these websites have since been deactivated / changed. R.I.P. to those sites.