DMA Counties 2016
The purpose of this report is to allow station management to get a county-by-county look at the coverage of our stations and their direct competition.
Data Science Capstone Project: Word Prediction
This was the Capstone project for the Johns Hopkins Data Science specialization on Coursera. It makes use of natural language processing in R, with the intention of predicting the next word a user will type (based on the previous words typed). The result is only modestly successful, although it is at least as good as the word prediction on my smartphone.
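To illustrate the general idea of predicting the next word from the previous one, here is a minimal bigram sketch. It is written in Python for illustration and is not the R code used in the project; the function names `train_bigrams` and `predict_next` and the toy corpus are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def predict_next(following, prev_word, k=3):
    """Return up to k of the most frequent followers of prev_word."""
    counts = following.get(prev_word.lower())
    if not counts:
        return []  # unseen word: no prediction available
    return [word for word, _ in counts.most_common(k)]


# Toy corpus, invented for this example only.
corpus = "the cat sat on the mat and the cat ran to the door"
model = train_bigrams(corpus)
top_guess = predict_next(model, "the", k=1)
```

A real predictor would likely use longer contexts (trigrams or more) and fall back to shorter ones when a context has not been seen; this sketch keeps only the core counting step.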
Writing sample: This milestone report was produced halfway through the Capstone project, and it is a good example of my technical writing skills. The report was written in R Markdown; the R package “knitr” runs the embedded R code to generate the graphs and tables when the PDF file is built.
Another part of the project was to create a five-slide presentation to “pitch” our app.
Shiny App: NASCAR Driver Stats
Another project created for the Data Science specialization on Coursera, this Shiny app takes as input the name of a NASCAR driver and a race track. It then displays some statistics about that driver, including some that are specific to the selected track. The data was collected by web scraping in R, so the app also demonstrates that technique.
I intended to use this data along with a machine-learning algorithm to predict the results of NASCAR races. I’ve come to the conclusion that there is too much randomness in the racing results, primarily due to accidents, making them very difficult to predict. My prediction algorithm was able to do about as well as a simple strategy of selecting the fan favorites, but no better.
I should note that the data in the Shiny app has not been updated since the end of the 2015 NASCAR season.
Human Trafficking in NC: Data visualization
I’ve been working on a project dealing with human trafficking in the state of North Carolina. One of the partners in this project gave me an Excel file that contained a report on all of the cases in the North Carolina courts for the year 2014, broken out by offense and county.
I used SAS to reduce this to a subset of human-trafficking-related offenses. My initial maps looked a lot like population maps, so I also had SAS convert the number of offenses in each county to a rate per 100,000 population, using 2014 population data from Census.gov.
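The rate conversion described above is simple arithmetic: offenses divided by population, times 100,000. Here is a small sketch in Python rather than SAS; the county names and figures are hypothetical placeholders, not data from the project.

```python
def rate_per_100k(offenses, population):
    """Convert a raw offense count to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return offenses * 100_000 / population


# Hypothetical counties and figures, for illustration only.
counties = {
    "County A": {"offenses": 50, "population": 200_000},
    "County B": {"offenses": 10, "population": 20_000},
}

rates = {
    name: rate_per_100k(row["offenses"], row["population"])
    for name, row in counties.items()
}
```

In this made-up example, County A has five times as many offenses as County B but only half the rate, which is why a map of rates can look quite different from a map of raw counts.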
Then I imported the SAS dataset into Tableau Public, and here is the result: