I have deep industry experience in business, software, life sciences, bioinformatics, financial markets, and genetics. My current technical skills include statistical analysis, machine learning (supervised and unsupervised), deep learning, SQL, Tableau, and Big Data (Hadoop, Spark, AWS, Cassandra, Redshift). For examples of my projects, please look at the menu on the left. The most recent projects are under “RECENT POSTS”. For a complete list of projects, click on “Technical”, which is below on the left under “CATEGORIES”.
Month: May 2018
S&P 500 Trading Driven from Social Media
I am downloading data from Google Trends and merging it with historical S&P 500 index data, looking for tradeable trends from April 2013 to April 2018. The correlation is promising (0.58) and the p-value is very small at 4.5538e-24, so the relationship is very unlikely to be due to chance. I am working on a goal-seeking algorithm that will yield specific signals to buy, sell, or hold on a weekly basis (e.g., sell on Friday, buy on Monday).
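As a rough illustration of the merge-and-correlate step described above, here is a minimal sketch. It is not my actual notebook: the data are synthetic, and the column names (`week`, `search_interest`, `close`) are assumptions made up for the example. It also computes the correlation on week-over-week changes, since two series that both trend over time can look correlated in levels even when they are unrelated.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic weekly data standing in for Google Trends and S&P 500 history.
rng = np.random.default_rng(0)
weeks = pd.date_range("2013-04-07", periods=260, freq="W-SUN")
interest = rng.normal(50, 10, size=len(weeks))
trends = pd.DataFrame({"week": weeks, "search_interest": interest})

# Hypothetical index level loosely tied to search interest plus noise.
index_level = 2000 + 8 * interest + rng.normal(0, 100, size=len(weeks))
sp500 = pd.DataFrame({"week": weeks, "close": index_level})

# Align the two sources on the week they describe.
merged = trends.merge(sp500, on="week", how="inner")

# Pearson correlation and its p-value on the raw levels.
r, p_value = stats.pearsonr(merged["search_interest"], merged["close"])

# Same test on week-over-week changes, which is less prone to
# spurious correlation between trending series.
changes = merged[["search_interest", "close"]].diff().dropna()
r_diff, p_diff = stats.pearsonr(changes["search_interest"], changes["close"])

print(f"levels:  r = {r:.2f}, p = {p_value:.3g}")
print(f"changes: r = {r_diff:.2f}, p = {p_diff:.3g}")
```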
I have not provided the HTML version of the Jupyter Notebook that I typically include. This is a work in progress and, if it does work, it is obviously something that one would be very reluctant to publish. If you want to talk to me about this, please contact me. I can certainly provide more information. However, I may be a little circumspect :)
A/B Testing
For the Jupyter Notebook that runs this project (HTML format) click this link:
An A/B test on an e-commerce website was completed. Python, Pandas, NumPy, SciPy, Jupyter Notebooks, and Anaconda were used to conduct categorical logistic regression analysis and simulations of the data generated by the company’s website.
Additional slicing of the demographic data was also performed to determine whether any subsets of the user population showed significantly different behavior. If this interests you, please contact me.
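To give a flavor of the simulation side of an A/B test like the one above, here is a minimal sketch that simulates the difference in conversion rates under the null hypothesis. The visitor and conversion counts are hypothetical, not the project's data, and this is only one common way to run such a simulation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical counts for the control (old) and treatment (new) pages.
n_control, conv_control = 5000, 600
n_treatment, conv_treatment = 5000, 650

observed_diff = conv_treatment / n_treatment - conv_control / n_control

# Under the null hypothesis both pages share one pooled conversion rate.
pooled_rate = (conv_control + conv_treatment) / (n_control + n_treatment)

# Simulate many experiments in which the page makes no difference.
n_sims = 10_000
sim_control = rng.binomial(n_control, pooled_rate, size=n_sims) / n_control
sim_treatment = rng.binomial(n_treatment, pooled_rate, size=n_sims) / n_treatment
sim_diffs = sim_treatment - sim_control

# One-sided p-value: share of simulated differences at least as large
# as the observed one (small tolerance guards against float rounding).
p_value = float(np.mean(sim_diffs >= observed_diff - 1e-12))

print(f"observed difference = {observed_diff:.4f}, simulated p-value = {p_value:.3f}")
```

If the simulated p-value is above the chosen significance level, the observed lift is consistent with chance and there is no evidence that the new page converts better.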
This project was done as part of my coursework at Udacity (www.udacity.com) for the Data Analyst Nanodegree (DAND).
Descriptive Statistics (European Football Association)
For the Jupyter Notebook (HTML format) click this link
Wrangled a SQL database and correlated team attributes with the relative rank of European football clubs from 2011 to 2015. Pandas was used for data manipulation. Results were plotted with Matplotlib/seaborn using scatter, histogram, and bar plots. Spearman and Pearson correlation functions confirmed the hypotheses that were formed from the visualizations.
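The Pearson and Spearman checks mentioned above can be sketched as follows. The data here are randomly generated, and the variable names (`attribute`, `rank`) are placeholders rather than columns from the actual database.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical team attribute score for 40 clubs.
attribute = rng.uniform(30, 80, size=40)

# Hypothetical season performance that depends on the attribute plus noise.
performance = attribute + rng.normal(0, 8, size=40)

# Relative rank: 1 is the best-performing club.
rank = stats.rankdata(-performance)

# Pearson measures linear association; Spearman measures monotonic
# association and suits ranked data.
pearson_r, pearson_p = stats.pearsonr(attribute, rank)
spearman_rho, spearman_p = stats.spearmanr(attribute, rank)

print(f"Pearson r = {pearson_r:.2f} (p = {pearson_p:.3g})")
print(f"Spearman rho = {spearman_rho:.2f} (p = {spearman_p:.3g})")
```

Both coefficients come out negative in this setup because a higher attribute score goes with a numerically lower (better) rank.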
If this interests you, please contact me.
This project was done as part of my coursework at Udacity (www.udacity.com) for the Data Analyst Nanodegree (DAND).
Python (Bikeshare Company)
A bike share provider in the United States wanted to uncover bike share usage patterns. Data (more than 10 million rows) from Chicago, New York City, and Washington, DC were analyzed, and appropriate descriptive statistics were computed using Python, Atom (IDE), and JSON data formats. If you would like a copy of the .py file, please contact me.
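For a sense of the kind of descriptive statistics involved, here is a small sketch. It uses pandas for brevity, which may differ from the tooling in my .py file, and the six trip records and column names (`start_time`, `duration_sec`, `start_station`) are invented for the example.

```python
import pandas as pd

# Hypothetical trip records; the real data had millions of rows.
trips = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2017-01-02 08:05", "2017-01-02 17:40", "2017-01-03 08:15",
        "2017-01-07 11:00", "2017-01-07 12:30", "2017-01-08 14:20",
    ]),
    "duration_sec": [600, 900, 720, 1800, 2400, 1500],
    "start_station": ["A", "B", "A", "C", "C", "A"],
})

# Derive the hour and weekday of each trip from its start time.
trips["hour"] = trips["start_time"].dt.hour
trips["weekday"] = trips["start_time"].dt.day_name()

# Typical usage-pattern questions: when and where do trips start,
# and how long do they last?
most_common_hour = trips["hour"].mode().iloc[0]
most_common_station = trips["start_station"].mode().iloc[0]
mean_duration_min = trips["duration_sec"].mean() / 60
trips_per_weekday = trips["weekday"].value_counts()

print(f"Most common start hour: {most_common_hour}")
print(f"Most common start station: {most_common_station}")
print(f"Mean trip duration: {mean_duration_min:.1f} minutes")
print(trips_per_weekday)
```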
This project was done as part of my coursework at Udacity (www.udacity.com) for the Data Analyst Nanodegree (DAND).