I have deep industry experience in business, software, life sciences, bioinformatics, financial markets, and genetics. My current technical knowledge/skills include statistical analysis, machine learning (supervised and unsupervised), deep learning, SQL, Tableau, and Big Data (Hadoop, Spark, AWS, Cassandra, Redshift). For examples of my projects, please look at the menu on the left. The most recent projects are under “RECENT POSTS”. For a complete list of projects, click on “Technical”, which is below on the left under “CATEGORIES”.
Author: LindsayMoir
Time To Go Out?
It can be difficult to get accurate and time-sensitive information on how countries are doing at present during the SARS-CoV-2 pandemic (SARS-CoV-2 is the virus; COVID-19 is the illness it causes). Exploring factors potentially related to outcomes is important as a step toward predicting those outcomes. Key factors are likely related to government and public health leadership and the likelihood that the public actually adopts mitigation strategies for the virus. In this post, I answer 4 questions that should provide some insight into this very important issue.
What I Did
I took the Johns Hopkins COVID-19 data, which is the de facto centralized source for COVID-19 data globally. I then combined it with data on median per capita income and population from World Population Review. Finally, I added a column from the Transparency International dataset that provides an honesty score for each country. For the results of this analysis, please go to this Medium story.
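As a minimal sketch of that merge step (the file and column names below are placeholders, not necessarily the exact ones used in the repository):

```python
import pandas as pd

# Johns Hopkins confirmed-cases time series (real file name from the JHU repo)
covid = pd.read_csv('time_series_covid19_confirmed_global.csv')
covid = covid.groupby('Country/Region').sum(numeric_only=True).reset_index()

# Median income and population from World Population Review (placeholder file)
income_pop = pd.read_csv('income_population.csv')  # columns: country, median_income, population

# Transparency International honesty scores (placeholder file)
honesty = pd.read_csv('honesty_scores.csv')        # columns: country, honesty_score

merged = (covid
          .merge(income_pop, left_on='Country/Region', right_on='country')
          .merge(honesty, on='country'))
```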
There are two repositories on GitHub associated with this work. The first, titled covid, has all of the code for this analysis. The second, titled arima, has a reusable ARIMA process that works with the Johns Hopkins data and could easily be repurposed for any other univariate time series you want to forecast.
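The arima repository's process is more involved, but at its core an ARIMA forecast on a univariate series looks something like this (using statsmodels; the (1, 1, 1) order and file name are illustrative only):

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Any univariate daily series, e.g. confirmed cases for one country (placeholder file)
series = pd.read_csv('country_cases.csv', index_col='date', parse_dates=True)['cases']

# Fit a simple ARIMA(p, d, q); in practice you would tune the order
model = ARIMA(series, order=(1, 1, 1)).fit()

# Forecast the next 14 days
print(model.forecast(steps=14))
```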
Moon Shot Genius
I am currently working on a recommendation engine, Moon Shot Genius (MSG), that generates buy/sell recommendations on lower-priced stocks that are likely targets of insider trading. The inputs are various social media feeds and stock market data.
I am using Python, Pandas, SQLite, Tableau, and various supervised machine learning and deep learning algorithms.
This technology hooks into a trading engine that automatically pulls stock data (price, volume, etc.), weighs it against MSG's recommendation, and then produces a buy, sell, or do-nothing signal. A text message with the appropriate signal is then sent to the user.
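As a rough sketch of what such a signal decision might look like (the function, thresholds, and inputs here are hypothetical, not MSG's actual logic):

```python
def trade_signal(model_prob_up: float, volume: float, avg_volume: float,
                 buy_threshold: float = 0.7, sell_threshold: float = 0.3) -> str:
    """Weigh a model's probability that a stock rises against market data.

    All thresholds are illustrative; the real engine reconciles many more inputs.
    """
    # Require unusual trading volume before acting at all
    if volume < 1.5 * avg_volume:
        return 'do nothing'
    if model_prob_up >= buy_threshold:
        return 'buy'
    if model_prob_up <= sell_threshold:
        return 'sell'
    return 'do nothing'

# Example: a strong model signal plus a volume spike produces a buy
print(trade_signal(model_prob_up=0.82, volume=2.4e6, avg_volume=1.0e6))
```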
Prosper, P2P Lending Marketplace, Tableau
Savers using Certificates of Deposit (CDs), the most widely used savings vehicle in the USA, get terrible returns. Prosper returns are up to 20 times larger. This Tableau Story shows that if you use Prosper, you need to manage your losses. If you do, you achieve a +8% return over and above a 3-year CD return. Prosper is like the Chinese characters for crisis: danger and opportunity. There is opportunity, but you could easily lose your investment capital if you do not pay attention to managing loan losses.
“Prosper Marketplace, Inc. is a San Francisco, California-based company in the peer-to-peer lending industry. Prosper Funding LLC, one of its subsidiaries, operates Prosper.com, a website where individuals can either invest in personal loans or request to borrow money.” (Source: Wikipedia.) To see the Tableau Story, click on the link below.
Prosper P2P Lending Marketplace
Technologies utilized are Tableau, Python, and Pandas. If this interests you, please contact me. This project was done as part of my coursework at Udacity (www.udacity.com) for the Data Analyst Nanodegree (DAND).
S&P 500 Trading Driven from Social Media
I am downloading data from Google Trends and merging it with historical S&P 500 index data, looking for tradeable trends from April 2013 to April 2018. The correlation is promising (0.58) and the p-value of 4.5538e-24 indicates a very strong signal. I am working on a goal-seeking algorithm that will yield specific weekly buy, sell, or hold signals (e.g., sell on Friday, buy on Monday).
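For the curious, the correlation and p-value come from a calculation along these lines (the file and column names are assumptions, not the proprietary pipeline):

```python
import pandas as pd
from scipy import stats

# Weekly Google Trends interest merged with weekly S&P 500 closes (placeholder file)
df = pd.read_csv('trends_sp500_weekly.csv')

r, p = stats.pearsonr(df['trends_score'], df['sp500_close'])
print(f'correlation: {r:.2f}, p-value: {p:.4e}')
```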
I have not provided the HTML versions of the Jupyter Notebooks that I typically include here. This is a work in progress and, if it does work, it is obviously something one would be very reluctant to publish. If you want to talk to me about this, please contact me. I can certainly provide you more information, though I may be a little circumspect. :)
Twitter Data Wrangling
For the Jupyter Notebook that runs this project (HTML format) click this link:
Technologies utilized are Python, Pandas, and a variety of plotting packages. The target was @dog_rates. Tweepy (a Python library) was used to access the Twitter API and receive JSON data. I also had access to a machine learning output file (.tsv) that classified pictures of dogs, which was downloaded via Python's requests library. This project includes data gathering, cleaning, storing, and analysis of the results. I was most interested in which breeds this population of users likes and how their ratings of those breeds have changed over time.
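The gathering step with Tweepy looked roughly like this (credentials and the tweet ID are placeholders; this assumes Tweepy's v3-era get_status API):

```python
import json
import tweepy

# Placeholder credentials; Tweepy v3-style OAuth
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_SECRET')
api = tweepy.API(auth, wait_on_rate_limit=True)

# Fetch a tweet's full JSON by ID and append it to a local store
status = api.get_status(123456789, tweet_mode='extended')  # placeholder ID
with open('tweet_json.txt', 'a') as f:
    json.dump(status._json, f)
    f.write('\n')
```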
If this interests you, please contact me. This project was done as part of my coursework at Udacity (www.udacity.com) for the Data Analyst Nanodegree (DAND).
Red Wine (RStudio and ggplot)
For the RMD file that shows this project (HTML format) click this link:
RStudio Analysis of Red Wine Dataset
The technologies used were R, RStudio, and ggplot. I explored a “tidy data set that contains 1,599 red wines with 11 variables on the chemical properties of that wine. At least 3 wine experts rated the quality of each wine, providing a rating between 0 (very bad) and 10 (very excellent).” A linear model was produced which “reliably” predicted the quality of the wine. The model was somewhat like a friend of mine: neither has ever met a bad bottle.
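The original model was built in R; a loose Python analogue on the same UCI red-wine data (the file path and train/test split here are assumptions) would be:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# UCI red-wine quality dataset; the file ships semicolon-separated
wine = pd.read_csv('winequality-red.csv', sep=';')
X = wine.drop(columns='quality')   # 11 chemical properties
y = wine['quality']                # expert rating, 0-10

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print(f'R^2 on held-out wines: {model.score(X_test, y_test):.2f}')
```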
If you want to talk to me about this, please contact me. This project was done as part of my coursework at Udacity (www.udacity.com) for the Data Analyst Nanodegree (DAND).
Stroop Effect, Descriptive Statistics
For the Jupyter Notebook that runs this project (HTML format) click this link:
Technologies utilized are Python, Pandas, Matplotlib, and SciPy (statistics).
To make our lives simple, our brains are wired to respond reflexively to stimuli. We have learned over the years that most stimuli are congruent. When the wind blows, the trees move. As a result, it is not necessary to measure the velocity of the wind; you can simply look out the window of the house, and if the trees are moving, it is windy.
When we encounter stimuli that are incongruent, all of this pre-set wiring must be neutralized before we can respond properly. It takes time to process the stimuli and respond correctly, since we have to fight through all the preconditioning we have developed over a lifetime. This difficulty is called the Stroop Effect.
The Stroop Effect is all around us. An example would be a police lineup. Supposedly the perpetrator is in the lineup, so it is congruent that they are there, and the witness picks one of the people in the lineup. Yet all that may be occurring is the Stroop Effect. While this may sound academic, for the person who just got picked out of the lineup for a major crime, it is far from academic.
The net effect is that, for a wide range of stimuli, there is a built-in congruency bias. This may be helpful for running our lives, but when true thinking is required, it is a large hurdle to a) recognize that this is what is occurring and b) continue with the process of thinking rationally instead of reflexively.
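A typical analysis of Stroop data, in line with the SciPy statistics used here, is a paired-samples t-test on each participant's congruent versus incongruent reaction times. A minimal sketch (the file and column names are assumptions):

```python
import pandas as pd
from scipy import stats

# Reaction times per participant, one row each (assumed column names)
df = pd.read_csv('stroopdata.csv')

# Paired-samples t-test: does the incongruent task take reliably longer?
t, p = stats.ttest_rel(df['Incongruent'], df['Congruent'])
print(f't = {t:.2f}, p = {p:.4e}')
```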
If this interests you, please contact me. This project was done as part of my coursework at Udacity (www.udacity.com) for the Data Analyst Nanodegree (DAND).
A/B Testing
For the Jupyter Notebook that runs this project (HTML format) click this link:
An A/B test on an e-commerce website was completed. Python, Pandas, NumPy, SciPy, Jupyter Notebooks, and Anaconda were utilized to conduct categorical logistic regression analysis and simulations of the data generated by the company's website.
Additional slicing of the demographic data was also performed to determine whether there were any significant behaviors in subsets of the user population. If this interests you, please contact me.
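The logistic regression step might be sketched as follows (using statsmodels; the file and column names are assumptions):

```python
import pandas as pd
import statsmodels.api as sm

# One row per user: 'group' is control/treatment, 'converted' is 0/1 (assumed layout)
df = pd.read_csv('ab_data.csv')
df['ab_page'] = (df['group'] == 'treatment').astype(int)
df['intercept'] = 1

# Logistic regression of conversion on page assignment
result = sm.Logit(df['converted'], df[['intercept', 'ab_page']]).fit()
print(result.summary())
```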
This project was done as part of my coursework at Udacity (www.udacity.com) for the Data Analyst Nanodegree (DAND).
Descriptive Statistics (European Football Association)
For the Jupyter Notebook (HTML format) click this link:
Wrangled a SQL database and correlated team attributes with the relative rank of European football clubs from 2011–2015. Pandas was used for data manipulation. Plotting of results was accomplished via Matplotlib/Seaborn, utilizing scatter, histogram, and bar plots. Spearman and Pearson correlations confirmed the hypotheses that were formed from the visualizations.
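As a minimal sketch of the wrangling and correlation steps (the database ships as SQLite and Team_Attributes is a real table, but the ranks file and attribute choice here are hypothetical):

```python
import sqlite3
import pandas as pd
from scipy import stats

# The European Soccer database ships as a single SQLite file
conn = sqlite3.connect('database.sqlite')
attrs = pd.read_sql_query('SELECT * FROM Team_Attributes', conn)

# Merge in a derived relative rank per club (hypothetical file)
ranks = pd.read_csv('club_ranks.csv')  # columns: team_api_id, relative_rank
attrs = attrs.merge(ranks, on='team_api_id')

# Spearman handles the ordinal rank; Pearson checks the linear relationship
rho, p_s = stats.spearmanr(attrs['buildUpPlaySpeed'], attrs['relative_rank'])
r, p_p = stats.pearsonr(attrs['buildUpPlaySpeed'], attrs['relative_rank'])
print(f'Spearman rho = {rho:.2f} (p = {p_s:.3e}); Pearson r = {r:.2f} (p = {p_p:.3e})')
```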
If this interests you, please contact me.
This project was done as part of my coursework at Udacity (www.udacity.com) for the Data Analyst Nanodegree (DAND).