Welcome to

My portfolio

Check out some of my work, or reach out to me directly.

  • DATA ANALYTICS
  • DATA VISUALIZATION
  • MACHINE LEARNING
  • STATISTICAL ANALYSIS
  • DATA MODELING

About

Hi there! I'm Theo, a data science enthusiast with a passion for data analysis and machine learning.


My expertise lies in deciphering complex datasets, engineering machine learning algorithms, and crafting compelling data visualizations that effectively communicate insights to stakeholders.


I am eager to continue learning and growing my skills as a data scientist, and am excited to apply my knowledge and expertise to new challenges and projects.

Skills

  • Python
  • SQL
  • VBA
  • HTML/CSS
  • Power BI
  • Tableau
  • Excel

Projects

biopsy sample images and predictions

Ovarian Cancer Subtype

In this project, I trained a Swin Transformer V2 (SwinV2) network on 800 GB of biopsy sample images (dataset) to classify the subtype of ovarian cancer. I used the PyTorch framework and a range of image augmentation techniques to reduce overfitting and improve performance. A single image could be as large as 9 GB.


Bowel Injury Prediction

In this project, I trained a neural network on over 450 GB of CT scan images (dataset) to predict bowel injuries. I built the network using the PyTorch framework and tracked training progress with Weights & Biases.

Airbnb streamlit form

Airbnb Price Prediction

In this project, I trained and deployed a gradient-boosted model to predict Airbnb listing prices per night, using statistical techniques such as lasso regression and cross-validation for feature selection and scoring. Once the model was trained and fine-tuned, I deployed it to Streamlit so that users can get an estimate of the value of their own homes.

instacart dashboard

E-Commerce Analysis

In this project, I analysed the Instacart dataset to identify patterns in customer purchasing behavior and used these insights to optimize product promotions and recommendations.

Sentiment analysis on store reviews

Sentiment Analysis

In this project, I built a web scraper using Selenium to scrape over 500 pages on trustpilot.com and create a dataset of over 10,000 reviews.

Sentiment analysis was conducted on the reviews to determine the overall sentiment toward the store's services. The results were used to identify areas where the store could improve.

Loan prediction

Predicting Loan Default

This project is a probability of default model that predicts the likelihood that a borrower will default on a loan, based on historical loan data.

The final output is an easy-to-use scorecard that ranges from 300 to 850. The model was trained using fine and coarse classing with weight of evidence and information value.
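
As a rough illustration of the weight of evidence (WoE) and information value (IV) step, here is a minimal sketch. It is not the project's actual code: the function name `woe_iv`, the bin labels, the sample data, and the smoothing constant are all assumptions made for the example. It uses the ln(% good / % bad) convention, which some references reverse.

```python
import math
from collections import defaultdict


def woe_iv(bins, defaults, smoothing=0.5):
    """Return ({bin: WoE}, IV) for one feature that is already binned.

    bins      -- bin label for each loan (hypothetical labels)
    defaults  -- 1 if the loan defaulted ("bad"), 0 otherwise ("good")
    smoothing -- count added to every cell so an empty cell does not
                 lead to log(0); an assumption, not from the project
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for label, is_default in zip(bins, defaults):
        if is_default:
            bad[label] += 1
        else:
            good[label] += 1

    labels = sorted(set(good) | set(bad))
    total_good = sum(good[label] + smoothing for label in labels)
    total_bad = sum(bad[label] + smoothing for label in labels)

    woe = {}
    iv = 0.0
    for label in labels:
        pct_good = (good[label] + smoothing) / total_good
        pct_bad = (bad[label] + smoothing) / total_bad
        # WoE > 0: the bin holds relatively more good loans than bad ones.
        woe[label] = math.log(pct_good / pct_bad)
        # Each bin contributes a non-negative amount to the IV.
        iv += (pct_good - pct_bad) * woe[label]
    return woe, iv


# Made-up example: eight loans split into two income bins.
example_bins = ["low_income"] * 4 + ["high_income"] * 4
example_defaults = [1, 1, 1, 0, 0, 0, 0, 1]
example_woe, example_iv = woe_iv(example_bins, example_defaults)
```

In a scorecard workflow, fine classing would produce many narrow bins, coarse classing would merge bins with similar WoE, and features with very low IV would typically be dropped before fitting the model.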