- DATA ANALYTICS
- DATA VISUALIZATION
- MACHINE LEARNING
- STATISTICAL ANALYSIS
- DATA MODELING
About
Hi there! I'm Theo, a data science enthusiast with a passion for data analysis and machine learning.
My expertise lies in deciphering complex datasets, engineering machine learning algorithms, and crafting compelling data visualizations that effectively communicate insights to stakeholders.
I am eager to continue learning and growing my skills as a data scientist, and am excited to apply my knowledge and expertise to new challenges and projects.
Skills
- Python
- SQL
- VBA
- HTML/CSS
- Power BI
- Tableau
- Excel
Projects
In this project, I trained a SwinV2 network on 800 GB of biopsy sample images (dataset) to classify the type of ovarian cancer. The project used the PyTorch framework and several image augmentation techniques to reduce overfitting and improve performance. A single image could be as large as 9 GB.
In this project, I trained a neural network on over 450 GB of CT scan images (dataset) to predict bowel injuries. I built the network using the PyTorch framework and tracked training progress using Weights & Biases.
In this project, I trained and deployed a gradient boosted model to predict Airbnb listing prices per night. I used statistical techniques such as lasso regression and cross-validation for feature selection and scoring. Once the model was trained and fine-tuned, it was deployed to Streamlit to let users get an estimate of the value of their own homes.
In this project, I analysed the Instacart dataset to identify patterns in customer purchasing behavior and used these insights to optimize product promotions and recommendations.
In this project, I built a web scraper using Selenium to scrape over 500 pages on trustpilot.com and create a dataset of over 10,000 reviews.
Sentiment analysis was conducted on the reviews to determine the overall sentiment towards the store's services. The results of the analysis were used to identify areas for improvement for the store.
This project is a probability of default model that predicts the likelihood that a borrower will default on a loan, based on historical loan data.
The final output is an easy-to-use scorecard with scores ranging from 300 to 850. The model was trained using fine and coarse classing with weight of evidence and information value.
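
For readers unfamiliar with weight of evidence (WoE) and information value (IV), here is a minimal sketch of how they can be computed for one binned feature. It is an illustration only, not the project's code: the function name `woe_iv`, the `(good, bad)` count layout, and the sign convention WoE = ln(% of goods / % of bads) are assumptions, and it assumes every bin contains at least one good and one bad loan.

```python
import math


def woe_iv(bins):
    """Compute per-bin weight of evidence and the feature's information value.

    bins: list of (good_count, bad_count) tuples, one per bin (hypothetical layout).
    Assumes every count is greater than zero, otherwise the log is undefined.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)

    woes = []
    iv = 0.0
    for good, bad in bins:
        # Share of all good loans and of all bad loans that fall in this bin.
        pct_good = good / total_good
        pct_bad = bad / total_bad
        # Assumed convention: positive WoE means the bin is safer than average.
        woe = math.log(pct_good / pct_bad)
        woes.append(woe)
        # IV sums the WoE of each bin, weighted by the gap between the two shares.
        iv += (pct_good - pct_bad) * woe
    return woes, iv


# Example with two bins: one mostly good loans, one mostly bad loans.
woes, iv = woe_iv([(80, 20), (20, 80)])
print(woes, iv)
```

Coarse classing typically merges adjacent fine bins whose WoE values are close, and IV is then used to rank features by how well they separate good from bad borrowers.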