Project 1
N-Gram Language Model
As with many statistical models, the true data-generating process is unknown to us. What we can do, however, is estimate the probability of a sentence from how often its word sequences (n-grams) appear in a training corpus.

This project is my attempt to generate human-like sentences by estimating how likely each word is, given the words that come before it.
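A minimal sketch of the idea, assuming a bigram model (n = 2) and a made-up three-sentence toy corpus; this is not the project's actual code. Each conditional probability P(next | prev) is estimated as count(prev, next) / count(prev). A sentence is scored by multiplying these probabilities under the Markov assumption, and new text is generated by sampling one word at a time. The <s> and </s> boundary tokens and the function names are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # assumed sentence-boundary tokens

def train_bigram(corpus):
    """Maximum-likelihood bigram estimates: P(next | prev) = count(prev, next) / count(prev)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = [START] + sentence + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        probs[prev] = {w: c / total for w, c in nexts.items()}
    return probs

def sentence_prob(probs, sentence):
    """Chain rule under the bigram assumption; an unseen bigram gives probability 0."""
    tokens = [START] + sentence + [END]
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p *= probs.get(prev, {}).get(nxt, 0.0)
    return p

def generate(probs, rng, max_len=20):
    """Sample words from P(next | prev), starting at <s>, until </s> or max_len words."""
    word, out = START, []
    for _ in range(max_len):
        choices = probs[word]
        word = rng.choices(list(choices), weights=list(choices.values()))[0]
        if word == END:
            break
        out.append(word)
    return out

# Hypothetical toy corpus of pre-tokenized sentences
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
model = train_bigram(corpus)
print(sentence_prob(model, ["the", "cat", "sat"]))
print(" ".join(generate(model, random.Random(0))))
```

Here "the cat sat" scores 1 × 2/3 × 1/2 × 1 = 1/3. Any sentence containing an unseen bigram scores 0, which is why practical n-gram models usually add smoothing.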
Project 2
IoT Stratified Sampling
Generated a stratified random sample that reduced 4,300 GB of IoT recordings to approximately 34.3 GB (under 1% of the original volume) while keeping the sample representative of the full dataset.
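A sketch of proportional stratified sampling in plain Python. It assumes records can be grouped by some key; here that key is a hypothetical device type, not the project's actual strata or data. Every stratum is sampled at the same fraction, and at least one record is drawn from each, so small groups are not lost. For scale, 34.3 GB out of 4,300 GB corresponds to a fraction of roughly 0.008.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Proportional stratified sampling: group records by key(record), then draw
    round(len(group) * fraction) records (at least 1) from each group without replacement."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for _, group in sorted(strata.items()):  # sorted for reproducible ordering
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical IoT readings: (device_type, reading_id)
records = (
    [("thermostat", i) for i in range(600)]
    + [("camera", i) for i in range(300)]
    + [("doorbell", i) for i in range(100)]
)
sample = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
print(len(sample))
```

Sampling each stratum separately keeps the sample's mix of device types the same as the full dataset's. A single simple random sample of the same size could under-represent rare groups by chance.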
Project 3
Predictive Outage Cause
Used a decision tree classifier from scikit-learn to classify the causes of severe outages, achieving an accuracy score of 0.827225.

Applied GridSearchCV to find the optimal hyperparameters, increasing accuracy on the validation dataset by 2.4%.
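A sketch of the decision tree plus GridSearchCV workflow, using scikit-learn's bundled iris dataset as a stand-in because the outage data is not reproduced here. The parameter grid and split sizes are illustrative assumptions, not the project's settings. The sketch fits a default tree as a baseline, then searches the grid with 5-fold cross-validation on the training set. Finally it compares both models on a held-out validation split.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption): iris instead of the outage data
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Baseline: decision tree with default hyperparameters
baseline = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
baseline_acc = accuracy_score(y_val, baseline.predict(X_val))

# Hypothetical grid; GridSearchCV refits the best combination on all training data
param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_leaf": [1, 2, 5, 10],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)
tuned_acc = accuracy_score(y_val, search.best_estimator_.predict(X_val))

print(search.best_params_)
print(f"baseline: {baseline_acc:.3f}, tuned: {tuned_acc:.3f}")
```

The search picks hyperparameters using cross-validation on the training data only. The validation split is then used once, to compare the tuned tree against the baseline.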
Project 4
CoDebug Python Package
A downloadable Python package that replaces error messages with simple explanations, giving students the confidence to debug their own code.
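One way such a package could hook into Python is sketched below; this is not CoDebug's actual implementation or API. It replaces sys.excepthook, the function Python calls for uncaught exceptions, so that a short plain-language hint is printed alongside the original message. The EXPLANATIONS table, its wording, and the function names are hypothetical.

```python
import sys
import traceback

# Hypothetical plain-language explanations keyed by exception class name
EXPLANATIONS = {
    "NameError": "You used a name Python doesn't know. Check for typos or a variable used before it was assigned.",
    "TypeError": "An operation got a value of the wrong type, e.g. adding a number to a string.",
    "IndexError": "You asked for a position past the end of a list or string.",
    "ZeroDivisionError": "Your code divided by zero.",
}

def explain(exc_type, exc_value):
    """Return the original error line, plus a friendly hint when one is known."""
    line = f"{exc_type.__name__}: {exc_value}"
    hint = EXPLANATIONS.get(exc_type.__name__)
    return f"{line}\nWhat this means: {hint}" if hint else line

def friendly_excepthook(exc_type, exc_value, exc_tb):
    """Print where the error happened (innermost frame) and the explanation,
    instead of the full default traceback."""
    if exc_tb is not None:
        frame = traceback.extract_tb(exc_tb)[-1]
        print(f"Error on line {frame.lineno} of {frame.filename}", file=sys.stderr)
    print(explain(exc_type, exc_value), file=sys.stderr)

# Install the hook: it runs for any exception that is not caught
sys.excepthook = friendly_excepthook
```

Because the hook only changes how uncaught exceptions are reported, the student's program behaves exactly as before. Only the error output becomes friendlier.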
Project 5
Raccoon Spottings
Integrated the Google Maps API to display campus locations and a Firebase NoSQL cloud database to track raccoon spottings on the UC San Diego campus.