about 👋

Hi, I'm Gabriel. I'm a Data Science undergraduate at UC San Diego, driven by a passion for discovery, creation, and pushing the boundaries of data science.

Currently, I'm researching at the DSTL Lab under Assistant Teaching Professor Sam Lau, building tools to improve data science education. My project, ContentGen, is a JupyterLab extension that helps instructors streamline lecture content creation, including a chatbot that generates questions from notebooks for more interactive teaching.

Previously, I interned at Edison, where I worked on an anomaly detection system to identify nighttime solar energy discrepancies in over 1 million rows of data. Before that, I was an Instructional Assistant for multiple UCSD data science courses, including my favorite, The Practice and Application of Data Science. Over 9 quarters, I taught 1,200+ students algorithms, statistics, and machine learning, consistently earning 100% positive student evaluations.

Earlier, I worked with Engineers For Exploration, a research-focused club, where I improved bird species classification accuracy by 2% using advanced feature engineering and data optimization techniques.

I'm deeply passionate about data and excited to leave a meaningful impact in the world of data science.

LinkedIn Resume
Site last updated: December 2024


projects

Project 1
CBM-GUI
Creating a user interface for the Concept Bottleneck Large Language Model (CB-LLM), an interpretable LLM introduced by Lily Weng at the ICML 2024 MI Workshop. CB-LLM is designed to combine high accuracy, transparency, and scalability.
Project 2
Speech Emotion Recognition
Recipient of the HDSI scholarship ($6,500) for researching emotion classification in voice recordings. Experimented with and selected the best feature extraction methods, transforming recordings into matrices using Fourier analysis. Performed data augmentation and optimized four models: Decision Tree, SVM, ViT, and CNN.
Project 3
N-Gram Language Model
Built an N-gram language model that estimates the probability of a word given the words preceding it, using empirical frequencies. The goal is to uncover language patterns and enable statistical text generation.
Project 4
CoDebug Python Package
Published a Python library to improve the debugging experience for developers in Jupyter Notebook by using linecache to read error messages and integrating with the OpenAI API to provide troubleshooting suggestions.
Project 5
Predictive Outage Cause
Used sklearn's decision tree to classify the causes of severe power grid outages, normalizing the data and tuning hyperparameters to reach 82.7% accuracy.
Project 6
ML Modeling in Spark
Wrangled 25 GB of data with joins and aggregations to train a Word2Vec model in Apache Spark, learning the fundamentals of systems for scalable analytics along the way.
Project 7
IoT Stratified Sampling
Generated a stratified random sample that reduced 4,300 GB of IoT recordings to approximately 34.3 GB while keeping the sample representative.
Project 8
Raccoon Spottings
Integrated the Google Maps API to display campus locations and a NoSQL Firebase cloud database to track raccoon spottings on the UC San Diego campus.
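
For the curious, here's a rough sketch of the idea behind the N-Gram Language Model project: count how often each word follows a given context, then turn those counts into empirical probabilities. This is a minimal illustration, not the project's actual code; the function names (train_ngram, probability, generate) and the toy sentence are made up for this example.

```python
import random
from collections import Counter, defaultdict


def train_ngram(tokens, n=2):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts


def probability(counts, context, word):
    """Empirical P(word | context) = count(context, word) / count(context)."""
    following = counts.get(tuple(context))
    if not following:
        return 0.0  # context never seen in training
    return following[word] / sum(following.values())


def generate(counts, context, length=10, seed=None):
    """Sample up to `length` words, each drawn in proportion to its count."""
    rng = random.Random(seed)
    context = tuple(context)
    output = list(context)
    for _ in range(length):
        following = counts.get(context)
        if not following:
            break  # nothing ever followed this context, so stop early
        words = list(following)
        weights = [following[w] for w in words]
        output.append(rng.choices(words, weights=weights)[0])
        # Slide the context window forward by one word.
        context = tuple(output[len(output) - len(context):])
    return output


# Toy example (hypothetical data): a bigram model over one short sentence.
tokens = "the cat sat on the mat the cat ran".split()
counts = train_ngram(tokens, n=2)
print(probability(counts, ["the"], "cat"))
print(generate(counts, ["the"], length=5, seed=0))
```

In the toy sentence, "the" is followed by "cat" twice and "mat" once, so the estimated probability of "cat" after "the" is 2/3. A real model would also need smoothing for unseen word pairs, which this sketch leaves out.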


Since you made it this far ✌️ let's connect! My email is