Projects

Side projects, mostly recent.

/01
Who actually defaults?
CatBoost · SHAP · AI Governance
CatBoost vs. XGBoost vs. LightGBM on UCI Credit Card Default data, optimized for recall on the minority class. SHAP explainability and EU AI Act compliance documentation built in.
Recent payment status dominates. Demographics contribute almost nothing.
/02
Should you call this customer?
Causal Lift · A/B Testing · Power Analysis
Most marketing analytics measures who converted, not who converted because of the call. Built on the UCI Bank Marketing dataset with covariate adjustment, segmentation, and SHAP explainability.
Prior contact lifts conversion 2.5x. The effect holds after covariate adjustment.
/03
How much do ADHD, depression, autism, and bipolar share?
GWAS · LDSC · Psychiatric Genetics
How much genetic architecture is shared across ADHD, MDD, ASD, and bipolar disorder, and how have those estimates shifted with the 2023 iPSYCH / PGC ADHD GWAS? Personal interest project.
Findings, in progress
/04
Which animals get out of the LA shelter system?
Classification · Data Viz · Public Data
LA's shelter system is hugely overcrowded. Which intake characteristics most predict whether an animal gets adopted, transferred, or euthanized, and how long they wait for an outcome? Public data, public-facing visualizations.
Findings, in progress
/05
Which rookies are systematically underpaid?
Causal Inference · Natural Experiment · Sports Analytics
MLB's pre-arbitration service time rules create a natural experiment. Which pre-arb performance signals causally predict post-arb value, and which signals do front offices act on that turn out not to matter?
Findings, in progress
/06
What declines before the headline numbers?
Causal Search · PC Algorithm · Sports Analytics
Which biomechanical and performance metrics decline first, before the headline output stats start moving and the market prices the decline in? Uses causal structure learning on Statcast data to find leading indicators that aging curves miss.
Findings, in progress
/07
Should you swing on a 2-0 count?
Markov Chains · Sabermetrics · MLB
A softball coach told me my whole life not to swing on a 2-0 count. Capstone thesis project using Markov chains on MLB data to test whether he was right. This is also how I ended up in data science.
He was right. The Markov chains agreed. CLU · 2016–2017
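
For a sense of what a count-based Markov chain looks like, here is a minimal sketch. It is not the thesis code, and it rests on assumptions: the per-pitch probabilities in `TAKE` and `SWING` are made up for illustration rather than estimated from MLB data, and `on_base_probability` is a hypothetical helper name. Each ball-strike count is a transient state, "on base" and "out" are absorbing states, and the chance of reaching base from any count comes from the standard absorbing-chain solve.

```python
import numpy as np

# Transient states: ball-strike counts (balls 0-3, strikes 0-2).
COUNTS = [(b, s) for b in range(4) for s in range(3)]
INDEX = {c: i for i, c in enumerate(COUNTS)}

# Hypothetical per-pitch outcome probabilities, NOT estimated from MLB data.
TAKE = {"ball": 0.60, "strike": 0.40}
SWING = {"foul": 0.35, "whiff": 0.20, "hit": 0.15, "out": 0.30}


def on_base_probability(swing_counts):
    """P(reach base) from each count if the batter swings only at swing_counts."""
    n = len(COUNTS)
    Q = np.zeros((n, n))  # transient -> transient transitions
    R = np.zeros((n, 2))  # transient -> absorbing: column 0 on base, column 1 out

    def move(i, balls, strikes, p):
        if balls == 4:
            R[i, 0] += p  # walk
        elif strikes == 3:
            R[i, 1] += p  # strikeout
        else:
            Q[i, INDEX[(balls, strikes)]] += p

    for (b, s), i in INDEX.items():
        if (b, s) in swing_counts:
            # A foul adds a strike unless the batter already has two.
            move(i, b, min(s + 1, 2), SWING["foul"])
            move(i, b, s + 1, SWING["whiff"])
            R[i, 0] += SWING["hit"]
            R[i, 1] += SWING["out"]
        else:
            move(i, b + 1, s, TAKE["ball"])
            move(i, b, s + 1, TAKE["strike"])

    # Absorption probabilities of an absorbing chain: B = (I - Q)^-1 R
    B = np.linalg.solve(np.eye(n) - Q, R)
    return {c: float(B[i, 0]) for c, i in INDEX.items()}


# Compare two policies that differ only at 2-0: take the pitch vs. swing.
every_count = set(COUNTS)
take_20 = on_base_probability(every_count - {(2, 0)})[(2, 0)]
swing_20 = on_base_probability(every_count)[(2, 0)]
print(f"P(on base | take at 2-0)  = {take_20:.3f}")
print(f"P(on base | swing at 2-0) = {swing_20:.3f}")
```

With invented inputs the printed numbers say nothing about real baseball. The point is the structure: swap in empirical transition rates per count and the same solve gives the comparison the project asks about.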