Panda: Panda is a weakly supervised system for entity matching. It is built on the labeling function abstraction: labeling functions (LFs) are user-provided programs that cheaply generate many noisy match/unmatch labels, which a labeling model then combines into accurate final predictions. I am currently working on improving the preliminary Panda system so that it combines LFs more intelligently and scales better to massive datasets.
More coming soon!
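To make the labeling function abstraction a bit more concrete, here is a minimal Python sketch of what LFs for entity matching might look like. Everything in it is illustrative: the record fields (`title`, `price`), the LF names, and the thresholds are hypothetical and not part of Panda's actual interface, and the plain majority vote is only a stand-in for a labeling model. It is not meant to represent how Panda combines LFs.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical record type: attribute name -> string value.
Record = Dict[str, str]
# An LF looks at a pair of records and votes MATCH, UNMATCH, or None (abstain).
LabelingFunction = Callable[[Record, Record], Optional[int]]

MATCH, UNMATCH = 1, 0


def lf_exact_title(left: Record, right: Record) -> Optional[int]:
    """Vote MATCH when the normalized titles are identical; otherwise abstain."""
    a = left.get("title", "").strip().lower()
    b = right.get("title", "").strip().lower()
    if a and a == b:
        return MATCH
    return None


def lf_price_gap(left: Record, right: Record) -> Optional[int]:
    """Vote UNMATCH when prices differ by more than 50%; abstain if unparsable."""
    try:
        a, b = float(left["price"]), float(right["price"])
    except (KeyError, ValueError):
        return None
    if max(a, b) > 0 and abs(a - b) / max(a, b) > 0.5:
        return UNMATCH
    return None


def lf_token_overlap(left: Record, right: Record, threshold: float = 0.5) -> Optional[int]:
    """Vote by Jaccard similarity of lowercase title tokens."""
    a = set(left.get("title", "").lower().split())
    b = set(right.get("title", "").lower().split())
    if not a or not b:
        return None
    jaccard = len(a & b) / len(a | b)
    return MATCH if jaccard >= threshold else UNMATCH


LFS: List[LabelingFunction] = [lf_exact_title, lf_price_gap, lf_token_overlap]


def majority_vote(lfs: List[LabelingFunction], left: Record, right: Record) -> Optional[int]:
    """Combine non-abstaining LF votes; return None if all abstain or votes tie."""
    votes = [v for v in (lf(left, right) for lf in lfs) if v is not None]
    if not votes:
        return None
    counts = Counter(votes)
    if counts[MATCH] == counts[UNMATCH]:
        return None
    return MATCH if counts[MATCH] > counts[UNMATCH] else UNMATCH
```

For example, `majority_vote(LFS, {"title": "Apple iPhone 13 128GB", "price": "799"}, {"title": "iPhone 13 128GB Apple", "price": "789"})` gets a single MATCH vote from the token-overlap LF (the other two abstain) and returns `MATCH`. Each individual LF is cheap and often wrong or silent, which is exactly why a smarter combination step matters.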
I-Rex: I-Rex, an Interactive Relational Query Explainer for SQL, is a tool that helps students learning SQL understand their mistakes while debugging incorrect queries. Check out the paper or watch the VLDB demo video (starring yours truly). Students in the introductory database course at Duke have now used I-Rex for several semesters. This work is part of a broader NSF-funded project called “HNRQ: Helping Novices Learn and Debug Relational Queries”.
NeuroFit: This is an interdisciplinary research study in the MCAB Lab at Duke. Participants have their physical activity monitored for three months while receiving text messages on their phones that promote physical activity. I analyzed preliminary data to investigate how effective these text messages are at motivating participants to be more physically active. You can read my report about the results here. Feel free to read more about the study and check out some of the code on GitHub.
Basketball Data Analytics: As part of Duke’s 2019 Data+ Program, I spent 10 weeks using a neural network to extract player motion data from basketball footage on YouTube, and training machine learning models to distinguish referees from players and to group the detected players into teams. If you’re curious, you can explore our code, poster, and slides from our end-of-summer presentation.