Portfolio

AI for Cities with DataKind Volunteers

Datakind + Microsoft Virtual DataDive: Applying AI to Societal Challenges in US Cities | Miami Project
- Background: Code for Miami and the city of Miami aim to increase affordable housing by making the process of applying and building these units more transparent and user-friendly. Code for Miami developed gethousing.org to map existing units and eventually allow developers to track applications. The goal of our team was to show how AI can improve the interaction that developers have with the application process on the site, ultimately increasing the motivation to build more affordable housing.
- What we did: In a team of 22, from all over the globe, we spent 9 hours developing a model that would be able to predict how long a permit would be in review (ultimately allowing developers to understand and plan for the review process better). We also prototyped an API that could be used on the gethousing.org website to show users their estimated time in review. The event culminated in a presentation to the rest of the DataDive teams, sharing our project and findings.
- How we did it: As part of the team, I used ggplot and tidyverse to visualize potential variable relationships in R, then used pandas and scikit-learn in Python to build/test machine learning models (including linear regression, ridge regression, and lasso regression). After identifying several key variables and testing various machine learning algorithms, we were able to generate a gradient boosted decision tree model with an r-squared of 0.62 for use in an API.

Analyzing the Comments on a Web-Based Health Comic

Exploring the comments section for a health comic developed by the NYSDOH AIDS Institute
- Background: The New York State Department of Health’s AIDS Institute created a web-based health comic to educate young readers about HIV/AIDS, STIs, drug user health, and LGBTQ+ health. The web forum that the comic was hosted on allowed readers to interact with the series, either through commenting, liking, or following. I wanted to collect and explore that reader interaction data to see if we could discover insights into what content resonated with readers, what the major topics in the discussions between readers were, and how the comic performed over time.
- What we did: Using R, I learned how to write functions that could scrape web data from the comic forum (you can read through my process here). I then did exploratory data analysis of the quantitative metrics in the data (i.e. the number of likes per episode, the number of comments per episode, the average frequency of comments per user, etc.). I also began a qualitative analysis of the comment text data, iteratively building a codebook to identify key themes among the comments.
- How we did it: I produced several reports of the quantitative data analysis, using tidyverse and ggplot in R to pull out visualizations and aggregations of the data after every updated data scrape (example report linked above). I also explored basic sentiment analysis using TidyText. These reports were shared with the leadership team for the Digital Health Initiatives at the AIDS Institute to help guide the creation of further plot arcs.

Using Plot.ly to Visualize HIV/AIDS Trends in NYC

Identification of trends and disparities in HIV/AIDS for NYC
- Final project as part of my data science course in my graduate program.
- Background: Motivated by Governor Cuomo’s initiative for ending the AIDS epidemic in New York State by 2020 (ETE 2020), we aimed to identify key vulnerable populations in New York City (NYC) disproportionately burdened by HIV/AIDS in order to improve interventions implemented under this initiative.
- What we did: To that end, we utilized annual HIV/AIDS reports from NYC through the NYC Open Data database, as well as supplementary NYC geographic information. Our analyses focus on exploring potential health disparities across geographic and sociodemographic variables, as well as understanding the current trends for HIV/AIDS across time and in relation to comorbidities. We then wrote a report to discuss the current state and effectiveness of HIV/AIDS interventions in NYC.
- How we did it: Using the tidyverse and plot.ly, I wrote the “Big Picture” page of the report. By exploring HIV death rates over time and across boroughs, I drew interpretations from the visualizations and connected the findings to next steps for the ETE 2020 initiative.

Using Dash to create an interactive, educational website for clinicians to learn about AI

Competing in the Mount Sinai Health Hackathon 2019
- Background: The Mount Sinai Health Hackathon brings together interdisciplinary teams to discover novel technology-based solutions for healthcare. The theme for 2019 was Artificial Intelligence - Expanding the Limits of Human Performance.
- What we did: Team Blackbox AI was focused on addressing the fundamental discrepancy between AI solutions designed for healthcare and the way in which clinicians and healthcare practitioners actually understand and use AI tools. We saw that there was a high barrier to access when it came to learning how to interpret AI, as well as general distrust in what AI could do for clinical work. Our solution was to design and create a website that uses video lessons, interactive statistical visualizations, and written descriptions to educate clinicians on the application and interpretation of artificial intelligence in healthcare settings.
- How we did it: I brainstored project goals and potential execution plans, coming up with content ideas and marketing perspectives. I also took point on learning how to use Dash to create a functioning mock-up of the website (domain no longer active). I incorporated the lesson plans and educational material created by the group into several interactive webpages that could be easily navigated and visually appealing.

Researching + Writing a Logistical Framework for Sustainable Commerce

Open Climate Collabathon 2019
- Background: I was a member of the Consumer Disclosure team (winner of the Most Innovative Contribution award), who set out to create a “Sustainability Score” that would assess all the entire supply chain of a product and weigh various factors to produce an intuitive score for a consumer item Consumers could then use this score to gauge how the environmental impact of a product.
- What we did: I worked with a cross-disciplinary group (all remoting in from around the world) and eventually reconnected with several team members to help launch/guide the next iteration of consumer-focused projects in the 2020 Collabathon. I was a key contributer to the research, design, and write-up of a theoretical framework that could support a “Sustainability Score” for online marketplaces.
- How we did it: The scope and output of this project are documented on the Collabathon’s gitbook (linked above).

Exploring Restaurant Health Codes in NYC with Flexdashboard

Background: To build my knowledge of visualizations in R, including how to integrate flexdashboard into websites, I decided to investigate where is safe to eat while in NYC.
What we did: Using open data provided by the NYC government, I created a dashboard created using flexdashboard in R to let you know how restaurants in NYC are keeping up with health regulations.
How we did it: Eat here not there