Alison Yuhan Yao

aspiring Data Scientist

About Me

I am currently pursuing an MS in Data Science at Harvard University.

  • Name: Yuhan (Alison) Yao
  • Phone: 917-495-4827
  • Current Location: Boston, MA, US
  • Expected graduation date: May 2024
  • Email: yuhan_yao[at]g.harvard[dot]edu
  • Home City: Shanghai, China
  • DS Skills: SQL, Python (Pandas, Matplotlib, Scikit-learn, OpenCV, PyTorch), R (ggplot2)
  • Software Engineering Skills: HTML, CSS, JavaScript (React.js), Python Flask
  • Hobbies & Interests: Guitar, Badminton, Blogging, Travel, Netflix

Hi there! This is Alison. After graduating from NYU with a Bachelor's degree in Data Science, I am now pursuing a Master's degree in Data Science at Harvard University. I aspire to be a Data Scientist who explores the world through a quantitative lens and finds connections between everything in life.

I have experience working and researching in the fields of Data Science, Computer Vision, and Optimization. I enjoy solving real-life problems with computational methods and finding insights in datasets.

I look forward to applying what I've learned to help businesses grow and making a positive impact on the world.
My Resume

Please download my resume here. Updated August 2023.


MS in Data Science

09/2022 - 05/2024

Harvard University, Massachusetts, US

GPA: 3.96/4.0

BS in Data Science

09/2018 - 05/2022

New York University, New York, US 01/2022 - 05/2022

New York University Abu Dhabi, UAE 09/2021 - 12/2021

New York University Shanghai, China 09/2018 - 05/2021

GPA: 3.94/4.0

Honors: Graduated Summa Cum Laude, Dean's List, Global Quintessence Scholarship ($39,000 awarded for top 1%)

Research Experience

Research Assistant

09/2023 - Present

Civic Data Project (Advisors: Prof. Liz McKenna & Jae Kim)

  • Creating and cleaning the largest database of civic organization email addresses in the United States

Student Researcher

05/2021 - 09/2022

Optimization of NYU Shanghai Shuttle Bus Schedule (Advisor: Prof. Zhibin Chen)

Shuttle Bus Scheduling Optimization based on Spatio-Temporal Network [publication in progress]

  • Proposed a tailored Genetic Algorithm, implemented in Python, to solve a black-box optimization problem and devised an improved shuttle bus schedule that reduced cost by 6.82% while satisfying student demand
  • Formulated a real-life vehicle scheduling problem as 2 variations of spatio-temporal networks and constructed a non-closed-form objective function with 3 real-life constraints
  • Led a team of 2 students and 1 shuttle service supervisor and organized bi-weekly meetings with advisor

Research Assistant

08/2020 - 09/2020

History Beyond (Advisor: Prof. Heather Ruth Lee)

  • Implemented Optical Character Recognition using Python OpenCV and Google Tesseract to recognize English words in ancient fonts, digitizing and preserving historical documents
  • Used the Python Pandas package to wrangle and organize tabular data with 18,000+ entries from the Chinese Restaurant Database
  • Presented research outcomes to the NYU Shanghai Chancellor, Provost, Dean, and Professors

Research Assistant

03/2019 - 12/2019

Open Source Swarm Intelligence Robotics Research (Advisor: Prof. Rodolfo Cossovich)

Work Experience

Data and Applied Scientist Intern

05/2023 - 08/2023

Microsoft Corporation, Redmond, WA, US

  • Developed an automated framework for correlation analysis and causal inference between continuous and categorical variables that can be easily generalized to 65k+ pairs of node health signals and customer-impacting events in the Azure cloud system
  • Analyzed correlations across 3420 samples on 57 signal-event pairs using Python to establish 7 statistically significant processes and leveraged a proprietary auto causal inference engine to validate 9 causal links on 3 processes, enhancing the node health anomaly detection pipeline with customer impact analysis
  • Engineered and cleansed 650+ billion rows of data with Kusto Query Language, applying statistical methods to work around database and hardware limitations while preserving analysis accuracy, pioneering a big data handling method for colleagues
  • Collaborated across 4 teams in different time zones and presented internship outcomes to CVPs and 140+ full-time employees

Data Science Intern

02/2021 - 05/2021

PayPro Global, Remote

  • Utilized Python to implement K-means and Hierarchical Clustering methods to transform customer recency, frequency, and monetary values, which segmented customers into 3 target groups and engineered 8 features for further prediction
  • Constructed and fine-tuned an XGBoost model for customer lifetime value prediction with 85%+ accuracy, helping the marketing team refine its customer targeting strategy
  • Created and designed a Power BI data visualization report featuring 16 plots on webpage template performance, enabling the frontend team to debug hidden template errors and optimize template functionality
  • Presented business insights to the CEO and team leader and wrote a requested executive summary for senior leadership detailing the model mechanism and suggested marketing strategy

Artificial Intelligence Intern

06/2020 - 08/2020

Shanghai Hyron Software Co., LTD, Shanghai, China

  • Preprocessed and extracted structural information from driver's license photos using Python OpenCV and trained a 95%+ accurate CRNN model to recognize numbers, dates, and 7000+ Japanese characters, saving time and manual labor for the DMV
  • Implemented an automated pose detector of abnormal behavior for AirPods factory safety checks using YOLOv5 and ResNet18 models, helping prevent theft
  • Collaborated with 7 other team members and successfully delivered 2 fully-deployed AI products to clients in 3 months

Data Scientist & Software Developer

04/2020 - 01/2022

Coopsight, LLC, Shanghai, China & Abu Dhabi, UAE

  • Managed and maintained data extraction and modification for a Firebase NoSQL database, ensuring a stable and smooth data connection between frontend and backend
  • Built and tested 6 web pages using React.js and Tailwind CSS, with features including an embedded Google map, company search, and user matching to enhance user experience
  • Tested business hypotheses and product-market fit by cold-calling 50+ people and interviewing 20+ potential customers and industry experts to propose data-driven solutions for product iteration

My Portfolio

Every open-source project that I have done. Updated July 2022.

Few-Shot Learning in Computer Vision

Data Science Capstone

Forecasting Time Series Data: Netflix Stock Price Prediction

Forecasting Time Series Data

Optimization of NYU Shanghai Shuttle Bus Schedule


Bechdel Test: Comparing Female Representation Metrics in Movies

Human-centered Data Science

NL2SQL: BERT-based Model for SQL Generation

Natural Language Processing

Violence Against Women: Linear Regression for Causal Inference

Data and Society

Chinese Traffic Sign Recognition Based on Annotated Street View Images

Machine Learning

SAC Bias Reduction: Clipped Double Q vs Multi-Step Method

Reinforcement Learning

Eleme Delivery Analytics & Prediction Under COVID-19

1st Place in Kaggle Competition

Video Streaming Platform: Which Has Better Shows

Regression and Multivariate Data Analysis

Open Source Swarm Intelligence Robotics Research


Online Air Ticket Reservation System


History Beyond


A Comparison of Data Scientist Roles: Job Responsibilities and Business Impact in Enterprises vs. Startups

Experiential Learning Seminar