About Me
I am currently pursuing an MS in Data Science at Harvard University.
- Name: Yuhan (Alison) Yao
- Phone: 917-495-4827
- Current Location: Boston, MA, US
- Expected graduation date: May 2024
- Email: yuhan_yao[at]g.harvard[dot]edu
- Home City: Shanghai, China
- DS Skills: SQL, Python (Pandas, Matplotlib, Scikit-learn, OpenCV, PyTorch), R (ggplot2)
- Software Engineering Skills: HTML, CSS, JavaScript (React.js), Python Flask
- Hobbies & Interests: Guitar, Badminton, Blogging, Travel, Netflix
Hi there! This is Alison. After graduating from NYU with a Bachelor's degree in Data Science, I am now pursuing a Master's degree in Data Science at Harvard University. I aspire to be a Data Scientist who explores the world through a quantitative lens and finds connections between everything in life.
I have work and research experience in the fields of Data Science, Computer Vision, and Optimization. I enjoy solving real-life problems with computational methods and finding insights in datasets.
I look forward to applying what I've learned to help businesses grow and making a positive impact on the world.
Education
MS in Data Science
09/2022 - 05/2024
Harvard University, Massachusetts, US
GPA: 3.96/4.0
BS in Data Science
09/2018 - 05/2022
New York University, New York, US 01/2022 - 05/2022
New York University Abu Dhabi, UAE 09/2021 - 12/2021
New York University Shanghai, China 09/2018 - 05/2021
GPA: 3.94/4.0
Honors: Graduated Summa Cum Laude, Dean's List, Global Quintessence Scholarship ($39,000 awarded for top 1%)
Research Experience
Research Assistant
09/2023 - Present
Civic Data Project (Advisors: Prof. Liz McKenna & Jae Kim)
- Creating and cleaning the largest database of civic organization email addresses in the United States
Student Researcher
05/2021 - 09/2022
Optimization of NYU Shanghai Shuttle Bus Schedule (Advisor: Prof. Zhibin Chen)
Shuttle Bus Scheduling Optimization based on Spatio-Temporal Network [publication in progress]
- Proposed a tailored, Python-based Genetic Algorithm to solve a black-box optimization problem and devised an improved shuttle bus schedule, which reduced cost by 6.82% while satisfying students' demand
- Formulated a real-life vehicle scheduling problem as 2 variations of spatio-temporal networks and constructed a non-closed-form objective function with 3 real-life constraints
- Led a team of 2 students and 1 shuttle service supervisor and organized bi-weekly meetings with the advisor
Research Assistant
08/2020 - 09/2020
History Beyond (Advisor: Prof. Heather Ruth Lee)
- Implemented Optical Character Recognition using Python OpenCV and Google Tesseract to recognize English words in ancient fonts, digitizing and preserving historical documents
- Used the Python Pandas package to wrangle and organize tabular data with 18,000+ entries from the Chinese Restaurant Database
- Presented research outcomes to the NYU Shanghai Chancellor, Provost, Dean, and professors
Research Assistant
03/2019 - 12/2019
Open Source Swarm Intelligence Robotics Research (Advisor: Prof. Rodolfo Cossovich)
- Designed, 3D-printed, and tested the infrared positioning device of the robot “Swarmesh” and built scalable, decentralized swarm-intelligence robots from scratch
- Compared Swarmesh with Kilobot, Jasmine, and R-one to identify the limitations of existing swarm robot systems
- Published and presented the paper “Framework for Present Swarm Robotic Systems and New Implementations to Increase Scalability” at the SWARM 2019 robotics conference in Japan
Work Experience
Data and Applied Scientist Intern
05/2023 - 08/2023
Microsoft Corporation, Redmond, WA, US
- Developed an automated framework for correlation analysis and causal inference between continuous and categorical variables that generalizes easily to 65k+ pairs of node health signals and customer-impacting events in the Azure cloud system
- Analyzed correlations across 3,420 samples on 57 signal-event pairs using Python to establish 7 statistically significant processes and leveraged a proprietary auto causal inference engine to validate 9 causal links on 3 processes, enhancing the node health anomaly detection pipeline with customer impact analysis
- Engineered and cleaned 650+ billion rows of data with Kusto Query Language, using statistical methods to work around database and hardware limitations while ensuring analysis accuracy, pioneering a big data handling method for colleagues
- Collaborated across 4 teams in different time zones and presented internship outcomes to CVPs and 140+ full-time employees
Data Science Intern
02/2021 - 05/2021
PayPro Global, Remote
- Implemented K-means and hierarchical clustering in Python on customer recency, frequency, and monetary values, segmenting customers into 3 target groups and engineering 8 features for further prediction
- Constructed and fine-tuned an XGBoost customer lifetime value prediction model with 85%+ accuracy, helping the marketing team refine its customer targeting strategy
- Created and designed a Power BI data visualization report featuring 16 plots on webpage template performance, enabling the frontend team to debug hidden template errors and optimize template functionality
- Presented business insights to the CEO and team leader and wrote a requested executive summary for senior leadership detailing the model mechanism and suggested marketing strategy
Artificial Intelligence Intern
06/2020 - 08/2020
Shanghai Hyron Software Co., LTD, Shanghai, China
- Preprocessed and extracted structural information from driver's license photos using Python OpenCV and trained a 95%+ accurate CRNN model to recognize numbers, dates, and 7000+ Japanese characters, saving time and manual labor for the DMV
- Implemented an automated pose detector that flags abnormal behavior for AirPods factory safety checks using YOLOv5 and ResNet18 models, helping prevent theft
- Collaborated with 7 other team members and successfully delivered 2 fully deployed AI products to clients in 3 months
Data Scientist & Software Developer
04/2020 - 01/2022
Coopsight, LLC, Shanghai, China & Abu Dhabi, UAE
- Managed and maintained data extraction and modification for a Firebase NoSQL database, ensuring a stable and smooth data connection between frontend and backend
- Built and tested 6 web pages using React.js and Tailwind CSS, with features including an embedded Google Map, company search, and user matching, enhancing the user experience
- Tested business hypotheses and product-market fit by cold-calling 50+ people and interviewing 20+ potential customers and industry experts to propose data-driven solutions for product iteration
My Portfolio
Every open-source project that I have done. Updated July 2022.