History Beyond: Approaches to Messy Digitized Archival Docs
- Category: Research
- Advisor: Prof. Heather Ruth Lee
- Date: 08/2020 - 09/2020
- Project URL: https://www.historybeyond.com/
- GitHub Link
Description
We implemented Optical Character Recognition using Python OpenCV and Google Tesseract to recognize English words in ancient fonts, digitalizing and preserving historical documents. We also utilized Python Pandas package to wrangle and organize tabular data with 18,000+ entries in Chinese Restaurant Database.