History Beyond: Approaches to Messy Digitized Archival Docs

Description

We implemented optical character recognition (OCR) in Python with OpenCV and Google Tesseract to recognize English text printed in historical typefaces, digitizing and preserving historical documents. We also used the pandas package to wrangle and organize tabular data with more than 18,000 entries in the Chinese Restaurant Database.
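
As a rough illustration of the OCR step, the sketch below shows one way such a pipeline could look. It is not the project's actual code: the function names, the median-blur and Otsu-threshold preprocessing, and the text cleanup rules are all assumptions chosen for illustration, and it presumes the opencv-python and pytesseract packages plus a local Tesseract install with English language data.

```python
import re
import unicodedata


def preprocess_and_ocr(image_path: str) -> str:
    """Binarize a scanned page with OpenCV, then run Tesseract on it.

    Hypothetical helper: the preprocessing choices here are assumptions,
    not the settings used in the project.
    """
    # Imported here so clean_ocr_text() works without the OCR stack installed.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Light denoising, then Otsu thresholding to separate ink from paper.
    blurred = cv2.medianBlur(gray, 3)
    _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary, lang="eng")


def clean_ocr_text(text: str) -> str:
    """Normalize common artifacts in OCR output from older typefaces.

    Hypothetical helper: these rules are illustrative assumptions.
    """
    # NFKC folds ligatures such as "ﬁ" into "fi" and the long s "ſ" into "s".
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

A page could then be processed with clean_ocr_text(preprocess_and_ocr("page.png")), where "page.png" is a placeholder path. Keeping the cleanup separate from the OCR call makes the text rules easy to adjust for a given collection without rerunning recognition.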