OCR and topic modeling of musical scores to explore the development of minstrelsy through the Nineteenth Century

Johns Hopkins holds the Lester S. Levy collection of 19th- and early 20th-century popular sheet music. The collection offers a unique opportunity to study the development of minstrelsy through the pre- and post-Civil War eras, but it also poses particular challenges for automatic transcription. We are running initial transcription experiments using open-source OCR models, followed by temporal topic modeling, in an attempt to characterize how emphasis shifts over time and between genres. A likely next step is to gather a small amount of expert annotation to fine-tune the OCR process and perhaps extend it to the musical notation itself.

Researchers