Writers of the 19th century used orthographic variation as a productive channel of meaning: characters were often marked by their dialect, setting expectations that could be followed or undermined. Authors developed a variety of tropes to signal properties like race, economic status, masculinity, and so forth, but these could easily become detached from reality, or confused by unintended correlations, and owe more to intertextuality than observation of speakers. To explore this mosaic, we are developing a machine learning approach to recognizing and matching orthographic variants with their canonical form, incorporating insights from large language models and metric learning.