Welcome to fuzzy-search’s documentation!

The fuzzy-search package is a Python library for searching for keywords and phrases in digitized (OCR/HTR) historical documents that contain historical language use, spelling variation, and text recognition errors.

The library allows you to create simple lists of keywords and phrases and use fuzzy search to find approximate matches in texts of any length. It was developed for use cases involving hundreds of thousands or millions of text documents, typically from a digitized archival or library collection, in which the texts contain many repetitive elements but historical spelling variation and OCR/HTR errors make these keywords and phrases difficult to find.
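To make the idea of approximate matching concrete, here is a minimal sketch that uses only Python's standard-library `difflib`. It does not use the fuzzy-search API, and it is not how the package is implemented; the function name `find_fuzzy_matches`, its parameters, and the `threshold` value are assumptions chosen for illustration. It slides a window of words over the text and keeps the windows that are similar enough to a phrase from the list.

```python
from difflib import SequenceMatcher


def find_fuzzy_matches(text, phrases, threshold=0.7):
    """Illustrative helper (hypothetical, not part of fuzzy-search).

    Returns a list of (phrase, matched_text, offset, score) tuples for
    every word window in `text` whose similarity to a phrase is at
    least `threshold`.
    """
    # Record each whitespace-separated token with its character offset.
    words = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)
        words.append((token, start))
        pos = start + len(token)

    matches = []
    for phrase in phrases:
        n = len(phrase.split())  # window size in words
        for i in range(len(words) - n + 1):
            start = words[i][1]
            last_token, last_start = words[i + n - 1]
            candidate = text[start:last_start + len(last_token)]
            # Case-insensitive similarity ratio between 0.0 and 1.0.
            score = SequenceMatcher(
                None, phrase.lower(), candidate.lower()
            ).ratio()
            if score >= threshold:
                matches.append((phrase, candidate, start, round(score, 2)))
    return matches


# An invented sentence with an OCR-style error in "attestation".
sample = "The attestatiun was signed by the notarys"
for match in find_fuzzy_matches(sample, ["attestation"]):
    print(match)
```

A brute-force scan like this compares every phrase with every word window, so it would be far too slow for collections of the size described above; it is only meant to show what an approximate match against a keyword list looks like.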
