fuzzy_search package

Submodules

fuzzy_search.fuzzy_config module

fuzzy_search.fuzzy_context_searcher module

class fuzzy_search.fuzzy_context_searcher.FuzzyContextSearcher(config: Optional[dict] = None)

Bases: fuzzy_search.fuzzy_phrase_searcher.FuzzyPhraseSearcher

add_match_context(match: fuzzy_search.fuzzy_match.PhraseMatch, text: Union[str, dict], context_size: Union[None, int] = None, prefix_size: Union[None, int] = None, suffix_size: Union[None, int] = None) fuzzy_search.fuzzy_match.PhraseMatchInContext

Add context to a given match and its corresponding text document.

Parameters
  • match (PhraseMatch) – a phrase match object

  • text (Union[str, dict]) – the text that the match was taken from

  • context_size (Union[None, int]) – the size of the prefix and suffix context windows

  • prefix_size (Union[None, int]) – size of the prefix context

  • suffix_size (Union[None, int]) – size of the suffix context

Returns

the phrase match object with context

Return type

PhraseMatchInContext
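The windowing that add_match_context describes can be illustrated without the library. The following is a minimal sketch, not the package's implementation: extract_context is a hypothetical helper, and it assumes that sizes are counted in characters and that the window is clipped at the text boundaries. The defaults of 20 mirror the prefix_size and suffix_size defaults in the PhraseMatchInContext signature.

```python
def extract_context(text, match_offset, match_string, prefix_size=20, suffix_size=20):
    """Return the match string plus surrounding characters (hypothetical helper)."""
    match_end = match_offset + len(match_string)
    # Clip the window so it never runs past the start or end of the text.
    context_start = max(0, match_offset - prefix_size)
    context_end = min(len(text), match_end + suffix_size)
    return {
        "context": text[context_start:context_end],
        "context_start": context_start,
        "context_end": context_end,
    }


text = "The committee approved the proposal on Monday."
window = extract_context(text, text.index("proposal"), "proposal", prefix_size=9, suffix_size=3)
print(window["context"])
```

Keeping context_start and context_end next to the context string makes it possible to map offsets inside the context back to offsets in the full text.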

configure_context(config: dict) None

Configure the context searcher.

Parameters

config (dict) – a dictionary with configuration parameters to override the defaults

find_matches(text: Union[str, dict], use_word_boundaries: Union[None, bool] = None, allow_overlapping_matches: bool = True, include_variants: Optional[bool] = None, filter_distractors: Optional[bool] = None, prefix_size: Union[None, int] = None, suffix_size: Union[None, int] = None, skip_exact_matching: Optional[bool] = None) List[fuzzy_search.fuzzy_match.PhraseMatchInContext]

Find fuzzy matches for registered phrases and add context around match string. This extends the find_matches function of the FuzzyPhraseSearcher by adding local context to each match.

Parameters
  • text (Union[str, Dict[str, str]]) – the text (string or dictionary with ‘text’ property) to find fuzzy matching phrases in.

  • use_word_boundaries (bool) – use word boundaries in determining match boundaries

  • allow_overlapping_matches (bool) – boolean flag for whether to allow matches to overlap in their text ranges

  • include_variants (bool) – boolean flag for whether to include phrase variants for finding matches

  • filter_distractors (bool) – boolean flag for whether to remove phrase matches that better match distractors

  • prefix_size (Union[None, int]) – the size of the prefix context window

  • suffix_size (Union[None, int]) – the size of the suffix context window

  • skip_exact_matching (Union[None, bool]) – boolean flag whether to skip the exact matching step

Returns

a list of phrase matches with text surrounding the match string

Return type

List[PhraseMatchInContext]

find_matches_in_context(match_in_context: fuzzy_search.fuzzy_match.PhraseMatchInContext, use_word_boundaries: Union[None, bool] = None, include_variants: Union[None, bool] = None, filter_distractors: Union[None, bool] = None) List[fuzzy_search.fuzzy_match.PhraseMatch]

Use a PhraseMatchInContext object to find other phrases in the context of that match.

Parameters
  • match_in_context (PhraseMatchInContext) – a match phrase with context from the text that the match was taken from

  • use_word_boundaries (bool) – boolean whether to adjust match strings to word boundaries

  • include_variants (bool) – boolean whether to include variants of phrases in matching

  • filter_distractors (bool) – boolean whether to remove matches that are closer to distractors

Returns

a list of match objects

Return type

List[PhraseMatch]

fuzzy_search.fuzzy_match module

class fuzzy_search.fuzzy_match.Candidate(phrase: fuzzy_search.fuzzy_phrase.Phrase, max_length_variance: int = 1, ignorecase: bool = False)

Bases: object

add_skip_match(skipgram: fuzzy_search.fuzzy_string.SkipGram) None

Add a skipgram match between a text and a phrase to the candidate.

Parameters

skipgram (SkipGram) – a matching skipgram

get_match_start_offset() Union[None, int]

Calculate the start offset of the match.

Returns

the start offset of the match

Return type

Union[None, int]

get_match_string(text: Dict[str, any]) Optional[str]

Find the matching string of a candidate fuzzy match between a text and a phrase.

Parameters

text (Dict[str, any]) – the text object from which the candidate was derived

Returns

the matching string

Return type

Optional[str]

get_skip_count_overlap() float

Calculate deviation of candidate skipgrams from phrase skipgrams.

Returns

the skipgram overlap, in the range (-inf, 1.0]

Return type

float

get_skip_set_overlap() float

Calculate and set skipgram overlap between text and phrase skipgram matches.

Returns

the skipgram overlap

Return type

float
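To make the notion of skipgram overlap concrete, here is a small self-contained sketch. It is an illustration under stated assumptions, not the library's code: char_skipgrams and skip_set_overlap are hypothetical names, skipgrams are taken over characters, and the overlap is normalised by the number of phrase skipgrams. The defaults of 2 for ngram_size and skip_size mirror the Phrase signature; the package's SkipGram objects may be defined differently.

```python
from itertools import combinations


def char_skipgrams(text, ngram_size=2, skip_size=2):
    """Return the set of character skipgrams of a string (hypothetical helper).

    Each skipgram combines the character at some offset with ngram_size - 1
    later characters, skipping at most skip_size characters in total.
    """
    skipgrams = set()
    for start in range(len(text) - ngram_size + 1):
        window = text[start:start + ngram_size + skip_size]
        for tail in combinations(window[1:], ngram_size - 1):
            skipgrams.add(window[0] + "".join(tail))
    return skipgrams


def skip_set_overlap(phrase, candidate, ngram_size=2, skip_size=2):
    """Fraction of the phrase's skipgrams that also occur in the candidate."""
    phrase_grams = char_skipgrams(phrase, ngram_size, skip_size)
    if not phrase_grams:
        return 0.0
    candidate_grams = char_skipgrams(candidate, ngram_size, skip_size)
    return len(phrase_grams & candidate_grams) / len(phrase_grams)


print(sorted(char_skipgrams("abcd")))
print(skip_set_overlap("committee", "comittee"))
```

Because skipgrams tolerate skipped characters, a candidate with a dropped or inserted letter still shares most skipgrams with the phrase, which is what makes this kind of overlap useful for fuzzy matching.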

is_match(skipgram_threshold: float)

Check if the candidate is a likely match for its corresponding phrase.

Parameters

skipgram_threshold (float) – the threshold for how many skipgrams have to match between candidate and phrase

Returns

a boolean whether this candidate is a likely match for the phrase

Return type

bool

remove_first_skip() None

Remove the first matching skipgram from the list and update the count and set.

same_candidate(other: fuzzy_search.fuzzy_match.Candidate)

Check if this candidate has the same start and end offsets as another candidate.

Parameters

other (Candidate) – another candidate for the same phrase and text.

Returns

this candidate match has the same offsets as the other candidate

Return type

bool

shift_start_skip() bool

Check if there is a later skip that is a better start.

skip_match_length() int

Return the length of the matching string.

Returns

difference between start and end offset

Return type

int

class fuzzy_search.fuzzy_match.PhraseMatch(match_phrase: fuzzy_search.fuzzy_phrase.Phrase, match_variant: fuzzy_search.fuzzy_phrase.Phrase, match_string: str, match_offset: int, ignorecase: bool = False, text_id: Union[None, str] = None, match_scores: Optional[dict] = None, match_label: Optional[Union[str, List[str]]] = None, match_id: Optional[str] = None)

Bases: object

add_scores(skipgram_overlap: Union[None, float] = None) None

Compute overlap and similarity scores between the match variant and the match string and add these to the match object.

Parameters

skipgram_overlap (Union[float, None]) – the overlap in skipgrams between match string and match variant

Returns

None

Return type

None

as_web_anno() Dict[str, any]

Turn match object into a W3C Web Annotation representation
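For readers unfamiliar with the W3C Web Annotation data model, the sketch below builds a minimal annotation for a text match. It is only an assumption about what such a representation could look like: as_web_annotation is a hypothetical function, the choice of a TextPositionSelector and a classifying body is illustrative, and the fields that as_web_anno actually emits may differ.

```python
import uuid


def as_web_annotation(match_string, match_offset, text_id, label):
    """Build a minimal W3C Web Annotation dictionary for a text match (illustrative)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": "Annotation",
        "motivation": "classifying",
        # The body carries what is said about the target, here a label.
        "body": {"type": "TextualBody", "value": label, "purpose": "classifying"},
        # The target points at the character range of the match in the source text.
        "target": {
            "source": text_id,
            "selector": {
                "type": "TextPositionSelector",
                "start": match_offset,
                "end": match_offset + len(match_string),
            },
        },
    }


anno = as_web_annotation("proposal", 27, "doc-1", "topic")
print(anno["target"]["selector"])
```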

has_label(label: str)

json() dict

property label_list: List[str]

overlaps(other: fuzzy_search.fuzzy_match.PhraseMatch) bool

Check if the match string of this match object overlaps with the match string of another match object.

Parameters

other (PhraseMatch) – another match object

Returns

a boolean indicating whether the match_strings of the two objects overlap in the source text

Return type

bool
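The overlap test can be pictured as an interval check on character offsets. This is a sketch under the assumption that a match covers the half-open range from its offset to its offset plus the length of its match string; ranges_overlap is a hypothetical helper, not a function of the package.

```python
def ranges_overlap(offset_a, string_a, offset_b, string_b):
    """Check whether two match strings cover at least one common character."""
    end_a = offset_a + len(string_a)
    end_b = offset_b + len(string_b)
    # Two half-open ranges overlap when each starts before the other ends.
    return offset_a < end_b and offset_b < end_a


print(ranges_overlap(0, "abc", 2, "cd"))
print(ranges_overlap(0, "abc", 3, "d"))
```

With half-open ranges, two matches that merely touch (one ends where the other begins) are not counted as overlapping.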

score_character_overlap()

Return the character overlap between the variant phrase_string and the match_string

Returns

the character overlap as proportion of the variant phrase string

Return type

float

score_levenshtein_similarity()

Return the Levenshtein similarity between the variant phrase_string and the match_string

Returns

the Levenshtein similarity as proportion of the variant phrase string

Return type

float
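As an illustration of a Levenshtein similarity expressed as a proportion of the phrase string, the following sketch computes the edit distance with the standard dynamic programming recurrence and normalises it by the phrase length. The normalisation is an assumption based on the description above; both function names are hypothetical and the package may compute the score differently.

```python
def levenshtein_distance(a, b):
    """Number of single-character insertions, deletions and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def levenshtein_similarity(phrase_string, match_string):
    """Similarity as a proportion of the phrase length (assumed normalisation)."""
    if not phrase_string:
        return 0.0
    return 1 - levenshtein_distance(phrase_string, match_string) / len(phrase_string)


print(levenshtein_distance("kitten", "sitting"))
print(levenshtein_similarity("abcd", "abxd"))
```

Note that with this normalisation the score becomes negative when the match string is much longer than the phrase string.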

score_ngram_overlap() float

Return the ngram overlap between the variant phrase_string and the match_string

Returns

the ngram overlap as proportion of the variant phrase string

Return type

float

class fuzzy_search.fuzzy_match.PhraseMatchInContext(match: fuzzy_search.fuzzy_match.PhraseMatch, text: Optional[Union[str, dict]] = None, context: Optional[str] = None, context_start: Optional[int] = None, context_end: Optional[int] = None, prefix_size: int = 20, suffix_size: int = 20)

Bases: fuzzy_search.fuzzy_match.PhraseMatch

as_web_anno() Dict[str, any]

Turn match object into a W3C Web Annotation representation

json()

fuzzy_search.fuzzy_match.adjust_match_end_offset(phrase_string: str, candidate_string: str, text: Dict[str, any], end_offset: int, punctuation: str) Optional[int]

Adjust the end offset if it is not at a word boundary.

Parameters
  • phrase_string (str) – the phrase string

  • candidate_string (str) – the candidate match string

  • text (Dict[str, any]) – the text object that contains the candidate match string

  • end_offset (int) – the text offset of the candidate match string

  • punctuation (str) – the set of characters to treat as punctuation

Returns

the adjusted offset or None if the required adjustment is too big

Return type

Union[int, None]

fuzzy_search.fuzzy_match.adjust_match_offsets(phrase_string: str, candidate_string: str, text: Dict[str, any], candidate_start_offset: int, candidate_end_offset: int, punctuation: str = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~') Optional[Dict[str, Union[str, int]]]

Adjust the start and end offsets of a candidate match if they are not at word boundaries.

Parameters
  • phrase_string (str) – the phrase string

  • candidate_string (str) – the candidate match string

  • text (Dict[str, any]) – the text object that contains the candidate match string

  • candidate_start_offset (int) – the text offset of the start of the candidate match string

  • candidate_end_offset (int) – the text offset of the end of the candidate match string

  • punctuation (str) – the set of characters to treat as punctuation (defaults to string.punctuation)

Returns

a dictionary with the adjusted match properties, or None if the required adjustment is too big

Return type

Optional[Dict[str, Union[str, int]]]

fuzzy_search.fuzzy_match.adjust_match_start_offset(text: Dict[str, any], match_string: str, match_offset: int) Optional[int]

Adjust the start offset if it is not at a word boundary.

Parameters
  • text (Dict[str, any]) – the text object that contains the candidate match string

  • match_string (str) – the candidate match string

  • match_offset (int) – the text offset of the candidate match string

Returns

the adjusted offset or None if the required adjustment is too big

Return type

Union[int, None]
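The idea of moving a start offset to a word boundary, and giving up when the move is too large, can be sketched as follows. This is not the package's algorithm: snap_start_to_word_boundary is a hypothetical helper, it only moves the offset to the left, it treats whitespace as the only boundary, and max_shift is an invented parameter standing in for whatever limit the library applies.

```python
def snap_start_to_word_boundary(text, match_offset, max_shift=3):
    """Move the offset left to the start of its word, or return None if that is too far."""
    start = match_offset
    # Walk left until the previous character is whitespace or the text begins.
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    if match_offset - start > max_shift:
        return None
    return start


text = "the quick fox"
print(snap_start_to_word_boundary(text, 5))
print(snap_start_to_word_boundary(text, 8))
```

Returning None rather than a far-away offset lets the caller discard a candidate whose boundaries do not line up with the phrase.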

fuzzy_search.fuzzy_match.calculate_end_shift(phrase_end: str, match_end: str, text_suffix: str, end_offset: int)

fuzzy_search.fuzzy_match.map_string(affix_string: str, punctuation: str, whitespace_only: bool = False) str

Turn affix string into type char representation. Types are ‘w’ for non-whitespace char, and ‘s’ for whitespace char.

Parameters
  • affix_string (str) – a string

  • punctuation (str) – the set of characters to treat as punctuation

  • whitespace_only (bool) – whether to treat only whitespace as word boundary or also include (some) punctuation

Returns

the type char representation

Return type

str
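A type char representation of this kind can be sketched in a few lines. The version below is an assumption about the behaviour, not the package's code: map_affix is a hypothetical name, and when whitespace_only is False it treats every character in punctuation as a boundary, whereas the description above says only some punctuation may be included.

```python
import string


def map_affix(affix_string, punctuation=string.punctuation, whitespace_only=False):
    """Map each character to 's' (boundary) or 'w' (word character)."""
    boundary_chars = set(string.whitespace)
    if not whitespace_only:
        # Assumption: all given punctuation characters count as boundaries.
        boundary_chars |= set(punctuation)
    return "".join("s" if char in boundary_chars else "w" for char in affix_string)


print(map_affix("ab, c"))
print(map_affix("ab, c", whitespace_only=True))
```

Reducing a prefix or suffix to such a string makes it easy to test where the nearest word boundary lies without inspecting the actual characters.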

fuzzy_search.fuzzy_match.phrase_match_from_json(match_json: dict) fuzzy_search.fuzzy_match.PhraseMatch

fuzzy_search.fuzzy_match.validate_match_props(match_phrase: fuzzy_search.fuzzy_phrase.Phrase, match_variant: fuzzy_search.fuzzy_phrase.Phrase, match_string: str, match_offset: int) None

Validate match properties.

Parameters
  • match_phrase (Phrase) – the phrase that has been matched

  • match_variant (Phrase) – the variant of the phrase that the match is based on

  • match_string (str) – the text string that matches the variant phrase

  • match_offset (int) – the offset of the match string in the text

Returns

None

Return type

None

fuzzy_search.fuzzy_patterns module

fuzzy_search.fuzzy_patterns.context_before_pattern(name, pattern_definition, context_string, max_distance=10)
fuzzy_search.fuzzy_patterns.context_then_pattern(name, pattern_definition, context_string)
fuzzy_search.fuzzy_patterns.escape_string(string)
fuzzy_search.fuzzy_patterns.get_context_patterns(context_type: Union[None, str] = None) dict
fuzzy_search.fuzzy_patterns.get_search_patterns(pattern_type=None)
fuzzy_search.fuzzy_patterns.list_context_pattern_types(context_type=None)
fuzzy_search.fuzzy_patterns.list_pattern_definitions(pattern_type=None)
fuzzy_search.fuzzy_patterns.list_pattern_names(name_only=True, pattern_type=None)
fuzzy_search.fuzzy_patterns.make_search_context_patterns(context_string, pattern_names, context_patterns)
fuzzy_search.fuzzy_patterns.pattern_before_context(name, pattern_definition, context_string, max_distance=10)
fuzzy_search.fuzzy_patterns.pattern_comma_then_context(name, pattern_definition, context_string)

fuzzy_search.fuzzy_phrase module

class fuzzy_search.fuzzy_phrase.Phrase(phrase: Union[str, Dict[str, str]], ngram_size: int = 2, skip_size: int = 2, early_threshold: int = 3, late_threshold: int = 3, within_range_threshold: int = 3, ignorecase: bool = False)

Bases: object

add_max_offset(max_offset: int) None

Add a maximum offset for matching a phrase in a text.

Parameters

max_offset (int) – the maximum offset to allow a phrase to match

add_metadata(metadata_dict: Dict[str, any]) None

Add key/value pairs as metadata for this phrase.

Parameters

metadata_dict (Dict[str, any]) – a dictionary of key/value pairs as metadata

Returns

None

Return type

None

has_label(label_string: str) bool

Check if a given label belongs to at least one phrase in the phrase model.

Parameters

label_string (str) – a label string

Returns

a boolean whether the label is part of the phrase model

Return type

bool

has_skipgram(skipgram: str) bool

For a given skipgram, return boolean whether it is in the index

Parameters

skipgram (str) – a skipgram string

Returns

A boolean whether skipgram is in the index

Return type

bool

is_early_skipgram(skipgram: str) bool

For a given skipgram, return boolean whether it appears early in the phrase.

Parameters

skipgram (str) – a skipgram string

Returns

A boolean whether skipgram appears early in the phrase

Return type

bool

set_label(label: Union[str, List[str]]) None

Set the label(s) of a phrase. Labels must be string and can be a single string or a list.

Parameters

label (Union[str, List[str]]) – the label(s) of a phrase

skipgram_offsets(skipgram_string: str) Union[None, List[int]]

For a given skipgram return the list of offsets at which it appears.

Parameters

skipgram_string (str) – a skipgram string

Returns

A list of string offsets at which the skipgram appears

Return type

Union[None, List[int]]

within_range(skipgram1, skipgram2)

fuzzy_search.fuzzy_phrase.is_valid_label(label: Union[str, List[str]]) bool

Test whether label has a valid value.

Parameters

label (Union[str, List[str]]) – a phrase label (either a string or a list of strings)

Returns

whether the label is valid

Return type

bool
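A label check along these lines could look like the sketch below. It assumes, based on the set_label description, that a valid label is either a single string or a list containing only strings; label_is_valid is a hypothetical stand-in and the package may apply additional rules.

```python
def label_is_valid(label):
    """Accept a single string or a list in which every item is a string."""
    if isinstance(label, str):
        return True
    return isinstance(label, list) and all(isinstance(item, str) for item in label)


print(label_is_valid("person"))
print(label_is_valid(["person", 42]))
```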

fuzzy_search.fuzzy_phrase_model module

class fuzzy_search.fuzzy_phrase_model.PhraseModel(phrases: Union[None, List[Union[str, Dict[str, Union[str, list]], fuzzy_search.fuzzy_phrase.Phrase]]] = None, variants: Union[None, List[Union[Dict[str, List[str]], fuzzy_search.fuzzy_phrase.Phrase]]] = None, phrase_labels: Union[None, List[Dict[str, str]]] = None, distractors: Union[None, List[Union[Dict[str, List[str]], fuzzy_search.fuzzy_phrase.Phrase]]] = None, model: Union[None, List[Dict[str, Union[str, list]]]] = None, custom: Union[None, List[Dict[str, Union[str, int, float, list]]]] = None, config: Optional[dict] = None)

Bases: object

add_custom(custom: List[Dict[str, Union[str, int, float, list]]]) None

Add custom key/value pairs to the entry as phrase metadata.

Parameters

custom (List[Dict[str, Union[str, int, float, list]]]) – a list of phrase dictionaries, each with a ‘phrase’ property and additional key/value pairs

add_distractor(distractor_phrase: fuzzy_search.fuzzy_phrase.Phrase, main_phrase: fuzzy_search.fuzzy_phrase.Phrase)

Add a phrase to the model as distractor of a given main phrase.

Parameters
  • distractor_phrase (Phrase) – a distractor phrase to be added as distractor of main_phrase

  • main_phrase (Phrase) – a main phrase that the distractor phrase is a distractor of

add_distractors(distractors: List[Dict[str, Union[str, List[str]]]], add_new_phrases: bool = True) None

Add distractors of a phrase. If the phrase is not registered, add it to the set. The input is a list of dictionaries:

distractors = [{‘phrase’: ‘some phrase’, ‘distractors’: [‘some distractor’, ‘some other distractor’]}]

Parameters
  • distractors (List[Dict[str, Union[str, List[str]]]]) – a list of phrase dictionaries with ‘distractors’ property

  • add_new_phrases (bool) – a Boolean to indicate if unknown phrases should be added

add_labels(phrase_labels: List[Dict[str, Union[str, list]]])

Add a label to a phrase. This can be used to group phrases under the same label. The input is a list of phrase/label pair dictionaries:

labels = [{‘phrase’: ‘some phrase’, ‘label’: ‘some label’}]

add_model(model: List[Union[str, Dict[str, Union[str, list]]]]) None

Add an entire model with list of phrase dictionaries.

Parameters

model (List[Union[str, Dict[str, Union[str, list]]]]) – a list of phrase dictionaries

Returns

None

Return type

None

add_phrase(phrase: fuzzy_search.fuzzy_phrase.Phrase) None

Add a phrase to the model as main phrase.

Parameters

phrase (Phrase) – a phrase to be added

add_phrases(phrases: List[Union[str, Dict[str, Union[str, List[str]]], fuzzy_search.fuzzy_phrase.Phrase]]) None

Add a list of phrases to the phrase model. Phrases must be one of:
  • a list of strings

  • a list of dictionaries with property ‘phrase’ and the phrase as a string value

  • a list of Phrase objects

Parameters

phrases (List[Union[str, Dict[str, Union[str, List[str]]], Phrase]]) – a list of phrases

add_variant(variant_phrase: fuzzy_search.fuzzy_phrase.Phrase, main_phrase: fuzzy_search.fuzzy_phrase.Phrase)

Add a phrase to the model as variant of a given main phrase.

Parameters
  • variant_phrase (Phrase) – a variant phrase to be added as variant of main_phrase

  • main_phrase (Phrase) – a main phrase that the variant phrase is a variant of

add_variants(variants: List[Dict[str, Union[str, List[str]]]], add_new_phrases: bool = True) None

Add variants of a phrase. If the phrase is not registered, add it to the set. The input is a list of dictionaries:

variants = [{‘phrase’: ‘some phrase’, ‘variants’: [‘some variant’, ‘some other variant’]}]

Parameters
  • variants (List[Dict[str, Union[str, List[str]]]]) – a list of phrase dictionaries with ‘variants’ property

  • add_new_phrases (bool) – a Boolean to indicate if unknown phrases should be added
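The input shape for variants and the effect of add_new_phrases can be illustrated with plain dictionaries and sets. This is a sketch, not the PhraseModel implementation: register_variants, known_phrases and variant_index are hypothetical names, and silently skipping unknown phrases when add_new_phrases is False is an assumption (the package might raise an error instead).

```python
def register_variants(known_phrases, variant_index, variants, add_new_phrases=True):
    """Record the variants of each phrase in a phrase -> set-of-variants index."""
    for entry in variants:
        phrase = entry["phrase"]
        if phrase not in known_phrases:
            if not add_new_phrases:
                continue  # assumption: unknown phrases are skipped, not reported
            known_phrases.add(phrase)
        variant_index.setdefault(phrase, set()).update(entry["variants"])


known_phrases = {"some phrase"}
variant_index = {}
variants = [
    {"phrase": "some phrase", "variants": ["some variant", "some other variant"]},
    {"phrase": "new phrase", "variants": ["nu phrase"]},
]
register_variants(known_phrases, variant_index, variants, add_new_phrases=False)
print(variant_index)
```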

get(phrase_string: str, custom_property: str) any

Get the value of a custom property for a given phrase.

Parameters
  • phrase_string (str) – a phrase string of a registered phrase.

  • custom_property (str) – the name of a custom property of the registered phrase

Returns

the custom property of a given phrase

Return type

any

get_labels(phrase: Union[str, fuzzy_search.fuzzy_phrase.Phrase]) Set[str]

Return the label(s) of a registered phrase.

Parameters

phrase (Union[str, Phrase]) – a phrase string or object

Returns

a set of labels

Return type

Set[str]

get_phrases() List[fuzzy_search.fuzzy_phrase.Phrase]

Return a list of all registered phrases.

Returns

a list of all registered phrases

Return type

List[Phrase]

get_phrases_by_max_length(max_length: int, include_variants: bool = False) Generator[fuzzy_search.fuzzy_phrase.Phrase, None, None]

Return all phrases in the phrase model that are no longer than a given length.

Parameters
  • max_length (int) – the maximum length of phrases to be returned

  • include_variants (bool) – whether to include variants

Returns

a generator that yields phrases

Return type

Generator[Phrase, None, None]
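A generator with this behaviour can be sketched over plain strings. The names phrases_by_max_length and variants_of are hypothetical, length is assumed to be measured in characters, and the real method works on Phrase objects rather than strings.

```python
def phrases_by_max_length(phrases, variants_of, max_length, include_variants=False):
    """Yield phrases (and optionally their variants) of at most max_length characters."""
    for phrase in phrases:
        if len(phrase) <= max_length:
            yield phrase
        if include_variants:
            for variant in variants_of.get(phrase, []):
                if len(variant) <= max_length:
                    yield variant


phrases = ["short", "a much longer phrase"]
variants_of = {"short": ["shrt", "shortened form"]}
print(list(phrases_by_max_length(phrases, variants_of, 6, include_variants=True)))
```

Returning a generator avoids building a full list when the caller only needs to iterate once, for example when matching phrases against a short text fragment.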

get_variants(phrases: Optional[List[str]] = None) List[Dict[str, Union[str, List[str]]]]

Return registered variants of a specific list of phrases or of all registered phrases (when no list of phrases is given).

Parameters

phrases (List[str]) – a list of registered phrase strings

Returns

a list of dictionaries of phrases and their variants

Return type

List[Dict[str, Union[str, List[str]]]]

has_custom(phrase_string: str, custom_property: str) bool

Check if a phrase has a given custom property.

Parameters
  • phrase_string (str) – a phrase string of a registered phrase.

  • custom_property (str) – the name of a custom property of the registered phrase

Returns

a boolean to indicate whether the phrase has a custom property of the given property name

Return type

bool

has_label(phrase_string: str) bool

Check if a registered phrase has a label.

Parameters

phrase_string (str) – a phrase string of a registered phrase

Returns

a boolean indicating if the registered phrase has a label

Return type

bool

has_phrase(phrase: Union[str, Dict[str, any], fuzzy_search.fuzzy_phrase.Phrase]) bool

Check if phrase is registered in phrase_model.

Parameters

phrase (Union[str, Dict[str, any], Phrase]) – a phrase string, phrase dictionary or Phrase object

Returns

a boolean indicating whether phrase is registered

Return type

bool

index_phrase_words(phrase: fuzzy_search.fuzzy_phrase.Phrase) None

Index a phrase on its individual words, for exact match look up routines.

Parameters

phrase (Phrase) – a phrase object that is part of the phrase model

is_label(label: str) bool

Check if label is registered as label of any known phrase.

Parameters

label (str) – a label string to be checked

Returns

a boolean whether the label belongs to a known phrase

Return type

bool

property json: List[Dict[str, Union[str, List[str]]]]

Return a JSON representation of the phrase model.

Returns

a JSON representation of the phrase model

Return type

List[Dict[str, Union[str, List[str]]]]

remove_custom(custom: List[Dict[str, any]]) None

Remove custom properties for a list of phrases.

Parameters

custom (List[Dict[str, any]]) – a list of phrase dictionaries with custom properties to remove

remove_distractor(distractor_phrase: fuzzy_search.fuzzy_phrase.Phrase) None

Remove a distractor phrase from the model, including its connection to the phrase it is a distractor of.

Parameters

distractor_phrase (Phrase) – a phrase that is registered as a distractor of one or more main phrases

remove_distractors(distractors: Optional[List[Union[str, fuzzy_search.fuzzy_phrase.Phrase]]] = None, distractors_of_phrase: Union[None, str] = None)

Remove a list of distractors of a phrase. There are two ways to select what is removed:
  • distractors: a list of dictionaries with phrases as key and the list of distractors to be removed as values, e.g. distractors = [{‘phrase’: ‘some phrase’, ‘distractors’: [‘distractor to remove’, ‘some other distractor’]}]

  • distractors_of_phrase: remove all distractors of a given phrase

Parameters
  • distractors (Union[List[Union[str, Phrase]], None]) – an optional list of phrase dictionaries with ‘distractors’ property

  • distractors_of_phrase (Union[str, None]) – an optional string of a registered phrase for which all distractors are removed

remove_labels(phrases: Union[List[fuzzy_search.fuzzy_phrase.Phrase], List[str]]) None

Remove labels for known phrases.

Parameters

phrases (Union[List[Phrase], List[str]]) – a list of known phrases (either as Phrase objects or strings)

remove_phrase(phrase: fuzzy_search.fuzzy_phrase.Phrase)

Remove a main phrase from the model, including its connections to any variant and distractor phrases.

Parameters

phrase (Phrase) – a phrase that is registered as a main phrase

remove_phrase_words(phrase: fuzzy_search.fuzzy_phrase.Phrase) None

Remove the individual words of a phrase from the index. Only use this if you are removing the phrase from the phrase model.

Parameters

phrase (Phrase) – a phrase object that is part of the phrase model

remove_phrases(phrases: List[Union[str, Dict[str, Union[str, List[str]]], fuzzy_search.fuzzy_phrase.Phrase]])

Remove a list of phrases from the phrase model. If it has any registered spelling variants, remove those as well.

Parameters

phrases (List[Union[str, Dict[str, Union[str, List[str]]]]]) – a list of phrases/keyphrases

remove_variant(variant_phrase: fuzzy_search.fuzzy_phrase.Phrase) None

Remove a variant phrase from the model, including its connection to the phrase it is a variant of.

Parameters

variant_phrase (Phrase) – a phrase that is registered as a variant of one or more main phrases

remove_variants(variants: Optional[List[Union[str, fuzzy_search.fuzzy_phrase.Phrase]]] = None, variants_of_phrase: Optional[Union[str, fuzzy_search.fuzzy_phrase.Phrase]] = None)

Remove a list of spelling variants of a phrase.

Parameters
  • variants (Union[List[Union[str, Phrase]], None]) – a list of variant strings or variant phrase objects to remove

  • variants_of_phrase (Union[str, Phrase, None]) – an optional phrase string or phrase object for which all variants are to be removed

validate_entry_phrase(entry: Dict[str, Union[str, int, float, list]]) None

Check if a given phrase (as dictionary) is registered.

Parameters

entry (Dict[str, Union[str, int, float, list]]) – a phrase dictionary with a ‘phrase’ property

variant_of(variant: Union[str, fuzzy_search.fuzzy_phrase.Phrase]) Union[None, fuzzy_search.fuzzy_phrase.Phrase]

variants(phrase: Union[str, fuzzy_search.fuzzy_phrase.Phrase]) Union[None, List[fuzzy_search.fuzzy_phrase.Phrase]]

Return all variants of a given phrase.

Parameters

phrase (Union[str, Phrase]) – a phrase string or phrase object

Returns

a list of variants of the phrase or None if it doesn’t have any

Return type

Union[None, List[Phrase]]

fuzzy_search.fuzzy_phrase_model.as_phrase_object(phrase: Union[str, dict, fuzzy_search.fuzzy_phrase.Phrase], ngram_size: int = 2, skip_size: int = 2) fuzzy_search.fuzzy_phrase.Phrase
fuzzy_search.fuzzy_phrase_model.is_phrase_dict(phrase_dict: Dict[str, Union[str, List[str]]]) bool

fuzzy_search.fuzzy_phrase_searcher module

class fuzzy_search.fuzzy_phrase_searcher.FuzzyPhraseSearcher(config: Union[None, Dict[str, Union[str, int, float]]] = None)

Bases: object

configure(config: Dict[str, any]) None

Configure the fuzzy searcher with a given config object.

Parameters

config (Dict[str, Union[str, int, float]]) – a config dictionary

filter_matches_by_distractors(matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[fuzzy_search.fuzzy_match.PhraseMatch]

filter_matches_by_threshold(matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[fuzzy_search.fuzzy_match.PhraseMatch]

find_candidates(text: dict, use_word_boundaries: bool, include_variants: Union[None, bool] = None, known_word_offset: Optional[Dict[int, Dict[str, any]]] = None) List[fuzzy_search.fuzzy_match.Candidate]

Find candidate fuzzy matches for a given text.

Parameters
  • text (dict) – the text object to match with phrases

  • use_word_boundaries (bool) – use word boundaries in determining match boundaries

  • include_variants (bool) – boolean flag for whether to include phrase variants for finding matches

  • known_word_offset (Dict[int, Dict[str, any]]) – a dictionary of known words and their text offsets based on exact matches

Returns

a list of candidate matches

Return type

List[Candidate]

find_exact_matches(text: Union[str, Dict[str, str]], use_word_boundaries: Union[None, bool] = None, include_variants: Union[None, bool] = None) List[fuzzy_search.fuzzy_match.PhraseMatch]

Find all exact matching phrases for a given text.

Parameters
  • text (Union[str, Dict[str, str]]) – the text (string or dictionary with ‘text’ property) to find exact matching phrases in.

  • use_word_boundaries (Union[None, bool]) – use word boundaries in determining match boundaries

  • include_variants (Union[None, bool]) – boolean flag for whether to include phrase variants for finding matches

Returns

a list of phrase matches

Return type

List[PhraseMatch]
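Exact matching with optional word boundaries can be approximated with the standard re module. The sketch below is an illustration rather than the searcher's implementation: find_exact is a hypothetical function, matches are plain dictionaries instead of PhraseMatch objects, and the \b anchors assume that phrases start and end with word characters.

```python
import re


def find_exact(text, phrases, use_word_boundaries=True, ignorecase=False):
    """Return exact occurrences of phrases in text, sorted by offset."""
    flags = re.IGNORECASE if ignorecase else 0
    matches = []
    for phrase in phrases:
        pattern = re.escape(phrase)
        if use_word_boundaries:
            # Only accept occurrences that are not part of a longer word.
            pattern = r"\b" + pattern + r"\b"
        for found in re.finditer(pattern, text, flags):
            matches.append({"phrase": phrase, "string": found.group(0), "offset": found.start()})
    return sorted(matches, key=lambda match: match["offset"])


text = "the cat sat on the category"
print(find_exact(text, ["cat"]))
print(find_exact(text, ["cat"], use_word_boundaries=False))
```

The character ranges found this way are the kind of information a subsequent fuzzy pass can use to skip text that is already accounted for.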

find_matches(text: Union[str, Dict[str, str]], use_word_boundaries: Union[None, bool] = None, allow_overlapping_matches: Union[None, bool] = None, include_variants: Union[None, bool] = None, filter_distractors: Union[None, bool] = None, skip_exact_matching: Optional[bool] = None) List[fuzzy_search.fuzzy_match.PhraseMatch]

Find all fuzzy matching phrases for a given text. By default, a first pass of exact matching is conducted to find exact occurrences of phrases. This is to speed up the fuzzy matching pass.

Parameters
  • text (Union[str, Dict[str, str]]) – the text (string or dictionary with ‘text’ property) to find fuzzy matching phrases in.

  • use_word_boundaries (Union[None, bool]) – use word boundaries in determining match boundaries

  • allow_overlapping_matches (Union[None, bool]) – boolean flag for whether to allow matches to overlap in their text ranges

  • include_variants (Union[None, bool]) – boolean flag for whether to include phrase variants for finding matches

  • filter_distractors (Union[None, bool]) – boolean flag for whether to remove phrase matches that better match distractors

  • skip_exact_matching (Union[None, bool]) – boolean flag whether to skip the exact matching step

Returns

a list of phrase matches

Return type

List[PhraseMatch]

find_skipgram_matches(text: Dict[str, Union[str, int, float, list]], include_variants: Union[None, bool] = None, known_word_offset: Optional[Dict[int, Dict[str, any]]] = None) fuzzy_search.fuzzy_phrase_searcher.SkipMatches

Find all skipgram matches between text and phrases.

Parameters
  • text (Dict[str, Union[str, int, float, list]]) – the text object to match with phrases

  • include_variants (bool) – boolean flag for whether to include phrase variants for finding matches

  • known_word_offset (Dict[int, Dict[str, any]]) – a dictionary of known words and their text offsets based on exact matches

Returns

a SkipMatches object containing all skipgram matches

Return type

SkipMatches

index_distractors(distractors: List[Union[str, fuzzy_search.fuzzy_phrase.Phrase]]) None

Add a list of distractor phrases to filter out likely incorrect phrase matches.

Parameters

distractors (List[Union[str, Phrase]]) – a list of distractors, either as string or as Phrase objects

index_phrase_model(phrase_model: Union[List[Dict[str, Union[str, int, float, list]]], fuzzy_search.fuzzy_phrase_model.PhraseModel])

Add a phrase model to search for phrases in texts.

Parameters

phrase_model (Union[List[Dict[str, Union[str, int, float, list]]], PhraseModel]) – a phrase model, either as a list of phrase dictionaries or as PhraseModel object

index_phrases(phrases: List[Union[str, fuzzy_search.fuzzy_phrase.Phrase]]) None

Add a list of phrases to search for in texts.

Parameters

phrases (List[Union[str, Phrase]]) – a list of phrases, either as string or as Phrase objects

index_variants(variants: List[Union[str, fuzzy_search.fuzzy_phrase.Phrase]]) None

Add a list of variant phrases to search for in texts.

Parameters

variants (List[Union[str, Phrase]]) – a list of variants, either as string or as Phrase objects

class fuzzy_search.fuzzy_phrase_searcher.SkipMatches(ngram_size: int, skip_size: int)

Bases: object

add_skip_match(skipgram: fuzzy_search.fuzzy_string.SkipGram, phrase: fuzzy_search.fuzzy_phrase.Phrase) None

Add a skipgram from a text that matches a phrase.

Parameters
  • skipgram (SkipGram) – a skipgram from a text

  • phrase (Phrase) – a phrase object that matches the skipgram

fuzzy_search.fuzzy_phrase_searcher.add_exact_match_score(match: fuzzy_search.fuzzy_match.PhraseMatch) fuzzy_search.fuzzy_match.PhraseMatch

fuzzy_search.fuzzy_phrase_searcher.candidates_to_matches(candidates: List[fuzzy_search.fuzzy_match.Candidate], text: dict, phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, ignorecase: bool = False) List[fuzzy_search.fuzzy_match.PhraseMatch]

fuzzy_search.fuzzy_phrase_searcher.filter_matches_by_overlap(filtered_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[fuzzy_search.fuzzy_match.PhraseMatch]

fuzzy_search.fuzzy_phrase_searcher.filter_overlapping_phrase_candidates(phrase_candidates: List[fuzzy_search.fuzzy_match.Candidate]) List[fuzzy_search.fuzzy_match.Candidate]

fuzzy_search.fuzzy_phrase_searcher.filter_skipgram_threshold(skip_matches: fuzzy_search.fuzzy_phrase_searcher.SkipMatches, skip_threshold: float) List[fuzzy_search.fuzzy_phrase.Phrase]

Filter the skipgram matches based on the skipgram overlap threshold.

Parameters
  • skip_matches (SkipMatches) – the skipgram matches between a text and a list of phrases

  • skip_threshold (float) – the threshold for the skipgram overlap between a text and a phrase

Returns

the list of phrases with a skipgram overlap that meets the threshold

Return type

List[Phrase]
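
The threshold filtering described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the name filter_by_overlap is hypothetical, phrases are represented as plain strings, the overlap fractions are assumed to be precomputed, and "meets the threshold" is assumed to mean greater than or equal to it.

```python
from typing import Dict, List


def filter_by_overlap(overlaps: Dict[str, float], threshold: float) -> List[str]:
    """Return the phrases whose overlap fraction is at least the threshold."""
    # Hypothetical stand-in: the library works on SkipMatches and Phrase objects,
    # this sketch uses a plain mapping from phrase string to overlap fraction.
    return [phrase for phrase, overlap in overlaps.items() if overlap >= threshold]


overlaps = {"fuzzy search": 0.8, "exact match": 0.3}
selected = filter_by_overlap(overlaps, threshold=0.5)
```

With these example values, only "fuzzy search" is kept, because 0.3 is below the threshold of 0.5.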

fuzzy_search.fuzzy_phrase_searcher.get_exact_match_ranges(exact_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[dict]

fuzzy_search.fuzzy_phrase_searcher.get_known_word_offsets(match_ranges: List[Dict[str, any]], text_doc: Dict[str, str]) Dict[int, dict]

fuzzy_search.fuzzy_phrase_searcher.get_skipmatch_candidates(text: Dict[str, any], skip_matches: fuzzy_search.fuzzy_phrase_searcher.SkipMatches, skipgram_threshold: float, phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, max_length_variance: int = 1, ignorecase: bool = False) List[fuzzy_search.fuzzy_match.Candidate]

Find all candidate matches for the phrases in a SkipMatches object.

Parameters
  • text (Dict[str, any]) – the text object to match with phrases

  • skip_matches (SkipMatches) – a SkipMatches object with matches between a text and a list of phrases

  • skipgram_threshold (float) – a threshold for how many skipgrams should match between a phrase and a candidate

  • phrase_model (PhraseModel) – a phrase model, either as dictionary or as PhraseModel object

  • max_length_variance (int) – the maximum difference in length between candidate and phrase

  • ignorecase (bool) – whether to ignore case when matching skip grams

Returns

a list of candidate matches

Return type

List[Candidate]

fuzzy_search.fuzzy_phrase_searcher.get_skipmatch_phrase_candidates(text: Dict[str, any], phrase: fuzzy_search.fuzzy_phrase.Phrase, skip_matches: fuzzy_search.fuzzy_phrase_searcher.SkipMatches, skipgram_threshold: float, max_length_variance: int = 1, ignorecase: bool = False) List[fuzzy_search.fuzzy_match.Candidate]

Find all candidate matches for a given phrase and SkipMatches object.

Parameters
  • text (Dict[str, any]) – the text object to match with phrases

  • phrase (Phrase) – a phrase to find candidate matches for

  • skip_matches (SkipMatches) – a SkipMatches object with matches between a text and a list of phrases

  • skipgram_threshold (float) – a threshold for how many skipgrams should match between a phrase and a candidate

  • max_length_variance (int) – the maximum difference in length between candidate and phrase

  • ignorecase (bool) – whether to ignore case when matching skip grams

Returns

a list of candidate matches

Return type

List[Candidate]

fuzzy_search.fuzzy_phrase_searcher.get_skipset_overlap(phrase: fuzzy_search.fuzzy_phrase.Phrase, skip_matches: fuzzy_search.fuzzy_phrase_searcher.SkipMatches) float

Calculate the overlap between the set of skipgrams of a text and the skipgrams of a phrase.

Parameters
  • phrase (Phrase) – a phrase object that has been matched against a text

  • skip_matches (SkipMatches) – a SkipMatches object containing the skipgram matches between a text and a list of phrases

Returns

the fraction of skipgrams in the phrase that overlap with the text

Return type

float
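
As an illustration of the overlap fraction, the sketch below treats the skipgrams of the phrase and of the text as plain sets of strings. The function name skipset_overlap and the set-based representation are assumptions for this example; the library computes the value from a Phrase and a SkipMatches object.

```python
from typing import Set


def skipset_overlap(phrase_skipgrams: Set[str], text_skipgrams: Set[str]) -> float:
    """Fraction of the phrase's skipgrams that also occur in the text."""
    if not phrase_skipgrams:
        # Assumption: an empty phrase has no overlap rather than raising an error.
        return 0.0
    return len(phrase_skipgrams & text_skipgrams) / len(phrase_skipgrams)


phrase_grams = {"fu", "uz", "zz", "zy"}
text_grams = {"fu", "uz", "zy", "xx"}
overlap = skipset_overlap(phrase_grams, text_grams)
```

Three of the four phrase skipgrams occur in the text, so the overlap in this example is 0.75.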

fuzzy_search.fuzzy_phrase_searcher.get_text_dict(text: Union[str, dict], ignorecase: bool = False) dict

Ensure that the text is wrapped in a dictionary with an id property, so that a long text is passed around by reference instead of being copied as a long text string.

Parameters
  • text (Union[str, dict]) – a text string or text dictionary

  • ignorecase (bool) – boolean flag for whether to ignore case

Returns

a text dictionary with an id property

Return type

dict
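
A minimal sketch of this kind of normalisation is shown below. The function name as_text_dict, the 'text' and 'id' key names, and the use of a random UUID as fallback identifier are assumptions for illustration; the ignorecase flag is left out.

```python
import uuid
from typing import Dict, Union


def as_text_dict(text: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Wrap a plain string in a dictionary; pass dictionaries through unchanged."""
    if isinstance(text, dict):
        if "text" not in text or "id" not in text:
            raise KeyError("text dictionary needs 'text' and 'id' keys")
        # The same dictionary object is returned, so the text is not copied.
        return text
    # Assumption: a random identifier is acceptable when none is given.
    return {"text": text, "id": str(uuid.uuid4())}


doc = as_text_dict("a long text")
same = as_text_dict(doc)
```

Calling the function on an existing text dictionary returns that very object, which is the point of the wrapper: downstream code can hand the dictionary around without duplicating the string.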

fuzzy_search.fuzzy_phrase_searcher.index_known_word_offsets(exact_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) Dict[int, Dict[str, any]]

fuzzy_search.fuzzy_phrase_searcher.search_exact(phrase: fuzzy_search.fuzzy_phrase.Phrase, text: Dict[str, str], ignorecase: bool = False, use_word_boundaries: bool = True)

fuzzy_search.fuzzy_phrase_searcher.search_exact_phrases(phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, text: Dict[str, str], ignorecase: bool = False, use_word_boundaries: bool = True, include_variants: bool = False)

fuzzy_search.fuzzy_phrase_searcher.search_exact_phrases_with_word_boundaries(phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, text: Dict[str, str], ignorecase: bool = False, include_variants: bool = False)

fuzzy_search.fuzzy_phrase_searcher.search_exact_phrases_without_word_boundaries(phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, text: Dict[str, str], ignorecase: bool = False, include_variants: bool = False)

fuzzy_search.fuzzy_searcher module

class fuzzy_search.fuzzy_searcher.FuzzySearcher(char_match_threshold=0.5, ngram_threshold=0.5, levenshtein_threshold=0.5, max_length_variance=1)

Bases: object

disable_strip_suffix()

enable_strip_suffix()

filter_candidates(candidates, keyword, ngram_size=2)

filter_char_match_candidates(candidates, match_term)

filter_levenshtein_candidates(candidates, match_term)

filter_ngram_candidates(candidates, match_term, ngram_size)

find_candidates(text, keyword, ngram_size=2, use_word_boundaries=False)

Find candidate matches that start with the same initial character as the search term and filter them based on default thresholds for character overlap, ngram overlap and Levenshtein distance.

find_start_candidates(text, term, use_word_boundaries)

Find candidate matches that start with the same initial character as the search term.

find_term_matches(text, term, max_length_variance=None, use_word_boundaries=False)

make_ngrams(term, n)

rank_candidates(candidates, keyword, ngram_size=2)

score_char_overlap(term1, term2)

score_char_overlap_ratio(term1, term2)

score_levenshtein_distance(s1, s2)

score_levenshtein_distance_ratio(term1, term2)

score_ngram_overlap(term1, term2, ngram_size)

score_ngram_overlap_ratio(term1, term2, ngram_size)

strip_suffix(match)

fuzzy_search.fuzzy_searcher.create_term_match(re_match, term)

fuzzy_search.fuzzy_string module

class fuzzy_search.fuzzy_string.SkipGram(skipgram_string: str, offset: int, skipgram_length: int)

Bases: object

fuzzy_search.fuzzy_string.get_non_word_prefix(string: str) str

Check if a string has a non-word prefix and return it.

Parameters

string (str) – the string from which the prefix is to be returned

Returns

the non-word prefix

Return type

str

fuzzy_search.fuzzy_string.get_non_word_suffix(string: str) str

Check if a string has a non-word suffix and return it.

Parameters

string (str) – the string from which the suffix is to be returned

Returns

the non-word suffix

Return type

str
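
The two helpers above can be approximated with regular expressions. The sketch below assumes that "non-word" means the regex class \W (anything other than letters, digits and underscore) and that an empty string is returned when there is no such prefix or suffix; the function names are hypothetical.

```python
import re


def non_word_prefix(string: str) -> str:
    """Return the leading run of non-word characters, or an empty string."""
    match = re.match(r"\W+", string)
    return match.group(0) if match else ""


def non_word_suffix(string: str) -> str:
    """Return the trailing run of non-word characters, or an empty string."""
    match = re.search(r"\W+$", string)
    return match.group(0) if match else ""


prefix = non_word_prefix("(word)")
suffix = non_word_suffix("word).")
```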

fuzzy_search.fuzzy_string.insert_skips(window: str, skipgram_combinations: List[List[int]])

For a given skip gram window, return all skip grams for a given configuration.

fuzzy_search.fuzzy_string.make_ngrams(text: str, n: int) List[str]

Turn a term string into a list of ngrams of size n

Parameters
  • text (str) – a text string

  • n (int) – the ngram size

Returns

a list of ngrams

Return type

List[str]
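
For illustration, a plain sliding-window version of character ngrams looks like the sketch below. The name char_ngrams is hypothetical, and the sketch does not pad the string or treat its boundaries specially, which the library function may do differently.

```python
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Return all contiguous character ngrams of size n, from left to right."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A text shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


bigrams = char_ngrams("fuzzy", 2)
```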

fuzzy_search.fuzzy_string.score_char_overlap(term1: str, term2: str) int

Count the number of overlapping character tokens in two strings.

Parameters
  • term1 (str) – a term string

  • term2 (str) – a term string

Returns

the number of overlapping characters

Return type

int
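
One way to count overlapping characters is a multiset intersection, sketched below. Whether the library counts repeated characters this way is an assumption, and char_overlap is a hypothetical name.

```python
from collections import Counter


def char_overlap(term1: str, term2: str) -> int:
    """Count the characters shared by both terms, respecting repeated characters."""
    # Counter & Counter keeps, per character, the minimum of the two counts.
    shared = Counter(term1) & Counter(term2)
    return sum(shared.values())


score = char_overlap("fuzzy", "fussy")
```

In this example the terms share 'f', 'u' and 'y', so the score is 3; the two 'z' characters and the two 's' characters have no counterpart in the other term.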

fuzzy_search.fuzzy_string.score_char_overlap_ratio(term1, term2)

Score the number of overlapping characters between two terms as proportion of the length of the first term

Parameters
  • term1 (str) – a term string

  • term2 (str) – a term string

Returns

the number of overlapping characters as a proportion of the length of the first term

Return type

float

fuzzy_search.fuzzy_string.score_levenshtein_distance(term1: str, term2: str) int

Calculate the Levenshtein distance between two strings.

Parameters
  • term1 (str) – a term string

  • term2 (str) – a term string

Returns

the Levenshtein distance between the two terms

Return type

int
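
The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The sketch below is the standard row-by-row dynamic programming formulation, given for illustration under the hypothetical name levenshtein_distance; it is not taken from the library source.

```python
def levenshtein_distance(term1: str, term2: str) -> int:
    """Minimum number of single-character edits that turn term1 into term2."""
    # previous[j] is the distance between the processed prefix of term1 and term2[:j].
    previous = list(range(len(term2) + 1))
    for i, char1 in enumerate(term1, start=1):
        current = [i]
        for j, char2 in enumerate(term2, start=1):
            substitution = previous[j - 1] + (char1 != char2)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


distance = levenshtein_distance("kitten", "sitting")
```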

fuzzy_search.fuzzy_string.score_levenshtein_similarity_ratio(term1, term2)

Score the levenshtein similarity between two terms

Parameters
  • term1 (str) – a term string

  • term2 (str) – a term string

Returns

the Levenshtein similarity between the two terms, expressed as a ratio

Return type

float

fuzzy_search.fuzzy_string.score_ngram_overlap(term1: str, term2: str, ngram_size: int)

Score the number of overlapping ngrams between two terms

Parameters
  • term1 (str) – a first term string

  • term2 (str) – a second term string

  • ngram_size (int) – the character ngram size

Returns

the number of overlapping ngrams

Return type

int

fuzzy_search.fuzzy_string.score_ngram_overlap_ratio(term1, term2, ngram_size)

Score the number of overlapping ngrams between two terms as proportion of the length of the first term

Parameters
  • term1 (str) – a term string

  • term2 (str) – a term string

  • ngram_size (int) – the character ngram size

Returns

the number of overlapping ngrams as a proportion of the length of the first term

Return type

float
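
The relation between the ngram overlap count and its ratio can be sketched as follows. Two assumptions are made here: shared ngrams are counted as a multiset intersection, and the ratio divides by the number of ngrams in the first term. All function names are hypothetical, and the library may normalise differently.

```python
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Contiguous character ngrams of size n, without padding."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_overlap(term1: str, term2: str, ngram_size: int) -> int:
    """Number of ngrams that the two terms have in common."""
    shared = Counter(char_ngrams(term1, ngram_size)) & Counter(char_ngrams(term2, ngram_size))
    return sum(shared.values())


def ngram_overlap_ratio(term1: str, term2: str, ngram_size: int) -> float:
    """Overlap count divided by the number of ngrams in the first term (assumption)."""
    total = len(char_ngrams(term1, ngram_size))
    return ngram_overlap(term1, term2, ngram_size) / total if total else 0.0


ratio = ngram_overlap_ratio("fuzzy", "fussy", 2)
```

Here "fuzzy" has the bigrams fu, uz, zz, zy and "fussy" has fu, us, ss, sy, so one of four bigrams is shared and the ratio is 0.25.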

fuzzy_search.fuzzy_string.strip_prefix(string: str) str

Strip non-word prefix from the start of a string.

Parameters

string (str) – the string from which the prefix is to be stripped

Returns

the stripped string

Return type

str

fuzzy_search.fuzzy_string.strip_suffix(string: str) str

Strip non-word suffix from string ending.

Parameters

string (str) – the string from which the suffix is to be stripped

Returns

the stripped string

Return type

str

fuzzy_search.fuzzy_string.text2skipgrams(text: str, ngram_size: int = 2, skip_size: int = 2) Generator[fuzzy_search.fuzzy_string.SkipGram, None, None]

Turn a text string into a list of skipgrams.

Parameters
  • text (str) – a text string

  • ngram_size (int) – an integer indicating the number of characters in the ngram

  • skip_size (int) – an integer indicating how many characters may be skipped within an ngram

Returns

a generator yielding SkipGram objects, each holding a skipgram string and its offset in the text

Return type

Generator[SkipGram, None, None]
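
To make the idea of skipgrams concrete, the sketch below generates character skipgrams for every start position in a text. It is an illustration under stated assumptions, not the library's algorithm: each skipgram keeps the first character of a window of ngram_size + skip_size characters and picks the remaining characters in order from the rest of that window, and plain (skipgram, offset) tuples stand in for SkipGram objects. The name skipgrams is hypothetical.

```python
from itertools import combinations
from typing import Iterator, Tuple


def skipgrams(text: str, ngram_size: int = 2, skip_size: int = 2) -> Iterator[Tuple[str, int]]:
    """Yield (skipgram, offset) pairs for every start position in the text."""
    window_size = ngram_size + skip_size
    for offset in range(len(text)):
        window = text[offset:offset + window_size]
        # Always keep the first character; choose the others from the rest of the window.
        for rest in combinations(range(1, len(window)), ngram_size - 1):
            yield window[0] + "".join(window[i] for i in rest), offset


pairs = list(skipgrams("abcd", ngram_size=2, skip_size=1))
```

For "abcd" with bigrams and one allowed skip, this yields ab and ac at offset 0, bc and bd at offset 1, and cd at offset 2. Windows near the end of the text are shorter and therefore produce fewer skipgrams.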

fuzzy_search.fuzzy_template module

class fuzzy_search.fuzzy_template.FuzzyTemplate(phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, template_json: Union[List[str], List[dict], Dict[str, Union[str, dict]]], ignore_unknown: bool = False, ordered: bool = False)

Bases: object

get_element(element_label: str) Union[None, fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement, fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement]

Return the element corresponding to a given label.

Parameters

element_label (str) – a fuzzy element label

Returns

the element corresponding to the label or None if label is unknown

Return type

Union[None, FuzzyTemplateLabelElement, FuzzyTemplateGroupElement]

get_elements_by_cardinality(cardinality: str = 'single') List[fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement]

Return all template elements with a given cardinality.

Parameters

cardinality (str) – a cardinality type (‘single’ or ‘multi’)

Returns

the list of elements with a given cardinality

Return type

List[FuzzyTemplateLabelElement]

get_label_phrases(label: str) List[fuzzy_search.fuzzy_phrase.Phrase]

Return a list of phrases that have a given label.

Parameters

label (str) – a phrase label for phrases in the registered phrase_model

Returns

a list of phrases from the registered phrase model that have the given label

Return type

List[Phrase]

get_labels_by_cardinality(cardinality: str = 'single') List[str]

Return the labels of all template elements with a given cardinality.

Parameters

cardinality (str) – a cardinality type (‘single’ or ‘multi’)

Returns

the list of labels of elements with a given cardinality

Return type

List[str]

get_required_elements() List[fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement]

Return all required elements in the template.

Returns

the list of required elements

Return type

List[FuzzyTemplateLabelElement]

get_required_labels() List[str]

Return the labels of all required elements in the template.

Returns

the list of labels of required elements

Return type

List[str]

has_group(group: str) bool

Check if the template has group elements with a given group name.

Parameters

group (str) – a fuzzy element group

Returns

whether the group corresponds to any registered element(s)

Return type

bool

has_label(label: Union[str, List[str]]) bool

Check if the template has label elements with a given label or list of labels (any or all).

Parameters

label (Union[str, List[str]]) – a fuzzy element label

Returns

whether the label corresponds to any registered element(s)

Return type

bool

parse_group_element(group_info: Dict[str, any]) fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement

Parse a group element dictionary/JSON object into a fuzzy template group element.

Parameters

group_info (dict) – a dictionary containing the properties of the template group element

Returns

a fuzzy template group element

Return type

FuzzyTemplateGroupElement

parse_label_element(label_info: Union[str, Dict[str, any]]) Optional[fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement]

Parse a label element dictionary/JSON object into a fuzzy template label element.

Parameters

label_info (Union[str, Dict[str, any]]) – a label string or a dictionary containing the properties of the template label element

Returns

a fuzzy template label element, or None if the label is not used in the phrase model

Return type

Optional[FuzzyTemplateLabelElement]

register_template(template_json: Union[List[str], List[dict], Dict[str, Union[str, dict]]]) None

Register a list of elements as a fuzzy template. Each element contains a label that corresponds to at least one phrase in the registered phrase model.

Parameters

template_json (Union[List[str], List[dict], Dict[str, Union[str, dict]]]) – a list or dictionary of template groups or elements to be registered as part of the template

class fuzzy_search.fuzzy_template.FuzzyTemplateElement(label: Union[None, str, List[str]], element_type: str, required: bool)

Bases: object

class fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement(elements: List[fuzzy_search.fuzzy_template.FuzzyTemplateElement], label: Union[None, str] = None, ordered: bool = True, required: bool = False)

Bases: fuzzy_search.fuzzy_template.FuzzyTemplateElement

class fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement(label: str, required: bool = False, cardinality: str = 'single', next_label: Union[None, str, List[str]] = None, next_distance_max: Union[None, int] = None, variable: bool = False)

Bases: fuzzy_search.fuzzy_template.FuzzyTemplateElement

fuzzy_search.fuzzy_template.generate_group_from_json(element_info: dict, group_elements: List[fuzzy_search.fuzzy_template.FuzzyTemplateElement]) fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement

Generate a FuzzyTemplateGroupElement from an element JSON dictionary and a list of group elements.

Parameters
  • element_info (dict) – a dictionary containing the properties of the template group element

  • group_elements (List[FuzzyTemplateElement]) – a list of fuzzy template elements that are part of the group element

Returns

a fuzzy template group element

Return type

FuzzyTemplateGroupElement

fuzzy_search.fuzzy_template.generate_label_from_json(label: str, element_info: dict) fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement

Generate a FuzzyTemplateLabelElement from a label and an element json dictionary.

Parameters
  • label (str) – the label string for the label element

  • element_info (dict) – a dictionary containing the properties of the template label element

Returns

a fuzzy template label element

Return type

FuzzyTemplateLabelElement

fuzzy_search.fuzzy_template.validate_element_properties(label: str, required: bool = False, cardinality: str = 'multi', next_label: Union[None, str, List[str]] = None, next_distance_max: Union[None, int] = None, variable: bool = False) None

Validate the properties of a FuzzyTemplate element.

Parameters
  • label (Union[str, List[str]]) – the label of the element, which can be a single string or a list of strings

  • required (bool) – whether or not the element must match for the template to match

  • cardinality (str) – whether the element can occur only once (‘single’) or multiple times (‘multi’) in a template match.

  • next_label (Union[str, List[str]]) – what the label of the next element should be. Use a list of labels for multiple options.

  • next_distance_max (int) – the maximum distance allowed between this element and the next element in the template

  • variable (bool) – flag to indicate the element has no phrases but has variable text (default is False)
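
The kind of checks such a validator might perform can be sketched as below. The function name check_element_properties, the specific checks and the exception types are assumptions for illustration; the library's validation rules may differ. To avoid implying a particular default, cardinality is a required argument in this sketch.

```python
from typing import List, Optional, Union


def check_element_properties(
    label: str,
    cardinality: str,
    required: bool = False,
    next_label: Union[None, str, List[str]] = None,
    next_distance_max: Optional[int] = None,
    variable: bool = False,
) -> None:
    """Raise an error if an element property has an unexpected type or value."""
    if not isinstance(label, str):
        raise TypeError("label must be a string")
    if not isinstance(required, bool) or not isinstance(variable, bool):
        raise TypeError("required and variable must be booleans")
    if cardinality not in ("single", "multi"):
        raise ValueError("cardinality must be 'single' or 'multi'")
    if next_label is not None and not isinstance(next_label, (str, list)):
        raise TypeError("next_label must be a string, a list of strings or None")
    if next_distance_max is not None and next_distance_max < 0:
        raise ValueError("next_distance_max must be non-negative")


# A valid element passes silently; an unknown cardinality raises ValueError.
check_element_properties("date", "single", next_label=["name"], next_distance_max=10)
```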

fuzzy_search.fuzzy_template_searcher module

class fuzzy_search.fuzzy_template_searcher.FuzzyTemplateSearcher(template: Union[None, fuzzy_search.fuzzy_template.FuzzyTemplate] = None, config: Optional[dict] = None)

Bases: fuzzy_search.fuzzy_context_searcher.FuzzyContextSearcher

filter_phrase_matches(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[fuzzy_search.fuzzy_match.PhraseMatch]

Filter a list of phrase matches to only include phrase matches that have at least one label in common with the template.

Parameters

phrase_matches (List[PhraseMatch]) – a list of phrase matches

Returns

a filtered list of phrase matches

Return type

List[PhraseMatch]

find_template_matches(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[fuzzy_search.fuzzy_template_searcher.TemplateMatch]

Find all the matches that fit a template. The method returns a list of template matches, where each template match contains the phrase matches that fit the template. There can be multiple template matches if the phrase matches fit a template multiple times.

Parameters

phrase_matches (List[PhraseMatch]) – a list of phrase matches

Returns

a list of template matches

Return type

List[TemplateMatch]

search_text(text: Union[str, Dict[str, str]]) List[fuzzy_search.fuzzy_template_searcher.TemplateMatch]

Search phrases from the registered template’s phrase model in the text and check if the resulting matches together match the template. This method returns a list of template matches, each of which contains the phrase matches that fit the template.

Parameters

text (Union[str, Dict[str, str]]) – a text to search in, either as a string or a dictionary with text and an identifier

Returns

a list of template matches

Return type

List[TemplateMatch]

set_template(template: fuzzy_search.fuzzy_template.FuzzyTemplate) None

Set a new template for the searcher and index the corresponding phrase model.

Parameters

template (FuzzyTemplate) – a fuzzy template to use for searching

class fuzzy_search.fuzzy_template_searcher.TemplateMatch(template: fuzzy_search.fuzzy_template.FuzzyTemplate, phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_sequence: Dict[str, any])

Bases: object

fuzzy_search.fuzzy_template_searcher.find_next_element_end_index(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_element: fuzzy_search.fuzzy_template.FuzzyTemplateElement, element_start_index: int) int

Find the next phrase match that doesn’t match a template element, from a given starting point in a list of phrase matches.

Parameters
  • phrase_matches (List[PhraseMatch]) – a list of phrase matches to be tested against a template element

  • template_element (FuzzyTemplateElement) – a template element to test the phrase matches against

  • element_start_index (int) – the index in the phrase list where the template element first matches

Returns

the index in the phrase list where the template element stops matching

Return type

int

fuzzy_search.fuzzy_template_searcher.find_next_element_start_index(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_element: fuzzy_search.fuzzy_template.FuzzyTemplateElement, template_start_index: int) int

Find the next phrase match that matches a template element, from a given starting point in a list of phrase matches.

Parameters
  • phrase_matches (List[PhraseMatch]) – a list of phrase matches to be tested against a template element

  • template_element (FuzzyTemplateElement) – a template element to test the phrase matches against

  • template_start_index (int) – the index in the phrase list to start the matching process

Returns

the index in the phrase list where the template element matches

Return type

int
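
The start and end index searches described above can be illustrated with a simplified model in which each phrase match is reduced to a single label string and a template element to a set of labels. The function names, the -1 return value when no match is found, and the use of the list length as end index when every remaining match fits are all assumptions of this sketch.

```python
from typing import List, Set


def next_start_index(match_labels: List[str], element_labels: Set[str], start: int) -> int:
    """Index of the first match at or after start whose label fits the element, else -1."""
    for index in range(start, len(match_labels)):
        if match_labels[index] in element_labels:
            return index
    return -1


def next_end_index(match_labels: List[str], element_labels: Set[str], element_start: int) -> int:
    """Index of the first match at or after element_start that no longer fits the element."""
    for index in range(element_start, len(match_labels)):
        if match_labels[index] not in element_labels:
            return index
    return len(match_labels)


labels = ["title", "date", "date", "name"]
start = next_start_index(labels, {"date"}, 0)
end = next_end_index(labels, {"date"}, start)
```

In this example the element starts matching at index 1 and stops matching at index 3, so the matches at indexes 1 and 2 belong to the element.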

fuzzy_search.fuzzy_template_searcher.find_next_group_match_sequence(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_group: fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement, template_start_index: int) Union[None, Dict[str, any]]

Find the next sequence of phrase matches that match a template group element, from a given starting point in the list of phrase matches. This function returns None if the template doesn’t match.

Parameters
  • phrase_matches (List[PhraseMatch]) – a list of phrase matches to be tested against a template element

  • template_group (FuzzyTemplateGroupElement) – a template group element to test the phrase matches against

  • template_start_index (int) – the index in the phrase list to start the matching process

Returns

a sequence with start and end indexes in the list of phrase matches that match the template group

Return type

Union[None, Dict[str, any]]

fuzzy_search.fuzzy_template_searcher.find_next_ordered_group_match_sequence(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_group: fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement, template_start_index: int) Union[None, Dict[str, any]]

Find the next sequence of phrase matches that match an ordered template group element, from a given starting point in the list of phrase matches. This function returns None if the template doesn’t match.

Parameters
  • phrase_matches (List[PhraseMatch]) – a list of phrase matches to be tested against a template element

  • template_group (FuzzyTemplateGroupElement) – a template group element to test the phrase matches against

  • template_start_index (int) – the index in the phrase list to start the matching process

Returns

a sequence with start and end indexes in the list of phrase matches that match the template group

Return type

Union[None, Dict[str, any]]

fuzzy_search.fuzzy_template_searcher.find_next_unordered_group_match_sequence(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_group: fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement, template_start_index: int) Union[None, Dict[str, any]]

Find the next sequence of phrase matches that match an unordered template group element, from a given starting point in the list of phrase matches. This function returns None if the template doesn’t match.

Parameters
  • phrase_matches (List[PhraseMatch]) – a list of phrase matches to be tested against a template element

  • template_group (FuzzyTemplateGroupElement) – a template group element to test the phrase matches against

  • template_start_index (int) – the index in the phrase list to start the matching process

Returns

a sequence with start and end indexes in the list of phrase matches that match the template group

Return type

Union[None, Dict[str, any]]

fuzzy_search.fuzzy_template_searcher.get_phrase_match_list_labels(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[str]

Return a list of all the labels of a list of phrase matches.

Parameters

phrase_matches (List[PhraseMatch]) – a list of phrase matches

Returns

a list of phrase match labels

Return type

List[str]

fuzzy_search.fuzzy_template_searcher.get_sequence_label_element_matches(template_sequence: Dict[str, any]) List[Dict[str, any]]

fuzzy_search.fuzzy_template_searcher.has_required_matches(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template: fuzzy_search.fuzzy_template.FuzzyTemplate) bool

Check if a list of phrase matches contains all required labels of a template.

Parameters
  • phrase_matches (List[PhraseMatch]) – a list of phrase matches

  • template (FuzzyTemplate) – a fuzzy template to use for searching

Returns

True only if all required labels have at least one match

Return type

bool
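
In a simplified model where phrase matches are reduced to their label strings, this check amounts to a subset test. The sketch below uses the hypothetical name has_all_required and plain strings in place of PhraseMatch and FuzzyTemplate objects.

```python
from typing import Iterable, Set


def has_all_required(match_labels: Iterable[str], required_labels: Set[str]) -> bool:
    """True only if every required label occurs among the match labels."""
    return required_labels.issubset(set(match_labels))


complete = has_all_required(["title", "date", "date"], {"title", "date"})
incomplete = has_all_required(["title"], {"title", "date"})
```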

fuzzy_search.fuzzy_template_searcher.initialize_sequence(element: fuzzy_search.fuzzy_template.FuzzyTemplateElement, start_index: int, end_index: int) Dict[str, any]

fuzzy_search.fuzzy_template_searcher.share_label(object1: Union[fuzzy_search.fuzzy_match.PhraseMatch, fuzzy_search.fuzzy_template.FuzzyTemplateElement], object2: Union[fuzzy_search.fuzzy_match.PhraseMatch, fuzzy_search.fuzzy_template.FuzzyTemplateElement]) bool

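Check if two fuzzy objects (phrase matches or template elements) share at least one label.

Parameters
  • object1 (Union[PhraseMatch, FuzzyTemplateElement]) – a phrase match or template element

  • object2 (Union[PhraseMatch, FuzzyTemplateElement]) – a phrase match or template element

Returns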

boolean value indicating that the two objects share a label

Return type

bool
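
Because a label can be a single string, a list of strings or None, a label comparison first has to normalise both sides. The sketch below shows one way to do that on bare label values; the names as_label_set and labels_overlap are hypothetical, and the library function operates on phrase match and template element objects instead.

```python
from typing import List, Set, Union

Label = Union[None, str, List[str]]


def as_label_set(label: Label) -> Set[str]:
    """Normalise None, a single label or a list of labels to a set of labels."""
    if label is None:
        return set()
    if isinstance(label, str):
        return {label}
    return set(label)


def labels_overlap(label1: Label, label2: Label) -> bool:
    """True if the two label values have at least one label in common."""
    return not as_label_set(label1).isdisjoint(as_label_set(label2))


shared = labels_overlap("date", ["date", "name"])
```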