fuzzy_search package
Submodules
fuzzy_search.fuzzy_config module
fuzzy_search.fuzzy_context_searcher module
- class fuzzy_search.fuzzy_context_searcher.FuzzyContextSearcher(config: Optional[dict] = None)
Bases: fuzzy_search.fuzzy_phrase_searcher.FuzzyPhraseSearcher
- add_match_context(match: fuzzy_search.fuzzy_match.PhraseMatch, text: Union[str, dict], context_size: Union[None, int] = None, prefix_size: Union[None, int] = None, suffix_size: Union[None, int] = None) fuzzy_search.fuzzy_match.PhraseMatchInContext
Add context to a given match and its corresponding text document.
- Parameters
match (PhraseMatch) – a phrase match object
text (Union[str, dict]) – the text that the match was taken from
context_size (Union[None, int]) – the size of both the prefix and the suffix context window
prefix_size (Union[None, int]) – size of the prefix context
suffix_size (Union[None, int]) – size of the suffix context
- Returns
the phrase match object with context
- Return type
PhraseMatchInContext
- configure_context(config: dict) None
Configure the context searcher.
- Parameters
config (dict) – a dictionary with configuration parameters to override the defaults
- find_matches(text: Union[str, dict], use_word_boundaries: Union[None, bool] = None, allow_overlapping_matches: bool = True, include_variants: Optional[bool] = None, filter_distractors: Optional[bool] = None, prefix_size: Union[None, int] = None, suffix_size: Union[None, int] = None, skip_exact_matching: Optional[bool] = None) List[fuzzy_search.fuzzy_match.PhraseMatchInContext]
Find fuzzy matches for registered phrases and add context around match string. This extends the find_matches function of the FuzzyPhraseSearcher by adding local context to each match.
- Parameters
text (Union[str, Dict[str, str]]) – the text (string or dictionary with ‘text’ property) to find fuzzy matching phrases in.
use_word_boundaries (bool) – use word boundaries in determining match boundaries
allow_overlapping_matches (bool) – boolean flag for whether to allow matches to overlap in their text ranges
include_variants (bool) – boolean flag for whether to include phrase variants for finding matches
filter_distractors (bool) – boolean flag for whether to remove phrase matches that better match distractors
prefix_size (Union[None, int]) – the size of the prefix context window
suffix_size (Union[None, int]) – the size of the suffix context window
skip_exact_matching (Union[None, bool]) – boolean flag whether to skip the exact matching step
- Returns
a list of phrase matches with text surrounding the match string
- Return type
List[PhraseMatchInContext]
- find_matches_in_context(match_in_context: fuzzy_search.fuzzy_match.PhraseMatchInContext, use_word_boundaries: Union[None, bool] = None, include_variants: Union[None, bool] = None, filter_distractors: Union[None, bool] = None) List[fuzzy_search.fuzzy_match.PhraseMatch]
Use a PhraseMatchInContext object to find other phrases in the context of that match.
- Parameters
match_in_context (PhraseMatchInContext) – a match phrase with context from the text that the match was taken from
use_word_boundaries (bool) – boolean whether to adjust match strings to word boundaries
include_variants (bool) – boolean whether to include variants of phrases in matching
filter_distractors (bool) – boolean whether to remove matches that are closer to distractors
- Returns
a list of match objects
- Return type
List[PhraseMatch]
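The context window that is added around a match can be illustrated with a short, library-independent sketch. The helper name `extract_context` is hypothetical and not part of fuzzy_search. The default window of 20 characters mirrors the `prefix_size` and `suffix_size` defaults of `PhraseMatchInContext`, and clamping the window to the text bounds is an assumption.

```python
def extract_context(text: str, match_offset: int, match_string: str,
                    prefix_size: int = 20, suffix_size: int = 20) -> dict:
    """Hypothetical helper: cut a context window around a match."""
    match_end = match_offset + len(match_string)
    # Clamp the window so it never runs past either end of the text (assumption).
    context_start = max(0, match_offset - prefix_size)
    context_end = min(len(text), match_end + suffix_size)
    return {
        "context": text[context_start:context_end],
        "context_start": context_start,
        "context_end": context_end,
    }


text = "The meeting was held in Amsterdam on the first of May."
window = extract_context(text, text.index("Amsterdam"), "Amsterdam",
                         prefix_size=8, suffix_size=7)
print(window["context"])  # held in Amsterdam on the
```

Keeping `context_start` and `context_end` next to the context string makes it possible to translate offsets found inside the context back to offsets in the full text.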
fuzzy_search.fuzzy_match module
- class fuzzy_search.fuzzy_match.Candidate(phrase: fuzzy_search.fuzzy_phrase.Phrase, max_length_variance: int = 1, ignorecase: bool = False)
Bases: object
- add_skip_match(skipgram: fuzzy_search.fuzzy_string.SkipGram) None
Add a skipgram match between a text and a phrase to the candidate.
- Parameters
skipgram (SkipGram) – a matching skipgram
- get_match_start_offset() Union[None, int]
Calculate the start offset of the match.
- Returns
the start offset of the match
- Return type
int
- get_match_string(text: Dict[str, any]) Optional[str]
Find the matching string of a candidate fuzzy match between a text and a phrase.
- Parameters
text (Dict[str, any]) – the text object from which the candidate was derived
- Returns
the matching string
- Return type
str
- get_skip_count_overlap() float
Calculate deviation of candidate skipgrams from phrase skipgrams.
- Returns
the skipgram overlap (-inf, 1.0]
- Return type
float
- get_skip_set_overlap() float
Calculate and set skipgram overlap between text and phrase skipgram matches.
- Returns
the skipgram overlap
- Return type
float
- is_match(skipgram_threshold: float)
Check if the candidate is a likely match for its corresponding phrase.
- Parameters
skipgram_threshold (float) – the threshold for how many skipgrams have to match between candidate and phrase
- Returns
a boolean whether this candidate is a likely match for the phrase
- Return type
bool
- remove_first_skip() None
Remove the first matching skipgram from the list and update the count and set.
- same_candidate(other: fuzzy_search.fuzzy_match.Candidate)
Check if this candidate has the same start and end offsets as another candidate.
- Parameters
other (Candidate) – another candidate for the same phrase and text.
- Returns
this candidate match has the same offsets as the other candidate
- Return type
bool
- shift_start_skip() bool
Check if there is a later skip that is a better start.
- skip_match_length() int
Return the length of the matching string.
- Returns
difference between start and end offset
- Return type
int
- class fuzzy_search.fuzzy_match.PhraseMatch(match_phrase: fuzzy_search.fuzzy_phrase.Phrase, match_variant: fuzzy_search.fuzzy_phrase.Phrase, match_string: str, match_offset: int, ignorecase: bool = False, text_id: Union[None, str] = None, match_scores: Optional[dict] = None, match_label: Optional[Union[str, List[str]]] = None, match_id: Optional[str] = None)
Bases: object
- add_scores(skipgram_overlap: Union[None, float] = None) None
Compute overlap and similarity scores between the match variant and the match string and add these to the match object.
- Parameters
skipgram_overlap (Union[float, None]) – the overlap in skipgrams between match string and match variant
- Returns
None
- Return type
None
- as_web_anno() Dict[str, any]
Turn match object into a W3C Web Annotation representation
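As an illustration of what a W3C Web Annotation for a phrase match can look like, the sketch below builds a plain dictionary with a `TextPositionSelector`. The function `match_to_web_annotation`, the choice of body and the `classifying` motivation are assumptions for this example; the structure that `as_web_anno()` actually returns may differ.

```python
import json


def match_to_web_annotation(match_string: str, match_offset: int,
                            text_id: str, phrase_string: str) -> dict:
    """Hypothetical mapping from match properties to a Web Annotation."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "classifying",
        # The body states which phrase was recognised (assumed layout).
        "body": {"type": "TextualBody", "value": phrase_string},
        # The target points at the matched character range in the source text.
        "target": {
            "source": text_id,
            "selector": {
                "type": "TextPositionSelector",
                "start": match_offset,
                "end": match_offset + len(match_string),
            },
        },
    }


anno = match_to_web_annotation("Amsteldam", 24, "doc-1", "Amsterdam")
print(json.dumps(anno, indent=2))
```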
- has_label(label: str)
- json() dict
- property label_list: List[str]
- overlaps(other: fuzzy_search.fuzzy_match.PhraseMatch) bool
Check if the match string of this match object overlaps with the match string of another match object.
- Parameters
other (PhraseMatch) – another match object
- Returns
a boolean indicating whether the match_strings of the two objects overlap in the source text
- Return type
bool
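A minimal sketch of such an overlap check, assuming each match covers the half-open character range from its offset to its offset plus the length of its match string. The function `ranges_overlap` is hypothetical, and whether the library treats directly adjacent matches as overlapping is not stated here.

```python
def ranges_overlap(offset_a: int, string_a: str,
                   offset_b: int, string_b: str) -> bool:
    """Hypothetical check on two matches given as (offset, match string)."""
    end_a = offset_a + len(string_a)
    end_b = offset_b + len(string_b)
    # Half-open ranges overlap when each one starts before the other ends.
    return offset_a < end_b and offset_b < end_a


print(ranges_overlap(0, "abcd", 2, "cdef"))  # True
print(ranges_overlap(0, "abcd", 4, "efgh"))  # False
```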
- score_character_overlap()
Return the character overlap between the variant phrase_string and the match_string
- Returns
the character overlap as proportion of the variant phrase string
- Return type
float
- score_levenshtein_similarity()
Return the Levenshtein similarity between the variant phrase_string and the match_string
- Returns
the Levenshtein similarity as proportion of the variant phrase string
- Return type
float
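The following sketch shows one way to express Levenshtein similarity as a proportion of the variant phrase string. The formula `1 - distance / len(variant_string)` is an assumption about what "as proportion" means, and both function names are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(variant_string: str, match_string: str) -> float:
    """Assumed definition: 1 - distance / len(variant_string)."""
    distance = levenshtein_distance(variant_string, match_string)
    return 1 - distance / len(variant_string)


print(levenshtein_distance("kitten", "sitting"))  # 3
print(levenshtein_similarity("Amsterdam", "Amsterdam"))  # 1.0
```

Under this definition the score can drop below zero when the match string is much longer than the variant, and an empty variant string would need separate handling.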
- score_ngram_overlap() float
Return the ngram overlap between the variant phrase_string and the match_string
- Returns
the ngram overlap as proportion of the variant phrase string
- Return type
float
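A sketch of an ngram overlap score, assuming character ngrams are compared as multisets and the shared count is divided by the number of ngrams in the variant phrase string. The helper names are hypothetical; the default `ngram_size` of 2 follows the default in the `Phrase` signature.

```python
from collections import Counter


def char_ngrams(string: str, ngram_size: int = 2) -> Counter:
    """Count all character ngrams of the given size in a string."""
    return Counter(string[i:i + ngram_size]
                   for i in range(len(string) - ngram_size + 1))


def ngram_overlap(variant_string: str, match_string: str,
                  ngram_size: int = 2) -> float:
    variant_ngrams = char_ngrams(variant_string, ngram_size)
    match_ngrams = char_ngrams(match_string, ngram_size)
    # Multiset intersection: an ngram counts as often as it occurs in both.
    shared = sum((variant_ngrams & match_ngrams).values())
    return shared / sum(variant_ngrams.values())


print(ngram_overlap("abcd", "abcd"))  # 1.0
```

A variant shorter than `ngram_size` has no ngrams, so a real implementation needs to guard against division by zero.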
- class fuzzy_search.fuzzy_match.PhraseMatchInContext(match: fuzzy_search.fuzzy_match.PhraseMatch, text: Optional[Union[str, dict]] = None, context: Optional[str] = None, context_start: Optional[int] = None, context_end: Optional[int] = None, prefix_size: int = 20, suffix_size: int = 20)
Bases: fuzzy_search.fuzzy_match.PhraseMatch
- as_web_anno() Dict[str, any]
Turn match object into a W3C Web Annotation representation
- json()
- fuzzy_search.fuzzy_match.adjust_match_end_offset(phrase_string: str, candidate_string: str, text: Dict[str, any], end_offset: int, punctuation: str) Optional[int]
Adjust the end offset if it is not at a word boundary.
- Parameters
phrase_string (str) – the phrase string
candidate_string (str) – the candidate match string
text (Dict[str, any]) – the text object that contains the candidate match string
end_offset (int) – the text offset of the candidate match string
punctuation (str) – the set of characters to treat as punctuation
- Returns
the adjusted offset or None if the required adjustment is too big
- Return type
Union[int, None]
- fuzzy_search.fuzzy_match.adjust_match_offsets(phrase_string: str, candidate_string: str, text: Dict[str, any], candidate_start_offset: int, candidate_end_offset: int, punctuation: str = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~') Optional[Dict[str, Union[str, int]]]
Adjust the start and end offsets if they are not at word boundaries.
- Parameters
phrase_string (str) – the phrase string
candidate_string (str) – the candidate match string
text (Dict[str, any]) – the text object that contains the candidate match string
candidate_start_offset (int) – the text offset of the start of the candidate match string
candidate_end_offset (int) – the text offset of the end of the candidate match string
punctuation (str) – the set of characters to treat as punctuation (defaults to string.punctuation)
- Returns
the adjusted match string and offsets, or None if the required adjustment is too big
- Return type
Optional[Dict[str, Union[str, int]]]
- fuzzy_search.fuzzy_match.adjust_match_start_offset(text: Dict[str, any], match_string: str, match_offset: int) Optional[int]
Adjust the start offset if it is not at a word boundary.
- Parameters
text (Dict[str, any]) – the text object that contains the candidate match string
match_string (str) – the candidate match string
match_offset (int) – the text offset of the candidate match string
- Returns
the adjusted offset or None if the required adjustment is too big
- Return type
Union[int, None]
- fuzzy_search.fuzzy_match.calculate_end_shift(phrase_end: str, match_end: str, text_suffix: str, end_offset: int)
- fuzzy_search.fuzzy_match.map_string(affix_string: str, punctuation: str, whitespace_only: bool = False) str
Turn affix string into type char representation. Types are ‘w’ for non-whitespace char, and ‘s’ for whitespace char.
- Parameters
affix_string (str) – a string
punctuation (str) – the set of characters to treat as punctuation
whitespace_only (bool) – whether to treat only whitespace as word boundary or also include (some) punctuation
- Returns
the type char representation
- Return type
str
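The type char representation can be sketched as follows. The function `map_affix` is a hypothetical stand-in, and treating every punctuation character as a boundary when `whitespace_only` is false is an assumption (the description above only says that some punctuation is included).

```python
import string


def map_affix(affix_string: str, punctuation: str = string.punctuation,
              whitespace_only: bool = False) -> str:
    """Hypothetical re-implementation of the 'w'/'s' type mapping."""
    boundary_chars = set(string.whitespace)
    if not whitespace_only:
        # Assumption: all given punctuation characters count as boundaries.
        boundary_chars |= set(punctuation)
    return "".join("s" if char in boundary_chars else "w"
                   for char in affix_string)


print(map_affix("ab, c"))                        # wwssw
print(map_affix("ab, c", whitespace_only=True))  # wwwsw
```

Reducing an affix to such a pattern makes it cheap to test whether an offset falls on a word boundary, for example by checking where a run of `w` characters starts or ends.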
- fuzzy_search.fuzzy_match.phrase_match_from_json(match_json: dict) fuzzy_search.fuzzy_match.PhraseMatch
- fuzzy_search.fuzzy_match.validate_match_props(match_phrase: fuzzy_search.fuzzy_phrase.Phrase, match_variant: fuzzy_search.fuzzy_phrase.Phrase, match_string: str, match_offset: int) None
Validate match properties.
- Parameters
- Returns
None
- Return type
None
fuzzy_search.fuzzy_patterns module
- fuzzy_search.fuzzy_patterns.context_before_pattern(name, pattern_definition, context_string, max_distance=10)
- fuzzy_search.fuzzy_patterns.context_then_pattern(name, pattern_definition, context_string)
- fuzzy_search.fuzzy_patterns.escape_string(string)
- fuzzy_search.fuzzy_patterns.get_context_patterns(context_type: Union[None, str] = None) dict
- fuzzy_search.fuzzy_patterns.get_search_patterns(pattern_type=None)
- fuzzy_search.fuzzy_patterns.list_context_pattern_types(context_type=None)
- fuzzy_search.fuzzy_patterns.list_pattern_definitions(pattern_type=None)
- fuzzy_search.fuzzy_patterns.list_pattern_names(name_only=True, pattern_type=None)
- fuzzy_search.fuzzy_patterns.make_search_context_patterns(context_string, pattern_names, context_patterns)
- fuzzy_search.fuzzy_patterns.pattern_before_context(name, pattern_definition, context_string, max_distance=10)
- fuzzy_search.fuzzy_patterns.pattern_comma_then_context(name, pattern_definition, context_string)
fuzzy_search.fuzzy_phrase module
- class fuzzy_search.fuzzy_phrase.Phrase(phrase: Union[str, Dict[str, str]], ngram_size: int = 2, skip_size: int = 2, early_threshold: int = 3, late_threshold: int = 3, within_range_threshold: int = 3, ignorecase: bool = False)
Bases: object
- add_max_offset(max_offset: int) None
Add a maximum offset for matching a phrase in a text.
- Parameters
max_offset (int) – the maximum offset to allow a phrase to match
- add_metadata(metadata_dict: Dict[str, any]) None
Add key/value pairs as metadata for this phrase.
- Parameters
metadata_dict (Dict[str, any]) – a dictionary of key/value pairs as metadata
- Returns
None
- Return type
None
- has_label(label_string: str) bool
Check if a given label belongs to at least one phrase in the phrase model.
- Parameters
label_string (str) – a label string
- Returns
a boolean whether the label is part of the phrase model
- Return type
bool
- has_skipgram(skipgram: str) bool
For a given skipgram, return boolean whether it is in the index
- Parameters
skipgram (str) – a skipgram string
- Returns
A boolean whether skipgram is in the index
- Return type
bool
- is_early_skipgram(skipgram: str) bool
For a given skipgram, return boolean whether it appears early in the phrase.
- Parameters
skipgram (str) – a skipgram string
- Returns
A boolean whether skipgram appears early in the phrase
- Return type
bool
- set_label(label: Union[str, List[str]]) None
Set the label(s) of a phrase. Labels must be string and can be a single string or a list.
- Parameters
label (Union[str, List[str]]) – the label(s) of a phrase
- skipgram_offsets(skipgram_string: str) Union[None, List[int]]
For a given skipgram return the list of offsets at which it appears.
- Parameters
skipgram_string (str) – a skipgram string
- Returns
A list of string offsets at which the skipgram appears
- Return type
Union[None, List[int]]
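To make the skipgram index concrete, the sketch below builds a mapping from skipgram strings to the offsets at which they start. The skipgram scheme (first character fixed, remaining characters picked from a window of `ngram_size + skip_size`), the helper names and the reading of `early_threshold` as "starts before this offset" are all assumptions and may not match the library's `SkipGram` generation.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Optional, Tuple


def iter_skipgrams(string: str, ngram_size: int = 2,
                   skip_size: int = 2) -> Iterator[Tuple[str, int]]:
    """Yield (skipgram, offset) pairs under the assumed scheme."""
    window_size = ngram_size + skip_size
    for offset in range(len(string)):
        window = string[offset:offset + window_size]
        # Keep the first character fixed and pick the rest from the window.
        for tail in combinations(window[1:], ngram_size - 1):
            yield window[0] + "".join(tail), offset


def build_skipgram_index(string: str) -> Dict[str, List[int]]:
    index: Dict[str, List[int]] = defaultdict(list)
    for skipgram, offset in iter_skipgrams(string):
        index[skipgram].append(offset)
    return dict(index)


def skipgram_offsets(index: Dict[str, List[int]],
                     skipgram_string: str) -> Optional[List[int]]:
    # None signals that the skipgram does not occur at all.
    return index.get(skipgram_string)


def is_early_skipgram(index: Dict[str, List[int]], skipgram_string: str,
                      early_threshold: int = 3) -> bool:
    offsets = index.get(skipgram_string) or []
    return any(offset < early_threshold for offset in offsets)


index = build_skipgram_index("abcdef")
print(skipgram_offsets(index, "be"))  # [1]
```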
- within_range(skipgram1, skipgram2)
- fuzzy_search.fuzzy_phrase.is_valid_label(label: Union[str, List[str]]) bool
Test whether label has a valid value.
- Parameters
label (Union[str, List[str]]) – a phrase label (either a string or a list of strings)
- Returns
whether the label is valid
- Return type
bool
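A minimal sketch of such a check, assuming a label is valid when it is a string or a list that contains only strings. The function name `looks_like_valid_label` is hypothetical, and how the library treats empty strings or empty lists is not covered here.

```python
from typing import List, Union


def looks_like_valid_label(label: Union[str, List[str]]) -> bool:
    """Hypothetical check: a string, or a list holding only strings."""
    if isinstance(label, str):
        return True
    return isinstance(label, list) and all(isinstance(item, str)
                                           for item in label)


print(looks_like_valid_label(["person", "location"]))  # True
print(looks_like_valid_label(["person", 3]))           # False
```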
fuzzy_search.fuzzy_phrase_model module
- class fuzzy_search.fuzzy_phrase_model.PhraseModel(phrases: Union[None, List[Union[str, Dict[str, Union[str, list]], fuzzy_search.fuzzy_phrase.Phrase]]] = None, variants: Union[None, List[Union[Dict[str, List[str]], fuzzy_search.fuzzy_phrase.Phrase]]] = None, phrase_labels: Union[None, List[Dict[str, str]]] = None, distractors: Union[None, List[Union[Dict[str, List[str]], fuzzy_search.fuzzy_phrase.Phrase]]] = None, model: Union[None, List[Dict[str, Union[str, list]]]] = None, custom: Union[None, List[Dict[str, Union[str, int, float, list]]]] = None, config: Optional[dict] = None)
Bases: object
- add_custom(custom: List[Dict[str, Union[str, int, float, list]]]) None
Add custom key/value pairs to the entry as phrase metadata.
- Parameters
custom (List[Dict[str, Union[str, int, float, list]]]) – a list of phrase dictionaries, each with a ‘phrase’ property and additional key/value pairs
- add_distractor(distractor_phrase: fuzzy_search.fuzzy_phrase.Phrase, main_phrase: fuzzy_search.fuzzy_phrase.Phrase)
Add a phrase to the model as distractor of a given main phrase.
- add_distractors(distractors: List[Dict[str, Union[str, List[str]]]], add_new_phrases: bool = True) None
Add distractors of a phrase. If the phrase is not registered, add it to the set. The input is a list of dictionaries, e.g. distractors = [{‘phrase’: ‘some phrase’, ‘distractors’: [‘some distractor’, ‘some other distractor’]}]
- Parameters
distractors (List[Dict[str, Union[str, List[str]]]]) – a list of phrase dictionaries with ‘distractors’ property
add_new_phrases (bool) – a Boolean to indicate if unknown phrases should be added
- add_labels(phrase_labels: List[Dict[str, Union[str, list]]])
Add a label to a phrase. This can be used to group phrases under the same label. The input is a list of phrase/label pair dictionaries, e.g. labels = [{‘phrase’: ‘some phrase’, ‘label’: ‘some label’}]
- add_model(model: List[Union[str, Dict[str, Union[str, list]]]]) None
Add an entire model with list of phrase dictionaries.
- Parameters
model (List[Union[str, Dict[str, Union[str, list]]]]) – a list of phrase dictionaries
- Returns
None
- Return type
None
- add_phrase(phrase: fuzzy_search.fuzzy_phrase.Phrase) None
Add a phrase to the model as main phrase.
- Parameters
phrase (Phrase) – a phrase to be added
- add_phrases(phrases: List[Union[str, Dict[str, Union[str, List[str]]], fuzzy_search.fuzzy_phrase.Phrase]]) None
Add a list of phrases to the phrase model. Phrases must be given as a list of strings, a list of dictionaries with a ‘phrase’ property that holds the phrase as a string value, or a list of Phrase objects.
- Parameters
phrases (List[Union[str, Dict[str, Union[str, List[str]]]]]) – a list of phrases
- add_variant(variant_phrase: fuzzy_search.fuzzy_phrase.Phrase, main_phrase: fuzzy_search.fuzzy_phrase.Phrase)
Add a phrase to the model as variant of a given main phrase.
- add_variants(variants: List[Dict[str, Union[str, List[str]]]], add_new_phrases: bool = True) None
Add variants of a phrase. If the phrase is not registered, add it to the set. The input is a list of dictionaries, e.g. variants = [{‘phrase’: ‘some phrase’, ‘variants’: [‘some variant’, ‘some other variant’]}]
- Parameters
variants (List[Dict[str, Union[str, List[str]]]]) – a list of phrase dictionaries with ‘variants’ property
add_new_phrases (bool) – a Boolean to indicate if unknown phrases should be added
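The dictionary layout for variants can be illustrated without the library. In the sketch below the example phrases and the helper `index_variants_by_phrase` are hypothetical; it only shows how a list of such dictionaries links each variant string to its main phrase.

```python
from typing import Dict, List, Union

# Same layout as described for add_variants: one dictionary per main phrase.
variants = [
    {"phrase": "Amsterdam", "variants": ["Amsteldam", "Amstelredam"]},
    {"phrase": "Rotterdam", "variants": ["Rotterdamme"]},
]


def index_variants_by_phrase(
        entries: List[Dict[str, Union[str, List[str]]]]) -> Dict[str, str]:
    """Hypothetical helper: map each variant string to its main phrase."""
    variant_of = {}
    for entry in entries:
        for variant in entry["variants"]:
            variant_of[variant] = entry["phrase"]
    return variant_of


variant_of = index_variants_by_phrase(variants)
print(variant_of["Amsteldam"])  # Amsterdam
```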
- get(phrase_string: str, custom_property: str) any
Get the value of a custom property for a given phrase.
- Parameters
phrase_string (str) – a phrase string of a registered phrase.
custom_property (str) – the name of a custom property of the registered phrase
- Returns
the custom property of a given phrase
- Return type
any
- get_labels(phrase: Union[str, fuzzy_search.fuzzy_phrase.Phrase]) Set[str]
Return the label(s) of a registered phrase.
- Parameters
phrase (Union[str, Phrase]) – a phrase string or object
- Returns
a set of labels
- Return type
Set[str]
- get_phrases() List[fuzzy_search.fuzzy_phrase.Phrase]
Return a list of all registered phrases.
- Returns
a list of all registered phrases
- Return type
List[Phrase]
- get_phrases_by_max_length(max_length: int, include_variants: bool = False) Generator[fuzzy_search.fuzzy_phrase.Phrase, None, None]
Return all phrases in the phrase model that are no longer than a given length.
- Parameters
max_length (int) – the maximum length of phrases to be returned
include_variants (bool) – whether to include variants
- Returns
a generator that yields phrases
- Return type
Generator[Phrase, None, None]
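Because the method returns a generator, phrases are produced lazily and only once. The sketch below mimics that behaviour on plain strings; `phrases_by_max_length` is a hypothetical stand-in that measures length in characters, which is an assumption about how phrase length is defined.

```python
from typing import Generator, Iterable


def phrases_by_max_length(phrase_strings: Iterable[str],
                          max_length: int) -> Generator[str, None, None]:
    """Yield only the phrases that are no longer than max_length."""
    for phrase_string in phrase_strings:
        if len(phrase_string) <= max_length:
            yield phrase_string


short_phrases = list(phrases_by_max_length(["ab", "abcd", "abcdef"], 4))
print(short_phrases)  # ['ab', 'abcd']
```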
- get_variants(phrases: Optional[List[str]] = None) List[Dict[str, Union[str, List[str]]]]
Return registered variants of a specific list of phrases or of all registered phrases (when no list of phrases is given).
- Parameters
phrases (List[str]) – a list of registered phrase strings
- Returns
a list of dictionaries of phrases and their variants
- Return type
List[Dict[str, Union[str, List[str]]]]
- has_custom(phrase_string: str, custom_property: str) bool
Check if a phrase has a given custom property.
- Parameters
phrase_string (str) – a phrase string of a registered phrase.
custom_property (str) – the name of a custom property of the registered phrase
- Returns
a boolean to indicate whether the phrase has a custom property of the given property name
- Return type
bool
- has_label(phrase_string: str) bool
Check if a registered phrase has a label.
- Parameters
phrase_string (str) – a phrase string of a registered phrase
- Returns
a boolean indicating if the registered phrase has a label
- has_phrase(phrase: Union[str, Dict[str, any], fuzzy_search.fuzzy_phrase.Phrase]) bool
Check if phrase is registered in phrase_model.
- Parameters
phrase (Union[str, Dict[str, any], Phrase]) – a phrase string, phrase dictionary or Phrase object
- Returns
a boolean indicating whether phrase is registered
- Return type
bool
- index_phrase_words(phrase: fuzzy_search.fuzzy_phrase.Phrase) None
Index a phrase on its individual words, for exact match look up routines.
- Parameters
phrase (Phrase) – a phrase object that is part of the phrase model
- is_label(label: str) bool
Check if label is registered as label of any known phrase.
- Parameters
label (str) – a label string to be checked
- Returns
a boolean whether the label belongs to a known phrase
- Return type
bool
- property json: List[Dict[str, Union[str, List[str]]]]
Return a JSON representation of the phrase model.
- Returns
a JSON representation of the phrase model
- Return type
List[Dict[str, Union[str, List[str]]]]
- remove_custom(custom: List[Dict[str, any]]) None
Remove custom properties for a list of phrases.
- Parameters
custom (List[Dict[str, any]]) – a list of phrase dictionaries with custom properties to remove
- remove_distractor(distractor_phrase: fuzzy_search.fuzzy_phrase.Phrase) None
Remove a distractor phrase from the model, including its connection to the phrase it is a distractor of.
- Parameters
distractor_phrase (Phrase) – a phrase that is registered as a distractor of one or more main phrases
- remove_distractors(distractors: Optional[List[Union[str, fuzzy_search.fuzzy_phrase.Phrase]]] = None, distractors_of_phrase: Union[None, str] = None)
Remove a list of distractors of a phrase. Either pass distractors, a list of dictionaries with phrases as key and the list of distractors to be removed as values, e.g. distractors = [{‘phrase’: ‘some phrase’, ‘distractors’: [‘distractor to remove’, ‘some other distractor’]}], or pass distractors_of_phrase to remove all distractors of a given phrase.
- Parameters
distractors (Union[List[Union[str, Phrase]], None]) – an optional list of phrase dictionaries with ‘distractors’ property
distractors_of_phrase (Union[str, None]) – an optional string of a registered phrase for which all distractors are removed
- remove_labels(phrases: Union[List[fuzzy_search.fuzzy_phrase.Phrase], List[str]]) None
Remove labels for known phrases.
- Parameters
phrases (Union[List[Phrase], List[str]]) – a list of known phrases (either as Phrase objects or strings)
- remove_phrase(phrase: fuzzy_search.fuzzy_phrase.Phrase)
Remove a main phrase from the model, including its connections to any variant and distractor phrases.
- Parameters
phrase (Phrase) – a phrase that is registered as a main phrase
- remove_phrase_words(phrase: fuzzy_search.fuzzy_phrase.Phrase) None
Remove the individual words of a phrase from the index. Only use this if you are removing the phrase from the phrase model.
- Parameters
phrase (Phrase) – a phrase object that is part of the phrase model
- remove_phrases(phrases: List[Union[str, Dict[str, Union[str, List[str]]], fuzzy_search.fuzzy_phrase.Phrase]])
Remove a list of phrases from the phrase model. If it has any registered spelling variants, remove those as well.
- Parameters
phrases (List[Union[str, Dict[str, Union[str, List[str]]]]]) – a list of phrases/keyphrases
- remove_variant(variant_phrase: fuzzy_search.fuzzy_phrase.Phrase) None
Remove a variant phrase from the model, including its connection to the phrase it is a variant of.
- Parameters
variant_phrase (Phrase) – a phrase that is registered as a variant of one or more main phrases
- remove_variants(variants: Optional[List[Union[str, fuzzy_search.fuzzy_phrase.Phrase]]] = None, variants_of_phrase: Optional[Union[str, fuzzy_search.fuzzy_phrase.Phrase]] = None)
Remove a list of spelling variants of a phrase.
- validate_entry_phrase(entry: Dict[str, Union[str, int, float, list]]) None
Check if a given phrase (as dictionary) is registered.
- Parameters
entry (Dict[str, Union[str, int, float, list]]) – a phrase dictionary with a ‘phrase’ property
- variant_of(variant: Union[str, fuzzy_search.fuzzy_phrase.Phrase]) Union[None, fuzzy_search.fuzzy_phrase.Phrase]
- variants(phrase: Union[str, fuzzy_search.fuzzy_phrase.Phrase]) Union[None, List[fuzzy_search.fuzzy_phrase.Phrase]]
Return all variants of a given phrase.
- fuzzy_search.fuzzy_phrase_model.as_phrase_object(phrase: Union[str, dict, fuzzy_search.fuzzy_phrase.Phrase], ngram_size: int = 2, skip_size: int = 2) fuzzy_search.fuzzy_phrase.Phrase
- fuzzy_search.fuzzy_phrase_model.is_phrase_dict(phrase_dict: Dict[str, Union[str, List[str]]]) bool
fuzzy_search.fuzzy_phrase_searcher module
- class fuzzy_search.fuzzy_phrase_searcher.FuzzyPhraseSearcher(config: Union[None, Dict[str, Union[str, int, float]]] = None)
Bases: object
- configure(config: Dict[str, any]) None
Configure the fuzzy searcher with a given config object.
- Parameters
config (Dict[str, Union[str, int, float]]) – a config dictionary
- filter_matches_by_distractors(matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[fuzzy_search.fuzzy_match.PhraseMatch]
- filter_matches_by_threshold(matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[fuzzy_search.fuzzy_match.PhraseMatch]
- find_candidates(text: dict, use_word_boundaries: bool, include_variants: Union[None, bool] = None, known_word_offset: Optional[Dict[int, Dict[str, any]]] = None) List[fuzzy_search.fuzzy_match.Candidate]
Find candidate fuzzy matches for a given text.
- Parameters
text (dict) – the text object to match with phrases
use_word_boundaries (bool) – use word boundaries in determining match boundaries
include_variants (bool) – boolean flag for whether to include phrase variants for finding matches
known_word_offset (Dict[int, Dict[str, any]]) – a dictionary of known words and their text offsets based on exact matches
- Returns
a list of candidate matches
- Return type
List[Candidate]
- find_exact_matches(text: Union[str, Dict[str, str]], use_word_boundaries: Union[None, bool] = None, include_variants: Union[None, bool] = None) List[fuzzy_search.fuzzy_match.PhraseMatch]
Find all exact matching phrases for a given text.
- Parameters
text (Union[str, Dict[str, str]]) – the text (string or dictionary with ‘text’ property) to find exact matching phrases in.
use_word_boundaries (Union[None, bool]) – use word boundaries in determining match boundaries
include_variants (Union[None, bool]) – boolean flag for whether to include phrase variants for finding matches
- Returns
a list of phrase matches
- Return type
List[PhraseMatch]
- find_matches(text: Union[str, Dict[str, str]], use_word_boundaries: Union[None, bool] = None, allow_overlapping_matches: Union[None, bool] = None, include_variants: Union[None, bool] = None, filter_distractors: Union[None, bool] = None, skip_exact_matching: Optional[bool] = None) List[fuzzy_search.fuzzy_match.PhraseMatch]
Find all fuzzy matching phrases for a given text. By default, a first pass of exact matching is conducted to find exact occurrences of phrases, which speeds up the fuzzy matching pass.
- Parameters
text (Union[str, Dict[str, str]]) – the text (string or dictionary with ‘text’ property) to find fuzzy matching phrases in.
use_word_boundaries (Union[None, bool]) – use word boundaries in determining match boundaries
allow_overlapping_matches (Union[None, bool]) – boolean flag for whether to allow matches to overlap in their text ranges
include_variants (Union[None, bool]) – boolean flag for whether to include phrase variants for finding matches
filter_distractors (Union[None, bool]) – boolean flag for whether to remove phrase matches that better match distractors
skip_exact_matching (Union[None, bool]) – boolean flag whether to skip the exact matching step
- Returns
a list of phrase matches
- Return type
List[PhraseMatch]
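The exact matching pass can be approximated with a regular expression. The sketch below is a hypothetical stand-in, not the library's implementation: the function `find_exact` borrows the `ignorecase` and `use_word_boundaries` parameter names from `search_exact`, and using `\b` for word boundaries is an assumption.

```python
import re
from typing import List, Tuple


def find_exact(phrase_string: str, text: str, ignorecase: bool = False,
               use_word_boundaries: bool = True) -> List[Tuple[int, str]]:
    """Return (offset, matched string) pairs for exact occurrences."""
    # Escape the phrase so that regex metacharacters are matched literally.
    pattern = re.escape(phrase_string)
    if use_word_boundaries:
        pattern = r"\b" + pattern + r"\b"
    flags = re.IGNORECASE if ignorecase else 0
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, text, flags)]


print(find_exact("dam", "Amsterdam dam"))
# [(10, 'dam')]
print(find_exact("dam", "Amsterdam dam", use_word_boundaries=False))
# [(6, 'dam'), (10, 'dam')]
```

Text ranges that are already covered by exact matches do not need to be searched again in the fuzzy pass, which is where the speed-up comes from.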
- find_skipgram_matches(text: Dict[str, Union[str, int, float, list]], include_variants: Union[None, bool] = None, known_word_offset: Optional[Dict[int, Dict[str, any]]] = None) fuzzy_search.fuzzy_phrase_searcher.SkipMatches
Find all skipgram matches between text and phrases.
- Parameters
text (Dict[str, Union[str, int, float, list]]) – the text object to match with phrases
include_variants (bool) – boolean flag for whether to include phrase variants for finding matches
known_word_offset (Dict[int, Dict[str, any]]) – a dictionary of known words and their text offsets based on exact matches
- Returns
a SkipMatches object containing all skipgram matches
- Return type
SkipMatches
- index_distractors(distractors: List[Union[str, fuzzy_search.fuzzy_phrase.Phrase]]) None
Add a list of distractor phrases to filter out likely incorrect phrase matches.
- Parameters
distractors (List[Union[str, Phrase]]) – a list of distractors, either as string or as Phrase objects
- index_phrase_model(phrase_model: Union[List[Dict[str, Union[str, int, float, list]]], fuzzy_search.fuzzy_phrase_model.PhraseModel])
Add a phrase model to search for phrases in texts.
- Parameters
phrase_model (Union[List[Dict[str, Union[str, int, float, list]]], PhraseModel]) – a phrase model, either as dictionary or as PhraseModel object
- index_phrases(phrases: List[Union[str, fuzzy_search.fuzzy_phrase.Phrase]]) None
Add a list of phrases to search for in texts.
- Parameters
phrases (List[Union[str, Phrase]]) – a list of phrases, either as string or as Phrase objects
- index_variants(variants: List[Union[str, fuzzy_search.fuzzy_phrase.Phrase]]) None
Add a list of variant phrases to search for in texts.
- Parameters
variants (List[Union[str, Phrase]]) – a list of variants, either as string or as Phrase objects
- class fuzzy_search.fuzzy_phrase_searcher.SkipMatches(ngram_size: int, skip_size: int)
Bases: object
- add_skip_match(skipgram: fuzzy_search.fuzzy_string.SkipGram, phrase: fuzzy_search.fuzzy_phrase.Phrase) None
Add a skipgram from a text that matches a phrase.
- fuzzy_search.fuzzy_phrase_searcher.add_exact_match_score(match: fuzzy_search.fuzzy_match.PhraseMatch) fuzzy_search.fuzzy_match.PhraseMatch
- fuzzy_search.fuzzy_phrase_searcher.candidates_to_matches(candidates: List[fuzzy_search.fuzzy_match.Candidate], text: dict, phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, ignorecase: bool = False) List[fuzzy_search.fuzzy_match.PhraseMatch]
- fuzzy_search.fuzzy_phrase_searcher.filter_matches_by_overlap(filtered_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[fuzzy_search.fuzzy_match.PhraseMatch]
- fuzzy_search.fuzzy_phrase_searcher.filter_overlapping_phrase_candidates(phrase_candidates: List[fuzzy_search.fuzzy_match.Candidate]) List[fuzzy_search.fuzzy_match.Candidate]
- fuzzy_search.fuzzy_phrase_searcher.filter_skipgram_threshold(skip_matches: fuzzy_search.fuzzy_phrase_searcher.SkipMatches, skip_threshold: float) List[fuzzy_search.fuzzy_phrase.Phrase]
Filter the skipgram matches based on the skipgram overlap threshold.
- Parameters
skip_matches (SkipMatches) – the phrases that match the text
skip_threshold (float) – the threshold for the skipgram overlap between a text and a phrase
- Returns
the list of phrases with a skipgram overlap that meets the threshold
- Return type
List[Phrase]
- fuzzy_search.fuzzy_phrase_searcher.get_exact_match_ranges(exact_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[dict]
- fuzzy_search.fuzzy_phrase_searcher.get_known_word_offsets(match_ranges: List[Dict[str, any]], text_doc: Dict[str, str]) Dict[int, dict]
- fuzzy_search.fuzzy_phrase_searcher.get_skipmatch_candidates(text: Dict[str, any], skip_matches: fuzzy_search.fuzzy_phrase_searcher.SkipMatches, skipgram_threshold: float, phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, max_length_variance: int = 1, ignorecase: bool = False) List[fuzzy_search.fuzzy_match.Candidate]
Find all candidate matches for the phrases in a SkipMatches object.
- Parameters
text (Dict[str, any]) – the text object to match with phrases
skip_matches (SkipMatches) – a SkipMatches object with matches between a text and a list of phrases
skipgram_threshold (float) – a threshold for how many skipgrams should match between a phrase and a candidate
phrase_model (PhraseModel) – a phrase model, either as dictionary or as PhraseModel object
max_length_variance (int) – the maximum difference in length between candidate and phrase
ignorecase (bool) – whether to ignore case when matching skip grams
- Returns
a list of candidate matches
- Return type
List[Candidate]
- fuzzy_search.fuzzy_phrase_searcher.get_skipmatch_phrase_candidates(text: Dict[str, any], phrase: fuzzy_search.fuzzy_phrase.Phrase, skip_matches: fuzzy_search.fuzzy_phrase_searcher.SkipMatches, skipgram_threshold: float, max_length_variance: int = 1, ignorecase: bool = False) List[fuzzy_search.fuzzy_match.Candidate]
Find all candidate matches for a given phrase and SkipMatches object.
- Parameters
text (Dict[str, any]) – the text object to match with phrases
phrase (Phrase) – a phrase to find candidate matches for
skip_matches (SkipMatches) – a SkipMatches object with matches between a text and a list of phrases
skipgram_threshold (float) – a threshold for how many skipgrams should match between a phrase and a candidate
max_length_variance (int) – the maximum difference in length between candidate and phrase
ignorecase (bool) – whether to ignore case when matching skip grams
- Returns
a list of candidate matches
- Return type
List[Candidate]
- fuzzy_search.fuzzy_phrase_searcher.get_skipset_overlap(phrase: fuzzy_search.fuzzy_phrase.Phrase, skip_matches: fuzzy_search.fuzzy_phrase_searcher.SkipMatches) float
Calculate the overlap between the set of skipgrams of a text and the skipgrams of a phrase.
- Parameters
phrase (Phrase) – a phrase object that has been matched against a text
skip_matches (SkipMatches) – a SkipMatches object containing the skipgram matches between a text and a list of phrases
- Returns
the fraction of skipgrams in the phrase that overlap with the text
- Return type
float
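As a rough illustration of the overlap fraction and the threshold filter described above, the following sketch works on plain sets of skipgram strings rather than on Phrase and SkipMatches objects. The names `skipset_overlap` and `filter_by_threshold` are hypothetical, and the library's own computation may differ in detail.

```python
from typing import Dict, List, Set


def skipset_overlap(phrase_skipgrams: Set[str], text_skipgrams: Set[str]) -> float:
    """Fraction of the phrase's skipgrams that also occur in the text."""
    if not phrase_skipgrams:
        return 0.0
    return len(phrase_skipgrams & text_skipgrams) / len(phrase_skipgrams)


def filter_by_threshold(phrases: Dict[str, Set[str]], text_skipgrams: Set[str],
                        skip_threshold: float) -> List[str]:
    """Keep the phrases whose overlap fraction meets the threshold."""
    return [phrase for phrase, skipgrams in phrases.items()
            if skipset_overlap(skipgrams, text_skipgrams) >= skip_threshold]


# Toy data: skipgram sets are written out by hand for illustration only.
text_skips = {"he", "hl", "el", "ll", "lo"}
phrases = {
    "hello": {"he", "hl", "el", "ll", "lo"},
    "help": {"he", "hl", "el", "ep", "lp"},
}
print(filter_by_threshold(phrases, text_skips, 0.7))
```

Here "help" shares three of its five skipgrams with the text (0.6), so it falls below a threshold of 0.7, while "hello" is kept.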
- fuzzy_search.fuzzy_phrase_searcher.get_text_dict(text: Union[str, dict], ignorecase: bool = False) dict
Ensure that text is wrapped in a dictionary with an id property, so that a long text is passed by reference instead of being copied as a long string.
- Parameters
text (Union[str, dict]) – a text string or text dictionary
ignorecase (bool) – boolean flag for whether to ignore case
- Returns
a text dictionary with an id property
- Return type
dict
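The wrapping step can be pictured with the minimal sketch below. It assumes the dictionary uses the keys 'text' and 'id' and that a random UUID is an acceptable identifier when none is given; the helper name `to_text_dict` and the id scheme are hypothetical, and the ignorecase flag is left out.

```python
import uuid
from typing import Dict, Union


def to_text_dict(text: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Wrap a plain string in a dictionary with 'text' and 'id' keys."""
    if isinstance(text, str):
        # Assumption: generate a random identifier when the caller gives none.
        return {"text": text, "id": str(uuid.uuid4())}
    if "text" not in text:
        raise KeyError("a text dictionary needs a 'text' property")
    if "id" not in text:
        # Copy the dictionary so the caller's object is not modified.
        return {**text, "id": str(uuid.uuid4())}
    return text


doc = to_text_dict("a long text string")
print(sorted(doc.keys()))
```

A dictionary that already has both keys is returned unchanged, so downstream functions can pass the same object around without copying the text.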
- fuzzy_search.fuzzy_phrase_searcher.index_known_word_offsets(exact_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) Dict[int, Dict[str, any]]
- fuzzy_search.fuzzy_phrase_searcher.search_exact(phrase: fuzzy_search.fuzzy_phrase.Phrase, text: Dict[str, str], ignorecase: bool = False, use_word_boundaries: bool = True)
- fuzzy_search.fuzzy_phrase_searcher.search_exact_phrases(phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, text: Dict[str, str], ignorecase: bool = False, use_word_boundaries: bool = True, include_variants: bool = False)
- fuzzy_search.fuzzy_phrase_searcher.search_exact_phrases_with_word_boundaries(phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, text: Dict[str, str], ignorecase: bool = False, include_variants: bool = False)
- fuzzy_search.fuzzy_phrase_searcher.search_exact_phrases_without_word_boundaries(phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, text: Dict[str, str], ignorecase: bool = False, include_variants: bool = False)
fuzzy_search.fuzzy_searcher module
- class fuzzy_search.fuzzy_searcher.FuzzySearcher(char_match_threshold=0.5, ngram_threshold=0.5, levenshtein_threshold=0.5, max_length_variance=1)
Bases:
object
- disable_strip_suffix()
- enable_strip_suffix()
- filter_candidates(candidates, keyword, ngram_size=2)
- filter_char_match_candidates(candidates, match_term)
- filter_levenshtein_candidates(candidates, match_term)
- filter_ngram_candidates(candidates, match_term, ngram_size)
- find_candidates(text, keyword, ngram_size=2, use_word_boundaries=False)
Find candidate matches that start with the same initial character as the search term and filter them based on default thresholds for character overlap, ngram overlap and Levenshtein distance.
- find_start_candidates(text, term, use_word_boundaries)
Find candidate matches that start with the same initial character as the search term.
- find_term_matches(text, term, max_length_variance=None, use_word_boundaries=False)
- make_ngrams(term, n)
- rank_candidates(candidates, keyword, ngram_size=2)
- score_char_overlap(term1, term2)
- score_char_overlap_ratio(term1, term2)
- score_levenshtein_distance(s1, s2)
- score_levenshtein_distance_ratio(term1, term2)
- score_ngram_overlap(term1, term2, ngram_size)
- score_ngram_overlap_ratio(term1, term2, ngram_size)
- strip_suffix(match)
- fuzzy_search.fuzzy_searcher.create_term_match(re_match, term)
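The idea behind find_start_candidates, collecting substrings that begin with the same character as the search term, can be sketched as follows. This is only an illustration under assumptions: the function name `start_candidates`, the use of `re.finditer`, and the way max_length_variance widens the candidate length are all hypothetical and not taken from the library.

```python
import re
from typing import List, Tuple


def start_candidates(text: str, term: str, max_length_variance: int = 1,
                     use_word_boundaries: bool = False) -> List[Tuple[int, str]]:
    """Return (offset, substring) pairs that start with the term's first character."""
    pattern = (r"\b" if use_word_boundaries else "") + re.escape(term[0])
    candidates = []
    for match in re.finditer(pattern, text):
        # Try every length within the allowed variance around the term length.
        shortest = len(term) - max_length_variance
        longest = len(term) + max_length_variance
        for length in range(shortest, longest + 1):
            end = match.start() + length
            if length > 0 and end <= len(text):
                candidates.append((match.start(), text[match.start():end]))
    return candidates


print(start_candidates("the cat sat", "cat", max_length_variance=0))
```

The resulting candidates would then be passed through the character overlap, ngram overlap and Levenshtein filters.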
fuzzy_search.fuzzy_string module
- class fuzzy_search.fuzzy_string.SkipGram(skipgram_string: str, offset: int, skipgram_length: int)
Bases:
object
- fuzzy_search.fuzzy_string.get_non_word_prefix(string: str) str
Check if a string has a non-word prefix and return it.
- Parameters
string (str) – the string from which the prefix is to be returned
- Returns
the non-word prefix
- Return type
str
- fuzzy_search.fuzzy_string.get_non_word_suffix(string: str) str
Check if a string has a non-word suffix and return it.
- Parameters
string (str) – the string from which the suffix is to be returned
- Returns
the non-word suffix
- Return type
str
- fuzzy_search.fuzzy_string.insert_skips(window: str, skipgram_combinations: List[List[int]])
For a given skip gram window, return all skip grams for a given configuration.
- fuzzy_search.fuzzy_string.make_ngrams(text: str, n: int) List[str]
Turn a term string into a list of ngrams of size n
- Parameters
text (str) – a text string
n (int) – the ngram size
- Returns
a list of ngrams
- Return type
List[str]
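A minimal sketch of character ngram extraction is shown below. It assumes plain sliding windows without padding at the string boundaries; the name `char_ngrams` is hypothetical and the library function may treat edge cases differently.

```python
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Return all contiguous character ngrams of size n, left to right."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("fuzzy", 2))
```

A text shorter than n yields an empty list in this sketch.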
- fuzzy_search.fuzzy_string.score_char_overlap(term1: str, term2: str) int
Count the number of overlapping character tokens in two strings.
- Parameters
term1 (str) – a term string
term2 (str) – a term string
- Returns
the number of overlapping characters
- Return type
int
- fuzzy_search.fuzzy_string.score_char_overlap_ratio(term1, term2)
Score the number of overlapping characters between two terms as proportion of the length of the first term
- Parameters
term1 (str) – a term string
term2 (str) – a term string
- Returns
the number of overlapping characters as a proportion of the length of the first term
- Return type
float
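One way to count overlapping characters is a multiset intersection, as in the sketch below. Whether the library counts repeated characters this way is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def char_overlap(term1: str, term2: str) -> int:
    """Count characters shared by both terms, respecting multiplicity."""
    shared = Counter(term1) & Counter(term2)
    return sum(shared.values())


def char_overlap_ratio(term1: str, term2: str) -> float:
    """Shared characters as a proportion of the length of the first term."""
    if not term1:
        return 0.0
    return char_overlap(term1, term2) / len(term1)


print(char_overlap("hello", "help"), char_overlap_ratio("hello", "help"))
```

For "hello" and "help" the shared characters are h, e and one l, giving an overlap of 3 and a ratio of 3/5.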
- fuzzy_search.fuzzy_string.score_levenshtein_distance(term1: str, term2: str) int
Calculate the Levenshtein distance between two strings.
- Parameters
term1 (str) – a term string
term2 (str) – a term string
- Returns
the Levenshtein distance between the two terms
- Return type
int
- fuzzy_search.fuzzy_string.score_levenshtein_similarity_ratio(term1, term2)
Score the Levenshtein similarity between two terms.
- Parameters
term1 (str) – a term string
term2 (str) – a term string
- Returns
the Levenshtein similarity ratio of the two terms
- Return type
float
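For reference, the Levenshtein distance can be computed with the standard dynamic programming recurrence sketched below. The similarity ratio shown here, one minus the distance divided by the longer term's length, is an assumption about how such a ratio might be normalised, not a statement of the library's formula; both function names are hypothetical.

```python
def levenshtein_distance(term1: str, term2: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(term2) + 1))
    for i, char1 in enumerate(term1, start=1):
        current = [i]
        for j, char2 in enumerate(term2, start=1):
            cost = 0 if char1 == char2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity_ratio(term1: str, term2: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer term."""
    longest = max(len(term1), len(term2))
    if longest == 0:
        return 1.0
    return 1 - levenshtein_distance(term1, term2) / longest


print(levenshtein_distance("kitten", "sitting"))
```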
- fuzzy_search.fuzzy_string.score_ngram_overlap(term1: str, term2: str, ngram_size: int)
Score the number of overlapping ngrams between two terms
- Parameters
term1 (str) – a first term string
term2 (str) – a second term string
ngram_size (int) – the character ngram size
- Returns
the number of overlapping ngrams
- Return type
int
- fuzzy_search.fuzzy_string.score_ngram_overlap_ratio(term1, term2, ngram_size)
Score the number of overlapping ngrams between two terms as proportion of the length of the first term
- Parameters
term1 (str) – a term string
term2 (str) – a term string
ngram_size (int) – the character ngram size
- Returns
the number of overlapping ngrams as a proportion of the length of the first term
- Return type
float
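The ngram overlap scores can be sketched in the same style as the character overlap. The sketch below assumes unpadded ngrams, a multiset intersection, and a ratio taken over the number of ngrams in the first term; the library may normalise differently, and the function names are hypothetical.

```python
from collections import Counter
from typing import List


def char_ngrams(term: str, n: int) -> List[str]:
    """Contiguous character ngrams of size n, without boundary padding."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def ngram_overlap(term1: str, term2: str, ngram_size: int) -> int:
    """Count ngrams shared by both terms, respecting multiplicity."""
    shared = Counter(char_ngrams(term1, ngram_size)) & Counter(char_ngrams(term2, ngram_size))
    return sum(shared.values())


def ngram_overlap_ratio(term1: str, term2: str, ngram_size: int) -> float:
    """Assumed normalisation: shared ngrams divided by the ngrams in term1."""
    total = len(char_ngrams(term1, ngram_size))
    if total == 0:
        return 0.0
    return ngram_overlap(term1, term2, ngram_size) / total


print(ngram_overlap("hello", "help", 2), ngram_overlap_ratio("hello", "help", 2))
```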
- fuzzy_search.fuzzy_string.strip_prefix(string: str) str
Strip non-word prefix from string beginning.
- Parameters
string (str) – the string from which the prefix is to be stripped
- Returns
the stripped string
- Return type
str
- fuzzy_search.fuzzy_string.strip_suffix(string: str) str
Strip non-word suffix from string ending.
- Parameters
string (str) – the string from which the suffix is to be stripped
- Returns
the stripped string
- Return type
str
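The prefix and suffix helpers can be approximated with regular expressions, as sketched below. The sketch assumes that "non-word" corresponds to the regex class `\W`; the function names are hypothetical and the library may define non-word characters differently.

```python
import re


def non_word_prefix(string: str) -> str:
    """Return the leading run of non-word characters (possibly empty)."""
    return re.match(r"\W*", string).group(0)


def non_word_suffix(string: str) -> str:
    """Return the trailing run of non-word characters (possibly empty)."""
    return re.search(r"\W*$", string).group(0)


def strip_non_word_prefix(string: str) -> str:
    """Remove non-word characters from the start of the string."""
    return re.sub(r"^\W+", "", string)


def strip_non_word_suffix(string: str) -> str:
    """Remove non-word characters from the end of the string."""
    return re.sub(r"\W+$", "", string)


print(repr(strip_non_word_prefix("--hello, ")), repr(strip_non_word_suffix("--hello, ")))
```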
- fuzzy_search.fuzzy_string.text2skipgrams(text: str, ngram_size: int = 2, skip_size: int = 2) Generator[fuzzy_search.fuzzy_string.SkipGram, None, None]
Turn a text string into a generator of skipgrams.
- Parameters
text (str) – a text string
ngram_size (int) – an integer indicating the number of characters in the ngram
skip_size (int) – an integer indicating how many characters may be skipped within an ngram
- Returns
a generator yielding SkipGram objects, each holding a skipgram string, its offset and its length
- Return type
Generator[SkipGram, None, None]
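To make the notion of a skipgram concrete, the sketch below generates character skipgrams by fixing the first character of each window and choosing the remaining characters from the following positions. This is an illustration under assumptions: the window size is taken to be ngram_size plus skip_size, and `SkipGramSketch` and `skipgrams` are hypothetical names rather than the library's SkipGram class or text2skipgrams function.

```python
from itertools import combinations
from typing import Iterator, NamedTuple


class SkipGramSketch(NamedTuple):
    string: str   # the characters making up the skipgram
    offset: int   # position of the first character in the text
    length: int   # span covered in the text, including skipped characters


def skipgrams(text: str, ngram_size: int = 2, skip_size: int = 2) -> Iterator[SkipGramSketch]:
    """Yield character skipgrams with at most skip_size skipped characters."""
    window_size = ngram_size + skip_size
    for offset in range(len(text) - ngram_size + 1):
        window = text[offset:offset + window_size]
        # Keep the first character, pick the other ngram_size - 1 from the rest.
        for positions in combinations(range(1, len(window)), ngram_size - 1):
            string = window[0] + "".join(window[p] for p in positions)
            length = positions[-1] + 1 if positions else 1
            yield SkipGramSketch(string, offset, length)


print([skip.string for skip in skipgrams("abcd", ngram_size=2, skip_size=1)])
```

Because the function is a generator, skipgrams of a long text are produced lazily rather than held in memory as one list.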
fuzzy_search.fuzzy_template module
- class fuzzy_search.fuzzy_template.FuzzyTemplate(phrase_model: fuzzy_search.fuzzy_phrase_model.PhraseModel, template_json: Union[List[str], List[dict], Dict[str, Union[str, dict]]], ignore_unknown: bool = False, ordered: bool = False)
Bases:
object
- get_element(element_label: str) Union[None, fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement, fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement]
Return the element corresponding to a given label.
- Parameters
element_label (str) – a fuzzy element label
- Returns
the element corresponding to the label or None if label is unknown
- Return type
Union[FuzzyTemplateElement]
- get_elements_by_cardinality(cardinality: str = 'single') List[fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement]
Return all template elements with a given cardinality.
- Parameters
cardinality (str) – a cardinality type (‘single’ or ‘multi’)
- Returns
the list of elements with a given cardinality
- Return type
List[FuzzyTemplateLabelElement]
- get_label_phrases(label: str) List[fuzzy_search.fuzzy_phrase.Phrase]
Return a list of phrases that have a given label.
- Parameters
label (str) – a phrase label for phrases in the registered phrase_model
- Returns
a list of phrases from the registered phrase model that have a given label
- Return type
List[Phrase]
- get_labels_by_cardinality(cardinality: str = 'single') List[str]
Return the labels of all template elements with a given cardinality.
- Parameters
cardinality (str) – a cardinality type (‘single’ or ‘multi’)
- Returns
the list of labels of elements with a given cardinality
- Return type
List[str]
- get_required_elements() List[fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement]
Return all required elements in the template.
- Returns
the list of required elements
- Return type
List[FuzzyTemplateElement]
- get_required_labels() List[str]
Return the labels of all required elements in the template.
- Returns
the list of labels of required elements
- Return type
List[str]
- has_group(group: str) bool
Check if the template has group elements with a given group name.
- Parameters
group (str) – a fuzzy element group
- Returns
whether the group corresponds to any registered element(s)
- Return type
bool
- has_label(label: Union[str, List[str]]) bool
Check if the template has label elements with a given label or list of labels (any or all).
- Parameters
label (Union[str, List[str]]) – a fuzzy element label
- Returns
whether the label corresponds to any registered element(s)
- Return type
bool
- parse_group_element(group_info: Dict[str, any]) fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement
Parse a group element dictionary/JSON object into a fuzzy template group element.
- Parameters
group_info (dict) – a dictionary containing the properties of the template group element
- Returns
a fuzzy template group element
- Return type
FuzzyTemplateGroupElement
- parse_label_element(label_info: Union[str, Dict[str, any]]) Optional[fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement]
Parse a label element dictionary/JSON object into a fuzzy template label element.
- Parameters
label_info (Union[str, Dict[str, any]]) – a label string or a dictionary containing the properties of the template label element
- Returns
a fuzzy template label element, or None if the label is not used in the phrase model
- Return type
Optional[FuzzyTemplateLabelElement]
- register_template(template_json: Union[List[str], List[dict], Dict[str, Union[str, dict]]]) None
Register a list of elements as a fuzzy template. Each element contains a label that corresponds to at least one phrase in the registered phrase model.
- Parameters
template_json (Union[List[str], List[dict], Dict[str, Union[str, dict]]]) – a dictionary of template groups or elements to be registered as part of the template
- class fuzzy_search.fuzzy_template.FuzzyTemplateElement(label: Union[None, str, List[str]], element_type: str, required: bool)
Bases:
object
- class fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement(elements: List[fuzzy_search.fuzzy_template.FuzzyTemplateElement], label: Union[None, str] = None, ordered: bool = True, required: bool = False)
- class fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement(label: str, required: bool = False, cardinality: str = 'single', next_label: Union[None, str, List[str]] = None, next_distance_max: Union[None, int] = None, variable: bool = False)
- fuzzy_search.fuzzy_template.generate_group_from_json(element_info: dict, group_elements: List[fuzzy_search.fuzzy_template.FuzzyTemplateElement]) fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement
Generate a FuzzyTemplateGroupElement from an element JSON dictionary and a list of group elements.
- Parameters
element_info (dict) – a dictionary containing the properties of the template group element
group_elements (List[FuzzyTemplateElement]) – a list of fuzzy template elements that are part of the group element
- Returns
a fuzzy template group element
- Return type
FuzzyTemplateGroupElement
- fuzzy_search.fuzzy_template.generate_label_from_json(label: str, element_info: dict) fuzzy_search.fuzzy_template.FuzzyTemplateLabelElement
Generate a FuzzyTemplateLabelElement from a label and an element json dictionary.
- Parameters
label (str) – the label string for the label element
element_info (dict) – a dictionary containing the properties of the template label element
- Returns
a fuzzy template label element
- Return type
FuzzyTemplateLabelElement
- fuzzy_search.fuzzy_template.validate_element_properties(label: str, required: bool = False, cardinality: str = 'multi', next_label: Union[None, str, List[str]] = None, next_distance_max: Union[None, int] = None, variable: bool = False) None
Validate the properties of a FuzzyTemplate element.
- Parameters
label (Union[str, List[str]]) – the label of the element, which can be a single string or a list of strings
required (bool) – whether or not the element must match for the template to match
cardinality (str) – whether the element can occur only once (default) or multiple times in a template match.
next_label (Union[str, List[str]]) – what the label of the next element should be. Use a list of labels for multiple options.
next_distance_max (int) – the maximum distance allowed between this element and the next element in the template
variable (bool) – flag to indicate the element has no phrases but has variable text (default is False)
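The kind of checks implied by the parameter list above can be sketched as follows. This is a hypothetical stand-in, not the library's validation logic: the function name `check_element_properties`, the exception types, and the rule that next_distance_max must be non-negative are all assumptions. Only the two cardinality values ('single' and 'multi') come from the documentation.

```python
from typing import List, Optional, Union


def _is_label_or_labels(value: object) -> bool:
    """True for a string or a list made up only of strings."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def check_element_properties(label: Union[str, List[str]], required: bool,
                             cardinality: str,
                             next_label: Union[None, str, List[str]] = None,
                             next_distance_max: Optional[int] = None,
                             variable: bool = False) -> None:
    """Raise an error if a property has an unexpected type or value."""
    if not _is_label_or_labels(label):
        raise TypeError("label must be a string or a list of strings")
    if not isinstance(required, bool) or not isinstance(variable, bool):
        raise TypeError("required and variable must be booleans")
    if cardinality not in ("single", "multi"):
        raise ValueError("cardinality must be 'single' or 'multi'")
    if next_label is not None and not _is_label_or_labels(next_label):
        raise TypeError("next_label must be a string or a list of strings")
    if next_distance_max is not None:
        # Assumption: a maximum distance only makes sense as a non-negative int.
        if not isinstance(next_distance_max, int) or next_distance_max < 0:
            raise ValueError("next_distance_max must be a non-negative integer")


check_element_properties("date", required=True, cardinality="single",
                         next_label=["name", "place"], next_distance_max=50)
```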
fuzzy_search.fuzzy_template_searcher module
- class fuzzy_search.fuzzy_template_searcher.FuzzyTemplateSearcher(template: Union[None, fuzzy_search.fuzzy_template.FuzzyTemplate] = None, config: Optional[dict] = None)
Bases:
fuzzy_search.fuzzy_context_searcher.FuzzyContextSearcher
- filter_phrase_matches(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[fuzzy_search.fuzzy_match.PhraseMatch]
Filter a list of phrase matches to only include phrase matches that have at least one label in common with the template.
- Parameters
phrase_matches (List[PhraseMatch]) – a list of phrase matches
- Returns
a filtered list of phrase matches
- Return type
List[PhraseMatch]
- find_template_matches(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[fuzzy_search.fuzzy_template_searcher.TemplateMatch]
Find all the matches that fit a template. The method returns a list of template matches, where each template match contains the phrase matches that fit the template. There can be multiple template matches if the phrase matches fit a template multiple times.
- Parameters
phrase_matches (List[PhraseMatch]) – a list of phrase matches
- Returns
a list of template matches
- Return type
List[TemplateMatch]
- search_text(text: Union[str, Dict[str, str]]) List[fuzzy_search.fuzzy_template_searcher.TemplateMatch]
Search phrases from the registered template’s phrase model in the text and check if the resulting matches together match the template. This method returns the template matches found among the phrase matches.
- Parameters
text (Union[str, Dict[str, str]]) – a text to search in, either as a string or a dictionary with text and an identifier
- Returns
a list of template matches
- Return type
List[TemplateMatch]
- set_template(template: fuzzy_search.fuzzy_template.FuzzyTemplate) None
Set a new template for the searcher and index the corresponding phrase model.
- Parameters
template (FuzzyTemplate) – a fuzzy template to use for searching
- class fuzzy_search.fuzzy_template_searcher.TemplateMatch(template: fuzzy_search.fuzzy_template.FuzzyTemplate, phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_sequence: Dict[str, any])
Bases:
object
- fuzzy_search.fuzzy_template_searcher.find_next_element_end_index(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_element: fuzzy_search.fuzzy_template.FuzzyTemplateElement, element_start_index: int) int
Find the next phrase match that doesn’t match a template element, from a given starting point in a list of phrase matches.
- Parameters
phrase_matches (List[PhraseMatch]) – a list of phrase matches to be tested against a template element
template_element (FuzzyTemplateElement) – a template element to test the phrase matches against
element_start_index (int) – the index in the phrase list where the template element first matches
- Returns
the index in the phrase list where the template element stops matching
- Return type
int
- fuzzy_search.fuzzy_template_searcher.find_next_element_start_index(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_element: fuzzy_search.fuzzy_template.FuzzyTemplateElement, template_start_index: int) int
Find the next phrase match that matches a template element, from a given starting point in a list of phrase matches.
- Parameters
phrase_matches (List[PhraseMatch]) – a list of phrase matches to be tested against a template element
template_element (FuzzyTemplateElement) – a template element to test the phrase matches against
template_start_index (int) – the index in the phrase list to start the matching process
- Returns
the index in the phrase list where the template element matches
- Return type
int
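The two index searches above can be pictured with a sketch over plain label strings instead of PhraseMatch and FuzzyTemplateElement objects. The function names are hypothetical, and the sentinel values (-1 when nothing matches, the list length when the element matches to the end) are assumptions rather than documented behaviour.

```python
from typing import List


def next_start_index(match_labels: List[str], element_label: str, start_index: int) -> int:
    """Index of the first match at or after start_index with the element's label."""
    for index in range(start_index, len(match_labels)):
        if match_labels[index] == element_label:
            return index
    return -1  # assumption: signal that no further match exists


def next_end_index(match_labels: List[str], element_label: str, element_start_index: int) -> int:
    """Index of the first match at or after element_start_index with a different label."""
    for index in range(element_start_index, len(match_labels)):
        if match_labels[index] != element_label:
            return index
    return len(match_labels)  # assumption: the element matches up to the end


labels = ["date", "name", "name", "place"]
start = next_start_index(labels, "name", 0)
end = next_end_index(labels, "name", start)
print(start, end, labels[start:end])
```

The start and end indexes together delimit the run of consecutive phrase matches that belong to one template element.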
- fuzzy_search.fuzzy_template_searcher.find_next_group_match_sequence(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_group: fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement, template_start_index: int) Union[None, Dict[str, any]]
Find the next sequence of phrase matches that match a template group element, from a given starting point in the list of phrase matches. This function returns None if the template doesn’t match.
- Parameters
phrase_matches (List[PhraseMatch]) – a list of phrase matches to be tested against a template element
template_group (FuzzyTemplateGroupElement) – a template group element to test the phrase matches against
template_start_index (int) – the index in the phrase list to start the matching process
- Returns
a sequence with start and end indexes in the list of phrase matches that match the template group
- Return type
Union[None, Dict[str, any]]
- fuzzy_search.fuzzy_template_searcher.find_next_ordered_group_match_sequence(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_group: fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement, template_start_index: int) Union[None, Dict[str, any]]
Find the next sequence of phrase matches that match an ordered template group element, from a given starting point in the list of phrase matches. This function returns None if the template doesn’t match.
- Parameters
phrase_matches (List[PhraseMatch]) – a list of phrase matches to be tested against a template element
template_group (FuzzyTemplateGroupElement) – a template group element to test the phrase matches against
template_start_index (int) – the index in the phrase list to start the matching process
- Returns
a sequence with start and end indexes in the list of phrase matches that match the template group
- Return type
Union[None, Dict[str, any]]
- fuzzy_search.fuzzy_template_searcher.find_next_unordered_group_match_sequence(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template_group: fuzzy_search.fuzzy_template.FuzzyTemplateGroupElement, template_start_index: int) Union[None, Dict[str, any]]
Find the next sequence of phrase matches that match an unordered template group element, from a given starting point in the list of phrase matches. This function returns None if the template doesn’t match.
- Parameters
phrase_matches (List[PhraseMatch]) – a list of phrase matches to be tested against a template element
template_group (FuzzyTemplateGroupElement) – a template group element to test the phrase matches against
template_start_index (int) – the index in the phrase list to start the matching process
- Returns
a sequence with start and end indexes in the list of phrase matches that match the template group
- Return type
Union[None, Dict[str, any]]
- fuzzy_search.fuzzy_template_searcher.get_phrase_match_list_labels(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch]) List[str]
Return a list of all the labels of a list of phrase matches.
- Parameters
phrase_matches (List[PhraseMatch]) – a list of phrase matches
- Returns
a list of phrase match labels
- Return type
List[str]
- fuzzy_search.fuzzy_template_searcher.get_sequence_label_element_matches(template_sequence: Dict[str, any]) List[Dict[str, any]]
- fuzzy_search.fuzzy_template_searcher.has_required_matches(phrase_matches: List[fuzzy_search.fuzzy_match.PhraseMatch], template: fuzzy_search.fuzzy_template.FuzzyTemplate) bool
Check if a list of phrase matches contains all required labels of a template.
- Parameters
phrase_matches (List[PhraseMatch]) – a list of phrase matches
template (FuzzyTemplate) – a fuzzy template to use for searching
- Returns
True only if all required labels have at least one match
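The required-label check amounts to a subset test, sketched below on plain label lists. The name `has_required_labels` is hypothetical, and representing each phrase match as a list of label strings is an assumption made to keep the example self-contained.

```python
from typing import Iterable, List


def has_required_labels(match_labels: Iterable[List[str]], required_labels: Iterable[str]) -> bool:
    """True only if every required label occurs in at least one phrase match."""
    found = {label for labels in match_labels for label in labels}
    return set(required_labels) <= found


matches = [["date"], ["name", "person"], ["place"]]
print(has_required_labels(matches, ["date", "place"]))
print(has_required_labels(matches, ["date", "event"]))
```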
- fuzzy_search.fuzzy_template_searcher.initialize_sequence(element: fuzzy_search.fuzzy_template.FuzzyTemplateElement, start_index: int, end_index: int) Dict[str, any]
Check if two fuzzy objects (phrase matches of template elements) share at least one label.
- Parameters
object1 (Union[PhraseMatch, FuzzyTemplateElement]) – the first object to compare
object2 (Union[PhraseMatch, FuzzyTemplateElement]) – the second object to compare
- Returns
boolean value indicating that the two objects share a label
- Return type
bool