fuzzy_search package
Submodules
fuzzy_search.fuzzy_config module
fuzzy_search.fuzzy_context_searcher module
fuzzy_search.fuzzy_match module
fuzzy_search.fuzzy_patterns module
fuzzy_search.fuzzy_phrase module
fuzzy_search.fuzzy_phrase_model module
fuzzy_search.fuzzy_phrase_searcher module
fuzzy_search.fuzzy_searcher module
- class fuzzy_search.fuzzy_searcher.FuzzySearcher(char_match_threshold=0.5, ngram_threshold=0.5, levenshtein_threshold=0.5, max_length_variance=1)
Bases: object
- disable_strip_suffix()
- enable_strip_suffix()
- filter_candidates(candidates, keyword, ngram_size=2)
- filter_char_match_candidates(candidates, match_term)
- filter_levenshtein_candidates(candidates, match_term)
- filter_ngram_candidates(candidates, match_term, ngram_size)
- find_candidates(text, keyword, ngram_size=2, use_word_boundaries=False)
Find candidate matches that start with the same initial character as the search term, then filter them using the searcher's thresholds (each 0.5 by default) for character overlap, ngram overlap and Levenshtein distance.
- find_start_candidates(text, term, use_word_boundaries)
Find candidate matches that start with the same initial character as the search term.
- find_term_matches(text, term, max_length_variance=None, use_word_boundaries=False)
- make_ngrams(term, n)
- rank_candidates(candidates, keyword, ngram_size=2)
- score_char_overlap(term1, term2)
- score_char_overlap_ratio(term1, term2)
- score_levenshtein_distance(s1, s2)
- score_levenshtein_distance_ratio(term1, term2)
- score_ngram_overlap(term1, term2, ngram_size)
- score_ngram_overlap_ratio(term1, term2, ngram_size)
- strip_suffix(match)
- fuzzy_search.fuzzy_searcher.create_term_match(re_match, term)
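The three filters named in find_candidates (character overlap, ngram overlap, Levenshtein distance) can be illustrated with a minimal standalone sketch. This is not the package's implementation: the helper names `overlap_ratio`, `levenshtein_ratio` and `passes_thresholds`, the unpadded ngrams, and the exact ratio definitions are all assumptions made for illustration; only the 0.5 default threshold is taken from the FuzzySearcher signature.

```python
from collections import Counter


def make_ngrams(term: str, n: int) -> list:
    """Return the character ngrams of a term (no padding; an assumption)."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


def overlap_ratio(items1, items2) -> float:
    """Fraction of items1 (counted with multiplicity) that also occur in items2."""
    count1, count2 = Counter(items1), Counter(items2)
    total = sum(count1.values())
    if total == 0:
        return 0.0
    return sum((count1 & count2).values()) / total


def levenshtein_distance(s1: str, s2: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(term1: str, term2: str) -> float:
    """Map edit distance to a similarity in [0, 1] (hypothetical definition)."""
    longest = max(len(term1), len(term2))
    if longest == 0:
        return 1.0
    return 1 - levenshtein_distance(term1, term2) / longest


def passes_thresholds(candidate: str, keyword: str,
                      ngram_size: int = 2, threshold: float = 0.5) -> bool:
    """Keep a candidate only if all three similarity scores reach the threshold."""
    return (
        overlap_ratio(keyword, candidate) >= threshold
        and overlap_ratio(make_ngrams(keyword, ngram_size),
                          make_ngrams(candidate, ngram_size)) >= threshold
        and levenshtein_ratio(keyword, candidate) >= threshold
    )
```

In this sketch a near-miss spelling such as "Amsterdm" passes all three checks against the keyword "Amsterdam", while an unrelated word such as "Utrecht" is already rejected by the character overlap check.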
fuzzy_search.fuzzy_string module
fuzzy_search.fuzzy_template module
fuzzy_search.fuzzy_template_searcher module
fuzzy_search.similarity module
- class fuzzy_search.similarity.SkipCooccurrence(vocabulary: Vocabulary, skip_size: int = 1, sentences: Optional[Iterable[List[str]]] = None)
Bases: object
- calculate_skip_cooccurrences(sentences: Iterable[List[str]], skip_size: int = 0)
Count the frequency of term (skip) co-occurrences for a given list of sentences.
- Parameters:
sentences (Iterable[List[str]]) – an iterable of sentences, where each sentence is itself a list of term tokens
skip_size (int) – the maximum number of terms that may be skipped between two co-occurring terms
- get_term_coocs(term: str) -> Union[None, Generator[Tuple[str, str], None, None]]
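As a rough illustration of what counting skip co-occurrences involves, the sketch below pairs each token with the tokens that follow it, allowing up to skip_size tokens in between. The function names `skip_coocs` and `count_skip_coocs` are hypothetical, and the reading of skip_size (0 means adjacent tokens only) is an assumption based on the parameter description above; the package may work on term IDs from a Vocabulary rather than on raw strings.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def skip_coocs(tokens: List[str], skip_size: int = 0) -> Iterator[Tuple[str, str]]:
    """Yield ordered token pairs with at most skip_size tokens between them."""
    for i, left in enumerate(tokens):
        # j runs from the next token up to skip_size tokens further on
        for j in range(i + 1, min(i + 2 + skip_size, len(tokens))):
            yield left, tokens[j]


def count_skip_coocs(sentences: Iterable[List[str]], skip_size: int = 0) -> Counter:
    """Count how often each pair co-occurs across all sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(skip_coocs(sentence, skip_size))
    return counts
```

With skip_size=0 only directly adjacent tokens are paired; with skip_size=1 the sentence ["a", "b", "c"] additionally yields the pair ("a", "c").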
- class fuzzy_search.similarity.SkipgramSimilarity(ngram_length: int = 3, skip_length: int = 0, terms: Optional[List[str]] = None, max_length_diff: int = 2)
Bases: object
- index_terms(terms: List[str], reset_index: bool = True)
Make a frequency index of the skipgrams for a given list of terms. With reset_index=True the existing index is cleared before the given terms are indexed; with reset_index=False indexing is cumulative, that is, every time you call index_terms the new terms are added to the existing index.
- Parameters:
terms (List[str]) – a list of terms to index
reset_index (bool) – whether to reset the index before indexing or to keep the existing index
- rank_similar(term: str, top_n: int = 10, score_cutoff: float = 0.5)
Return a ranked list of similar terms from the index for a given input term, based on their character skipgram cosine similarity.
- Parameters:
term (str) – a term (any string) to match against the indexed terms
top_n (int, default 10) – the number of highest ranked terms to return
score_cutoff (float) – the minimum similarity score a term needs to be included in the ranking
- Returns:
a ranked list of terms and their similarity scores
- Return type:
List[Tuple[str, float]]
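The ranking described for rank_similar, cosine similarity over character skipgram frequencies, can be sketched in a few lines. This is an independent illustration, not the class's code: the functions `skipgrams`, `cosine` and `rank_similar_sketch` are hypothetical, terms are not padded, and the way skip_length widens the window is an assumption. The defaults for ngram_length, skip_length, top_n and score_cutoff mirror the signatures above.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def skipgrams(term: str, ngram_length: int = 3, skip_length: int = 0) -> List[str]:
    """Character ngrams that may skip up to skip_length characters in total."""
    window = ngram_length + skip_length
    grams = []
    for start in range(len(term) - ngram_length + 1):
        chunk = term[start:start + window]
        # fix the first character, pick the remaining ones from the window
        for rest in combinations(range(1, len(chunk)), ngram_length - 1):
            grams.append(chunk[0] + "".join(chunk[i] for i in rest))
    return grams


def cosine(freq1: Counter, freq2: Counter) -> float:
    """Cosine similarity between two skipgram frequency vectors."""
    dot = sum(freq1[g] * freq2[g] for g in freq1.keys() & freq2.keys())
    norm = (math.sqrt(sum(v * v for v in freq1.values()))
            * math.sqrt(sum(v * v for v in freq2.values())))
    return dot / norm if norm else 0.0


def rank_similar_sketch(term: str, indexed_terms: List[str], top_n: int = 10,
                        score_cutoff: float = 0.5, ngram_length: int = 3,
                        skip_length: int = 0) -> List[Tuple[str, float]]:
    """Score every indexed term against the input and return the best ones."""
    target = Counter(skipgrams(term, ngram_length, skip_length))
    scored = [
        (other, cosine(target, Counter(skipgrams(other, ngram_length, skip_length))))
        for other in indexed_terms
    ]
    ranked = sorted((pair for pair in scored if pair[1] >= score_cutoff),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]
```

A real index would precompute the skipgram frequencies of the indexed terms once (the job of index_terms) instead of recomputing them on every query as this sketch does.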
- class fuzzy_search.similarity.Vocabulary
Bases: object
- add_terms(terms: List[str], reset_index: bool = True)
Add a list of terms to the vocabulary. Use ‘reset_index=True’ to reset the vocabulary before adding the terms.
- Parameters:
terms (List[str]) – a list of terms to add to the vocabulary
reset_index (bool) – a flag to indicate whether to empty the vocabulary before adding terms
- id2term(term_id: int)
Return the term for a given term ID.
- reset_index()
- term2id(term: str)
Return the term ID for a given term.
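A vocabulary of this kind is essentially a two-way mapping between terms and integer IDs. The sketch below shows one way such a mapping could be kept; `SimpleVocabulary` is a hypothetical stand-in, and both the ID assignment order and the choice to return None for unknown terms or IDs are assumptions rather than documented behaviour of the Vocabulary class.

```python
from typing import Dict, List, Optional


class SimpleVocabulary:
    """Hypothetical two-way term/ID mapping, for illustration only."""

    def __init__(self) -> None:
        self.term_to_id: Dict[str, int] = {}
        self.id_to_term: Dict[int, str] = {}

    def reset_index(self) -> None:
        """Empty the vocabulary."""
        self.term_to_id.clear()
        self.id_to_term.clear()

    def add_terms(self, terms: List[str], reset_index: bool = True) -> None:
        """Add terms, optionally emptying the vocabulary first."""
        if reset_index:
            self.reset_index()
        for term in terms:
            if term not in self.term_to_id:
                term_id = len(self.term_to_id)  # next free ID
                self.term_to_id[term] = term_id
                self.id_to_term[term_id] = term

    def term2id(self, term: str) -> Optional[int]:
        """Return the ID for a term, or None if unknown (an assumption)."""
        return self.term_to_id.get(term)

    def id2term(self, term_id: int) -> Optional[str]:
        """Return the term for an ID, or None if unknown (an assumption)."""
        return self.id_to_term.get(term_id)
```

Calling add_terms with reset_index=False extends the existing mapping, so IDs handed out earlier stay valid.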
- fuzzy_search.similarity.get_begin_sim(phrase1: str, phrase2: str, begin_length: int) -> float
- fuzzy_search.similarity.get_end_sim(phrase1: str, phrase2: str, end_length: int) -> float
- fuzzy_search.similarity.get_min_length(phrase1: str, phrase2: str, begin_length: int) -> int
- fuzzy_search.similarity.get_skip_coocs(seq_ids: List[str], skip_size: int = 0) -> Generator[Tuple[int, int], None, None]
- fuzzy_search.similarity.vector_length(skipgram_freq)
Module contents
- fuzzy_search.make_searcher(phrases: any, config)