Checks if a search conducted for search_term should match potential_hit.
This function calls tokenize_and_fold on both search_term and potential_hit. ASCII alternates are never taken for search_term, but they will be taken for potential_hit according to the value of accept_alternates.
A hit occurs when each folded token in search_term is a prefix of a folded token from potential_hit.
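The matching rule above can be sketched in Python. This is a hypothetical model, not the library's implementation. The names match_string, fold_tokens and ascii_alternate are illustrative. It assumes that tokens are runs of word characters folded with str.casefold, and that an ASCII alternate is formed by stripping combining accents. The real tokenize_and_fold may tokenize and fold differently.

```python
import re
import unicodedata
from typing import List, Optional


def fold_tokens(text: str) -> List[str]:
    # Assumption: tokens are runs of word characters, case-folded and NFC-normalized.
    return [unicodedata.normalize("NFC", w.casefold()) for w in re.findall(r"\w+", text)]


def ascii_alternate(token: str) -> Optional[str]:
    # Assumption: the ASCII alternate is the token with combining accents removed,
    # kept only if it differs from the token and is pure ASCII.
    decomposed = unicodedata.normalize("NFKD", token)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped if stripped != token and stripped.isascii() else None


def match_string(search_term: str, potential_hit: str, accept_alternates: bool) -> bool:
    # The search term never gets alternates, so any accents in it must match exactly.
    term_tokens = fold_tokens(search_term)
    hit_tokens = fold_tokens(potential_hit)
    if accept_alternates:
        hit_tokens += [alt for alt in map(ascii_alternate, hit_tokens) if alt is not None]
    # A hit: every search token is a prefix of at least one hit token or alternate.
    return all(any(h.startswith(t) for h in hit_tokens) for t in term_tokens)
```

With accept_alternates set, match_string("fred", "Frédéric", True) succeeds through the alternate "frederic", while match_string("Fréd", "Frederic", True) fails because the search term keeps its accent.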
Depending on how you're performing the search, it will typically be faster to call tokenize_and_fold on each string in your corpus and build an index on the returned folded tokens, then call tokenize_and_fold on the search term and perform lookups into that index.
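The indexing approach can be sketched as a sorted token list searched with bisect. This is a hypothetical illustration: PrefixIndex and the local tokenize_and_fold stand-in are not the library's API, and the folding rules (case folding, accent stripping for ASCII alternates) are assumptions. Because sorted strings sharing a prefix form one contiguous run, each search token costs a binary search plus a scan over the matching run, instead of a full pass over the corpus.

```python
import bisect
import re
import unicodedata
from typing import List, Optional, Set, Tuple


def tokenize_and_fold(text: str, accept_alternates: bool) -> List[str]:
    # Hypothetical stand-in: case-folded words, plus unaccented ASCII
    # alternates when requested. The real function may behave differently.
    tokens = [unicodedata.normalize("NFC", w.casefold()) for w in re.findall(r"\w+", text)]
    if accept_alternates:
        for tok in list(tokens):
            decomposed = unicodedata.normalize("NFKD", tok)
            stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
            if stripped != tok and stripped.isascii():
                tokens.append(stripped)
    return tokens


class PrefixIndex:
    def __init__(self, corpus: List[str], accept_alternates: bool = True) -> None:
        # Fold every corpus string once and keep (token, doc_id) pairs sorted.
        pairs = {(tok, doc) for doc, text in enumerate(corpus)
                 for tok in tokenize_and_fold(text, accept_alternates)}
        self._entries: List[Tuple[str, int]] = sorted(pairs)
        self._keys = [tok for tok, _ in self._entries]

    def _docs_with_prefix(self, prefix: str) -> Set[int]:
        # All tokens starting with prefix sit in one run beginning at bisect_left.
        docs: Set[int] = set()
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            docs.add(self._entries[i][1])
            i += 1
        return docs

    def search(self, search_term: str) -> Set[int]:
        # The search term is folded without alternates; every token must match.
        result: Optional[Set[int]] = None
        for tok in tokenize_and_fold(search_term, accept_alternates=False):
            docs = self._docs_with_prefix(tok)
            result = docs if result is None else result & docs
            if not result:
                break
        return result or set()
```

For example, PrefixIndex(["Smith, Fred", "Frédéric", "SFO"]).search("fred") returns {0, 1}. In this sketch an empty search term returns no documents, which is a design choice rather than a mirror of the single-string check.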
As some examples, searching for ‘fred’ would match the potential hit ‘Smith, Fred’ and also ‘Frédéric’. Searching for ‘Fréd’ would match ‘Frédéric’ but not ‘Frederic’ (due to the one-directional nature of accent matching). Searching for ‘fo’ would match ‘Foo’ and ‘Bar Foo Baz’, but not ‘SFO’ (because no word in it has ‘fo’ as a prefix).
search_term | the search term from the user
accept_alternates | true to accept ASCII alternates
potential_hit | the text that may be a hit
Returns: true if potential_hit is a hit.