========
Usage
========

Simple Example
--------------
In most cases you can simply call the ``find_near_matches()`` utility function,
which chooses a suitable fuzzy search implementation according to the given
parameters:

.. code:: python

    >>> from fuzzysearch import find_near_matches
    >>> find_near_matches('PATTERN', 'aaaPATERNaaa', max_l_dist=1)
    [Match(start=3, end=9, dist=1, matched='PATERN')]

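The ``dist=1`` in the result above is a Levenshtein distance: the minimum number
of single-character insertions, deletions and substitutions needed to turn one
string into the other. The following is a minimal, dependency-free sketch of that
measure, included only to make ``max_l_dist`` concrete. The function name
``levenshtein_distance`` is hypothetical and is not part of fuzzysearch, and this
is not how the library computes its results.

```python
def levenshtein_distance(a, b):
    """Return the Levenshtein (edit) distance between strings a and b.

    Illustrative helper only; it is NOT part of the fuzzysearch API.
    """
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into '' takes i deletions
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,                      # delete ch_a
                current[j - 1] + 1,                   # insert ch_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


# 'PATERN' is 'PATTERN' with one 'T' removed, so the distance is 1.
distance = levenshtein_distance('PATTERN', 'PATERN')
```

With ``max_l_dist=1``, any stretch of the searched sequence within distance 1 of
the pattern qualifies as a match, which is why ``'PATERN'`` is reported.
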
Internal Functions
------------------
If needed, you can choose a specific internal search implementation. These are
harder to use, since they do not share the interface exposed by
``find_near_matches()``. There is a complete working example below.

Many of these functions require the fuzzy matching parameters to be passed as a
``LevenshteinSearchParams`` object, which can be created after importing the
class from ``fuzzysearch.common``.

Many of these functions also return multiple overlapping matches for the same
region of the sequence. These may be consolidated using the
``consolidate_overlapping_matches()`` function, also found in the
``fuzzysearch.common`` module.

Finally, some of these functions are generator functions rather than functions
that return a list. Wrap the call with ``list()`` if you need a list.

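To make the idea of consolidating overlapping matches concrete, here is a rough,
dependency-free sketch: group matches whose spans overlap, then keep the one with
the lowest distance from each group. The names ``SketchMatch`` and
``consolidate_sketch`` are hypothetical, and the tie-breaking rule is an
assumption; the library's ``consolidate_overlapping_matches()`` may select
matches differently.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's match objects (NOT fuzzysearch.Match).
SketchMatch = namedtuple('SketchMatch', ['start', 'end', 'dist'])


def consolidate_sketch(matches):
    """Group overlapping matches and keep the lowest-distance one per group.

    Illustrative only; assumes half-open [start, end) spans.
    """
    best_matches = []
    group = []
    group_end = None
    for match in sorted(matches, key=lambda m: (m.start, m.end)):
        if group and match.start < group_end:
            # Overlaps the current group: extend the group.
            group.append(match)
            group_end = max(group_end, match.end)
        else:
            # No overlap: close the previous group and start a new one.
            if group:
                best_matches.append(min(group, key=lambda m: m.dist))
            group = [match]
            group_end = match.end
    if group:
        best_matches.append(min(group, key=lambda m: m.dist))
    return best_matches


overlapping = [
    SketchMatch(3, 23, 2),
    SketchMatch(3, 24, 1),
    SketchMatch(4, 24, 2),
    SketchMatch(40, 60, 2),
]
consolidated = consolidate_sketch(overlapping)
```

Here the first three matches overlap and collapse to the single match with
distance 1, while the fourth match stands alone and is kept as is.
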
The available internal functions, by module:

* ``fuzzysearch.search_exact``

  * ``search_exact(subsequence, sequence, start_index=0, end_index=None)``

* ``fuzzysearch.generic_search``: Supports specifying any combination of fuzzy
  matching limitations.

  * ``find_near_matches_generic()``
  * ``find_near_matches_generic_linear_programming()``
  * ``find_near_matches_generic_ngrams()``
  * ``has_near_match_generic_ngrams()``

* ``fuzzysearch.levenshtein``: Supports specifying only the maximum Levenshtein
  distance.

  * ``find_near_matches_levenshtein()``
  * ``find_near_matches_levenshtein_linear_programming()``
  * ``find_near_matches_levenshtein_ngrams()``

* ``fuzzysearch.substitutions_only``: Allows only substitutions (fast!).

  * ``find_near_matches_substitutions()``
  * ``has_near_match_substitutions()``
  * ``find_near_matches_substitutions_lp()``
  * ``find_near_matches_substitutions_ngrams()``
  * ``has_near_match_substitutions_ngrams()``

* ``fuzzysearch.no_deletions``: Slightly faster when deletions are not allowed.

  * ``find_near_matches_no_deletions_ngrams()``

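As a rough illustration of why a substitutions-only search can be simpler than a
full Levenshtein search: when insertions and deletions are ruled out, every
candidate match has the same length as the pattern, so it is enough to compare
fixed-size windows character by character. The sketch below is a naive version
of that idea. The function ``naive_substitutions_search`` is hypothetical, not
part of fuzzysearch, and far less optimized than the library's implementations.

```python
def naive_substitutions_search(subsequence, sequence, max_substitutions):
    """Find windows of `sequence` that differ from `subsequence` in at most
    `max_substitutions` positions.

    Illustrative only; returns (start, end, dist) tuples, not library objects.
    """
    if not subsequence:
        raise ValueError('Given subsequence is empty!')
    length = len(subsequence)
    found = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        # Count mismatching positions (the Hamming distance).
        dist = sum(a != b for a, b in zip(subsequence, window))
        if dist <= max_substitutions:
            found.append((start, start + length, dist))
    return found


# 'PATTERM' differs from 'PATTERN' by a single substituted character.
hits = naive_substitutions_search('PATTERN', 'aaaPATTERMaaa', 1)
```

Because the window length is fixed, a string such as ``'PATERN'`` (one character
deleted) would not be found this way; that case requires one of the searches
that allow deletions.
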
Internal Function Usage Example
+++++++++++++++++++++++++++++++

An example of using ``find_near_matches_generic_ngrams()``:

.. code:: python

    >>> sequence = '''\
    ... GACTAGCACTGTAGGGATAACAATTTCACACAGGTGGACAATTACATTGAAAATCACAGATTGGTCACACACACA
    ... TTGGACATACATAGAAACACACACACATACATTAGATACGAACATAGAAACACACATTAGACGCGTACATAGACA
    ... CAAACACATTGACAGGCAGTTCAGATGATGACGCCCGACTGATACTCGCGTAGTCGTGGGAGGCAAGGCACACAG
    ... GGGATAGG'''
    >>> subsequence = 'TGCACTGTAGGGATAACAAT'  # distance 1 from the sequence
    >>> max_distance = 2

    >>> from fuzzysearch.generic_search import find_near_matches_generic_ngrams
    >>> from fuzzysearch.common import LevenshteinSearchParams
    >>> params = LevenshteinSearchParams(max_l_dist=max_distance)
    >>> # note: this will return many overlapping results; list() collects them
    >>> # whether the function returns a list or a generator
    >>> results = list(find_near_matches_generic_ngrams(subsequence, sequence, params))
    >>> len(results)
    16
    >>> # consolidate the overlapping results, keeping a "good" one from each group
    >>> from fuzzysearch.common import consolidate_overlapping_matches
    >>> consolidate_overlapping_matches(results)
    [Match(start=3, end=24, dist=1, matched='TAGCACTGTAGGGATAACAAT')]