mapper.matchers.fuzzy_matcher
Fuzzy matcher for matching Excel and geographical data.
This module provides a matcher that uses fuzzy string matching to find similar values between Excel and geographical data columns. It uses the RapidFuzz library to find the best match for each Excel value in the geographical data, subject to a minimum similarity threshold.
FuzzyMatcher
Bases: BaseMatcher, Ui_FuzzyMatcher
Matcher for fuzzy string matching between Excel and geographical data.
Behavior
- Uses RapidFuzz to extract the best match for each Excel value from geo values.
- Only accepts matches above a configurable similarity threshold.
- Ensures one-to-one matching (no duplicate matches on geo side).
Signals
- updated: Emitted when the matcher configuration changes.
- removed: Emitted when the matcher is removed.
Source code in src/mapper/matchers/fuzzy_matcher.py
__init__(nr, excel_cols, geo_cols)
Initialize the fuzzy matcher UI and internal state.
Steps
- Call BaseMatcher constructor to store identifier and column lists.
- Call setupUi(self) to create UI controls from the .ui file.
- Populate the Excel and geo combo boxes with available columns.
- Connect UI signals to notify when configuration changes or removal is requested.
Source code in src/mapper/matchers/fuzzy_matcher.py
description()
Return a description of this matcher’s configuration.
Steps
- Combine the matcher ID, selected Excel column, and selected geo column.
- Format as "FUZZ#<id>: <Excel column> → <geo column>".
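A sketch of the string assembly, assuming the three parts appear in the order the steps list them (matcher ID, Excel column, geo column). The function name describe and its parameters are hypothetical; the real method reads these values from the matcher's own state and combo boxes.

```python
def describe(nr, excel_col, geo_col):
    """Build a description string from the matcher ID and the two column names.

    Hypothetical stand-in for description(); the layout is assumed
    from the documented steps.
    """
    return f"FUZZ#{nr}: {excel_col} → {geo_col}"


text = describe(3, "City", "NAME")
```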
Source code in src/mapper/matchers/fuzzy_matcher.py
match(excel_df, geo_df)
Match records between Excel and geographical data using fuzzy string matching.
Steps
- Retrieve selected columns and threshold from UI.
- If threshold ≤ 1, treat it as a fraction and scale it to a percentage (multiply by 100), since RapidFuzz scores range from 0 to 100.
- If either combo is empty, reset stats display and return no matches.
- Convert both column values to strings for comparison.
- Use RapidFuzz extractOne to find the best geo match for each Excel value, scoring with WRatio and discarding choices below the threshold.
- Skip matches where the geo value was already used, to enforce 1:1 mapping.
- For each accepted match, build a result row with a label containing the score.
- Concatenate all matched parts into one DataFrame.
- Update the stats label with the number of matched rows.
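The bookkeeping around the matching loop (threshold scaling, the empty-selection guard, string conversion, and result assembly) might look roughly like the sketch below. The helper names normalize_threshold and assemble_matches, the result column names, and the label text are all assumptions for illustration; the actual result layout is defined in the source file and is not shown on this page. The accepted list stands in for the output of the fuzzy matching step.

```python
import pandas as pd


def normalize_threshold(threshold):
    """Scale a fractional threshold (<= 1) to the 0-100 score range."""
    return threshold * 100 if threshold <= 1 else threshold


def assemble_matches(excel_df, geo_df, accepted, excel_col, geo_col):
    """Build one DataFrame from accepted (excel_index, geo_index, score) triples.

    Hypothetical helper: column names and label format are illustrative.
    """
    # Guard: without both column selections there is nothing to match.
    if not excel_col or not geo_col:
        return pd.DataFrame()

    parts = []
    for excel_idx, geo_idx, score in accepted:
        # One single-row part per accepted match, values compared as strings.
        parts.append(
            pd.DataFrame(
                {
                    "excel_value": [str(excel_df.at[excel_idx, excel_col])],
                    "geo_value": [str(geo_df.at[geo_idx, geo_col])],
                    "label": [f"fuzzy ({score:.0f})"],
                }
            )
        )

    # pd.concat raises on an empty list, so return an empty frame instead.
    if not parts:
        return pd.DataFrame()
    return pd.concat(parts, ignore_index=True)


excel_df = pd.DataFrame({"City": ["Berlin", "Munchen"]})
geo_df = pd.DataFrame({"NAME": ["München", "Berlin"]})
accepted = [(0, 1, 100.0), (1, 0, 85.7)]  # assumed output of the matching step

result = assemble_matches(excel_df, geo_df, accepted, "City", "NAME")
matched_rows = len(result)  # the count that would be passed on to set_stats
```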
Source code in src/mapper/matchers/fuzzy_matcher.py
set_stats(n)
Update the statistics label to reflect the number of matched rows.
Steps
- Set the text of labelStats to show "Mappings: n".
Source code in src/mapper/matchers/fuzzy_matcher.py