mapper.matchers.regex_matcher
Regex matcher for Excel-to-Geo assignments.
This matcher uses two regular expressions, one for the Excel column and one for the geo column, to extract a common token from each cell. An Excel row is paired with a geo row only when its extracted token matches exactly one geo row.
RegexMatcher
Bases: BaseMatcher, Ui_RegexMatcher
Matcher that applies regex on Excel and geo columns and matches rows when both regex patterns yield exactly one identical extracted token.
Behavior
- Retrieves regex patterns from UI fields.
- Extracts a token from each cell via re.search.
- Only matches rows where the extracted token from both sides is identical and occurs exactly once in the geo data.
- Ensures one-to-one mapping (skips duplicate geo matches).
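The token extraction described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the patterns and cell values are hypothetical, and the rule "use the first capturing group if the pattern defines one, otherwise the whole match" is an assumed reading of the helper's description.

```python
import re


def extract(s, rex):
    """Return the first capturing group if the pattern has one, else the whole match.

    Returns None when the pattern does not match the cell at all.
    """
    m = rex.search(str(s))
    if m is None:
        return None
    return m.group(1) if rex.groups >= 1 else m.group(0)


# Hypothetical patterns and cell values, for illustration only.
excel_rex = re.compile(r"ID-(\d+)")   # token is the capturing group
geo_rex = re.compile(r"\d+")          # no group: token is the whole match

excel_token = extract("Parcel ID-0042 (north)", excel_rex)
geo_token = extract("0042", geo_rex)
no_token = extract("no digits here", geo_rex)
```

Here both sides yield the same token, so the two cells would be candidates for pairing, while a cell with no match yields None and is skipped.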
Source code in src/mapper/matchers/regex_matcher.py
__init__(nr, excel_cols, geo_cols)
Initialize the regex matcher widget.
Steps
- Call BaseMatcher constructor to store identifier and column lists.
- Call setupUi(self) to build UI controls from .ui file.
- Populate the Excel and geo combo boxes with available columns.
- Connect UI changes (combo and text edits) to the updated signal.
- Connect remove button to emit the removed signal.
description()
Provide a description of this matcher’s configuration.
Steps
- Retrieve current selections for Excel and geo columns.
- Format as "REGEX#<nr>: <excel column> ↦ <geo column>".
match(excel_df, geo_df)
Match records when regex extractions from both columns yield exactly one identical token.
Steps
- Retrieve selected columns and regex patterns from UI.
- If any required input is missing, set stats to 0 and return no matches.
- Attempt to compile both regex patterns; on error, set stats to 0 and return.
- Define helper extract(s, rex) to return the first capturing group or whole match.
- Map every cell in the selected Excel column to its extracted token.
- Map every cell in the selected geo column to its extracted token.
- Iterate over each Excel token; skip if None or if resulting matches in geo side are not exactly one.
- Ensure no geo token is used more than once to enforce one-to-one mapping.
- Build a combined result row for each valid match via build_result().
- Concatenate all parts if any, update stats label, and return used row indices.
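The pairing steps above can be sketched without the UI or DataFrames. This is a simplified illustration under stated assumptions: the tokens are taken as already extracted into plain lists (None where the pattern did not match), pair_tokens is a hypothetical name, and the real method builds result rows via build_result() rather than returning index pairs.

```python
from collections import Counter


def pair_tokens(excel_tokens, geo_tokens):
    """Pair row positions with equal tokens while keeping the mapping one-to-one."""
    # How often each token occurs on the geo side (None means "no match").
    geo_counts = Counter(t for t in geo_tokens if t is not None)
    # Position of each geo token; only consulted for tokens that occur once.
    geo_index = {t: i for i, t in enumerate(geo_tokens) if t is not None}

    used_geo = set()
    pairs = []
    for excel_i, token in enumerate(excel_tokens):
        # Skip cells without a token and tokens that are absent or ambiguous in geo.
        if token is None or geo_counts[token] != 1:
            continue
        geo_i = geo_index[token]
        # Skip geo rows that an earlier Excel row has already claimed.
        if geo_i in used_geo:
            continue
        used_geo.add(geo_i)
        pairs.append((excel_i, geo_i))
    return pairs


# Hypothetical extracted tokens, for illustration only.
excel_tokens = ["A1", None, "B2", "A1", "C3"]
geo_tokens = ["B2", "A1", "C3", "C3", None]
pairs = pair_tokens(excel_tokens, geo_tokens)
```

In this example the second "A1" on the Excel side is dropped because its geo row is already used, and "C3" is dropped because it occurs twice in the geo data.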
set_stats(n)
Update the label showing number of matches found.
Steps
- Set labelStats text to "Mappings: n".