feat: matching program subtype discrimination (1:1/1:N/M:N/MxN)

Add a _resolve_matching_subtype post-processing step to classify_program()
that distinguishes matching-program subtypes based on key-variable naming
patterns, file count, and structural features:

Rules (in priority order):
1. Already labeled 二段階 (two-stage) → stays 二段階 (handled by rule engine)
2. 3 files + WS-SAVE-KEY → M:N→MxN (MT20)
3. WS-PREV-KEY present → 混合 (mixed; already handled, MT32)
4. WS-MAST-KEY + WS-TRAN-KEY → 1:N (MT02)
5. >=3 KEY vars + >=2 files → M:N (MT33)
6. Otherwise → 1:1 (MT01, MT03, MT18, MT19)
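
The priority order above can be sketched roughly as follows. This is a
minimal illustration, not the code in this commit: the function name
resolve_matching_subtype, its parameters (current_label, key_vars,
file_count), and the reading of "3 files" as exactly three are all
assumptions. The label strings are taken from the rule list.

```python
def resolve_matching_subtype(current_label: str,
                             key_vars: set[str],
                             file_count: int) -> str:
    """Hypothetical sketch of the six rules, checked in priority order."""
    # Rule 1: a label the rule engine already set to two-stage is kept.
    if current_label == "二段階":
        return "二段階"
    # Rule 2: three files plus a saved key (assumed to mean exactly 3).
    if file_count == 3 and "WS-SAVE-KEY" in key_vars:
        return "M:N→MxN"
    # Rule 3: a previous-key variable marks the mixed subtype.
    if "WS-PREV-KEY" in key_vars:
        return "混合"
    # Rule 4: separate master and transaction keys.
    if {"WS-MAST-KEY", "WS-TRAN-KEY"} <= key_vars:
        return "1:N"
    # Rule 5: many key variables across at least two files.
    if len(key_vars) >= 3 and file_count >= 2:
        return "M:N"
    # Rule 6: fallback.
    return "1:1"
```

Because the checks return early, a program with both WS-PREV-KEY and the
WS-MAST-KEY/WS-TRAN-KEY pair resolves to 混合 in this sketch, since rule 3
is tested before rule 4.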

Results:
  MT01→1:1, MT02→1:N, MT03→1:1, MT16/17→二段階 (two-stage),
  MT18/19→1:1, MT20→M:N→MxN, MT33→M:N

Also fix the double-backslash regex bug in classifier.py and pipeline.py
(r'[-\w]' should be r'[\w-]' for the word-character class).
Regression: 745 passed (unchanged).
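
For reference, a small check of what the key-variable pattern matches. It
assumes the "re:" prefix in L1_RULES only marks the entry as a regex and
is stripped before compilation. The helper find_key_vars and the sample
COBOL lines are hypothetical. In Python's re module a hyphen placed first
or last in a character class is literal, so the sketch also runs both
orderings on the same input.

```python
import re

# The two spellings of the pattern, written as ordinary (non-raw) strings.
OLD_PATTERN = re.compile("WS-[-\\w]*KEY")
NEW_PATTERN = re.compile("WS-[\\w-]*KEY")


def find_key_vars(source: str, pattern: re.Pattern = NEW_PATTERN) -> set[str]:
    """Return the distinct WS-...KEY names found in the source text."""
    return set(pattern.findall(source))


# Hypothetical working-storage fragment, used only for illustration.
sample = """
01 WS-MAST-KEY PIC X(10).
01 WS-TRAN-KEY PIC X(10).
01 WS-KEY      PIC X(10).
01 WS-COUNT    PIC 9(4).
"""

new_hits = find_key_vars(sample)
old_hits = find_key_vars(sample, OLD_PATTERN)
```

A set like new_hits is the kind of input the subtype rules would need for
their WS-MAST-KEY / WS-TRAN-KEY / WS-SAVE-KEY checks.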
NB-076
2026-06-21 13:33:25 +08:00
parent 65e9919933
commit 7d5c82e0e2
2 changed files with 80 additions and 9 deletions
+1 -1
@@ -23,7 +23,7 @@ L1_RULES: list[tuple[str, list[str], float]] = [
     ("编辑输出", ["WRITE AFTER", "WRITE BEFORE"], 0.80),
     ("文件编成", ["ORGANIZATION IS"], 0.99),
     ("替代索引", ["ALTERNATE RECORD KEY"], 0.99),
-    ("マッチング", ["re:WS-[-\\w]*KEY"], 0.65),
+    ("マッチング", ["re:WS-[\\w-]*KEY"], 0.65),
 ]
 # ── 冲突解决规则 ─────────────────────────────────────────────────────────