feat: matching program subtype discrimination (1:1/1:N/M:N/MxN)

Add a _resolve_matching_subtype post-processing step in classify_program()
that distinguishes matching-program subtypes based on key-variable naming
patterns and file/structural features.

Rules (in priority order):
  1. 二段階 (two-stage) → 二段階 (already handled by rule engine)
  2. 3 files + WS-SAVE-KEY → M:N→MxN (MT20)
  3. WS-PREV-KEY present → 混合 (mixed; already handled, MT32)
  4. WS-MAST-KEY + WS-TRAN-KEY → 1:N (MT02)
  5. >=3 KEY vars + >=2 files → M:N (MT33)
  6. Otherwise → 1:1 (MT01, MT03, MT18, MT19)

Results:
  MT01 → 1:1
  MT02 → 1:N
  MT03 → 1:1
  MT16/17 → 二段階
  MT18/19 → 1:1
  MT20 → M:N→MxN
  MT33 → M:N

Also fix the double-backslash regex bug in classifier.py and pipeline.py
(r'[-\w]' should be r'[\w-]' for the word character class).

Regression: 745 passed (unchanged).
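
The priority order above can be sketched as a first-match-wins chain. This is a minimal illustration, not the code from this commit: the function name resolve_matching_subtype, its parameters, and the assumption that "3 files" means exactly three are all hypothetical.

```python
def resolve_matching_subtype(label: str, key_vars: set[str], file_count: int) -> str:
    """Return a matching subtype by checking the rules in priority order.

    label      -- label already assigned by the rule engine (assumed input)
    key_vars   -- names of the WS-...KEY variables found in the program
    file_count -- number of files the program uses
    """
    # Rule 1: a two-stage label from the rule engine is kept unchanged.
    if label == "二段階":
        return "二段階"
    # Rule 2: three files plus a saved key (assumed to mean exactly 3 files).
    if file_count == 3 and "WS-SAVE-KEY" in key_vars:
        return "M:N→MxN"
    # Rule 3: a previous-key variable marks the mixed subtype.
    if "WS-PREV-KEY" in key_vars:
        return "混合"
    # Rule 4: master key and transaction key together.
    if {"WS-MAST-KEY", "WS-TRAN-KEY"} <= key_vars:
        return "1:N"
    # Rule 5: at least three KEY variables and at least two files.
    if len(key_vars) >= 3 and file_count >= 2:
        return "M:N"
    # Rule 6: fallback.
    return "1:1"


if __name__ == "__main__":
    print(resolve_matching_subtype("マッチング", {"WS-MAST-KEY", "WS-TRAN-KEY"}, 2))
```

Because the checks run top to bottom, a program with WS-SAVE-KEY and three files resolves under rule 2 even if it also carries WS-MAST-KEY and WS-TRAN-KEY.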
2026-06-21 13:33:25 +08:00
parent 65e9919933
commit 7d5c82e0e2
2 changed files with 80 additions and 9 deletions
@@ -23,7 +23,7 @@ L1_RULES: list[tuple[str, list[str], float]] = [
    ("编辑输出", ["WRITE AFTER", "WRITE BEFORE"], 0.80),
    ("文件编成", ["ORGANIZATION IS"], 0.99),
    ("替代索引", ["ALTERNATE RECORD KEY"], 0.99),
-    ("マッチング", ["re:WS-[-\\w]*KEY"], 0.65),
+    ("マッチング", ["re:WS-[\\w-]*KEY"], 0.65),
 ]

 # ── 冲突解决规则 ─────────────────────────────────────────────────────────
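
A quick check of what the pattern in the hunk above accepts may be useful when reviewing this change. The sketch assumes the "re:" prefix is stripped before compiling and that the rule engine uses a search rather than a full match; both are assumptions, and the sample names are illustrative. In Python's re module a hyphen at the start or end of a character class is literal, so both orderings of the class accept the same strings.

```python
import re

# Pattern bodies as they appear after Python string unescaping ("\\w" becomes \w).
OLD = re.compile(r"WS-[-\w]*KEY")
NEW = re.compile(r"WS-[\w-]*KEY")

# Illustrative variable names, not taken from the test programs.
SAMPLES = ["WS-MAST-KEY", "WS-TRAN-KEY", "WS-KEY", "WS-SAVE-KEY-2", "WK-MAST-KEY"]


def hits(pattern: re.Pattern, names: list[str]) -> list[bool]:
    """Report, for each name, whether the pattern is found anywhere in it."""
    return [pattern.search(name) is not None for name in names]


if __name__ == "__main__":
    for name, old_hit, new_hit in zip(SAMPLES, hits(OLD, SAMPLES), hits(NEW, SAMPLES)):
        print(f"{name:15} old={old_hit} new={new_hit}")
```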