cobol-java-v3

Author	SHA1	Message	Date
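NB-076	cdba324b5a	fix: HINA all-type defect fixes: 3 real defects in SORT/CSV/ALT. Defects found by adversarial all-type testing, and their fixes: Defect 1: SORT/MERGE L1 keyword too strict (missed detections) - Old: 'SORT ON KEY' / 'MERGE ON KEY' (exact strings) - Real-world COBOL usage: SORT WORK-FILE ON ASCENDING KEY ... - New: regex SORT(?:\s+\S+)?\s+ON\s+(?:ASCENDING|DESCENDING)?KEY Defect 2: CSV false positive (non-CSV STRING/INSPECT also triggered it) - Old: has_string=True -> CSV merge - New: requires has_csv_merge (STRING + comma delimiter) - Plain string concatenation no longer triggers the CSV classification Defect 3: ALTERNATE RECORD KEY overridden by ORGANIZATION IS - Old: file organization was listed before alternate index (at equal confidence, the earlier entry wins) - New: alternate index is listed first (the more specific classification takes priority) Regression: 767 passed (0 new failures)	2026-06-21 15:51:30 +08:00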
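NB-076	4b22c3754e	fix: hyphenless KEY variables + COBOL expert 10-attack-surface tests. COBOL expert adversarial review found: - Old-style COBOL names WSKEY1/WSKEY2 (no hyphen) were not detected by the L1 keyword - Structural detection signals 4 and 5 had incomplete coverage Fixes: - L1 adds re:WS[A-Z0-9]*KEY[A-Z0-9]* to cover hyphenless KEY naming - _matches_key_comparison extended to support hyphenless variables - has_key_var injection extended to support hyphenless names - Structural detection signal 4 adds a WS\w+ comparison pattern - Structural detection signal 5 adds support for two separate OPEN statements New tests: - test_cobol_expert_attacks: 4 inline attack tests (multi-line AT END, hyphenless WSKEY, GO TO style, NOT= comparison) - test-adversarial: 8 sample-file attack tests Full regression: 767 passed (+3 new, 0 failures)	2026-06-21 15:35:52 +08:00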
NB-076	da5d1058e7	feat: structural matching detection — no KEY variable needed Add _detect_matching_structure(): detection based on control flow pattern, not variable naming conventions. Uses 5 structural signals: 1. READ + AT END + EOF pattern 2. PERFORM UNTIL with EOF condition 3. ELSE body with conditional READ (matching core) 4. IF comparing hyphenated fields (cross-file comparison) 5. Multi-file OPEN INPUT 5/5 signals → 0.55, 4/5 → 0.50, 3/5 → 0.40. Real-world impact: matching programs with key fields named CUST-CODE and ORDR-CODE (no '-KEY' in name) are now correctly detected. Also: - Rule engine type priority: main types (マッチング etc.) override secondary types (M:N, DIVIDE) when keyword confidence is low - has_structural_match injected into features so rule engine can use it - matching_vs_keybreak accepts equality IFs as matching evidence - New test: test_structural_matching_no_keyword() Regression: 764 passed (0 new failures).	2026-06-21 15:28:32 +08:00
NB-076	33762ca959	fix: adversarial testing: 4 false positive/negative fixes + comment stripping COBOL migration expert adversarial testing found 4 real defects: FIX 1: Comment-stripping in detect_keyword() (FP-2) - Remove *> inline comments and comment lines before keyword matching - Prevents 「マッチング」 from triggering on WS-KEY in comments FIX 2: KEY comparison context validation (FP-1, FP-6) - Add _matches_key_comparison(), which requires the WS-KEY variable to appear NEAR an actual comparison operator (= < >), not just as a PIC/VALUE decl - Same check in _path_rule_engine features via has_key_var injection - Fix regex bug: [=<>\s] vs [=<>]; \s matched whitespace after PIC decl FIX 3: Old-school naming support (FN-1) - Add L1 keyword r'[A-Z]\d{0,2}-\w*KEY' with 0.55 confidence - Matches K01-KEY, KS-KEY etc. (non-WS- prefix naming convention) FIX 4: mn_output_mode over-matching (FP-6) - Require IF branches + KEY evidence before returning M:N for file>=3 - matching_vs_keybreak rule 3 now requires has_key_var New tests: test_adversarial.py: 8 parametrized adversarial tests Regression: 755 passed (0 new failures)	2026-06-21 15:16:41 +08:00
NB-076	7d5c82e0e2	feat: matching program subtype discrimination (1:1/1:N/M:N/MxN) Add _resolve_matching_subtype post-processing step in classify_program() that distinguishes matching program subtypes based on key variable naming patterns and file/structural features: Rules (in priority order): 1. 二段階 → 二段階 (already handled by rule engine) 2. 3 files + WS-SAVE-KEY → M:N→MxN (MT20) 3. WS-PREV-KEY present → 混合 (already handled, MT32) 4. WS-MAST-KEY + WS-TRAN-KEY → 1:N (MT02) 5. >=3 KEY vars + >=2 files → M:N (MT33) 6. Otherwise → 1:1 (MT01, MT03, MT18, MT19) Results: MT01→1:1, MT02→1:N, MT03→1:1, MT16/17→二段階, MT18/19→1:1, MT20→M:N→MxN, MT33→M:N Also fix double-backslash regex bug in classifier.py and pipeline.py (r'[-\w]' should be r'[\w-]' for word character class). Regression: 745 passed (unchanged).	2026-06-21 13:33:25 +08:00
NB-076	65e9919933	feat: matching program full recognition — L1 regex keyword + confidence consensus Three-part fix for matching program classification: 1. L1 regex keyword WS-[-\w]*KEY (confidence 0.65): - Captures WS-KEY, WS-MAST-KEY, WS-TRAN-KEY, WS-PREV-KEY etc. - Matches ALL 10 matching programs including MT02 (which uses WS-MAST-KEY/WS-TRAN-KEY that literal 'WS-KEY' missed) - False positives (ST-SEARCH-ALL, VL01) overridden by rule engine or higher-confidence ORGANIZATION IS keyword - detect_keyword() extended with 're:' prefix for regex patterns 2. Consensus bonus in compute_confidence_v2: - When L1 keyword category matches rule engine's final category, context_factor boosted by +0.15 - Pushes matching programs from manual (0.50-0.69) toward review (0.70-0.89) range 3. Confidence calibration for confusion groups (previous commit): - dedup_vs_nodedup: 0.85→0.50 for negative detection - validation_vs_keybreak: 0.80→0.55 for has_counter - simple_vs_two_stage: 0.80→0.50 for sequential OPEN Results - matching programs: MT01: 0.38→0.75, MT02: 0.30→0.60, MT03: 0.30→0.60, MT16: 0.45→0.81, MT17: 0.36→0.65, MT18: 0.60→0.60, MT19: 0.30→0.60, MT20: 0.30→0.65, MT33: 0.30→0.60 All now rule_engine (not fallback), no false negatives. Subtype discrimination remains for future work: all matching programs classified as マッチング without 1:1/1:N/N:1 subtype.	2026-06-21 13:25:39 +08:00
hangshuo652	e2486db510	fix: 3 issues found during real COBOL validation	2026-06-18 16:26:44 +08:00
hangshuo652	de506d9c31	feat: Phase 2 - HINA Agent + Strategy Agent + classifier	2026-06-18 16:10:38 +08:00
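
Commit 33762ca959 describes two checks: strip comments before keyword matching, and count a WS-...KEY variable only when it sits next to a comparison operator. The sketch below is one possible reading of that description, not the repository's code. The function names `strip_comments` and `matches_key_comparison` are hypothetical, "near" is assumed to mean "directly adjacent on the same line", and comment lines are assumed to be fixed-format (indicator in column 7).

```python
import re

# Key-variable pattern quoted in commit 65e9919933, with the character
# class written as [\w-] per the fix noted in commit 7d5c82e0e2.
KEY_VAR = r"WS-[\w-]*KEY"

# Assumption: "near a comparison operator" means the operator directly
# follows or precedes the variable. Note the class is [=<>], not [=<>\s],
# so whitespace after a PIC declaration cannot satisfy it.
KEY_COMPARISON = re.compile(
    rf"{KEY_VAR}\s*(?:NOT\s*)?[=<>]|[=<>]\s*{KEY_VAR}"
)


def strip_comments(source: str) -> str:
    """Drop fixed-format comment lines and '*>' inline comments."""
    kept = []
    for line in source.splitlines():
        # Fixed-format COBOL: '*' or '/' in column 7 marks a comment line.
        if len(line) > 6 and line[6] in "*/":
            continue
        # Keep only the text before an inline '*>' comment.
        # Simplification: a '*>' inside a string literal is also cut.
        kept.append(line.split("*>", 1)[0])
    return "\n".join(kept)


def matches_key_comparison(source: str) -> bool:
    """True if a key variable appears in an actual comparison."""
    code = strip_comments(source).upper()
    return KEY_COMPARISON.search(code) is not None
```

Under these assumptions, a declaration such as `01 WS-KEY PIC X(10).` or a commented-out `IF WS-KEY = ...` is ignored, while `IF WS-MAST-KEY < WS-TRAN-KEY` counts as evidence.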

8 Commits
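
The subtype rules listed in commit 7d5c82e0e2 form an ordered decision list, which can be sketched as below. This is an illustration under stated assumptions rather than the project's `_resolve_matching_subtype`: the function name, its parameters, and the reading of "3 files" as "three or more files" are all hypothetical. Category labels are copied from the commit message.

```python
def resolve_matching_subtype(category: str, key_vars: set[str], file_count: int) -> str:
    """Apply the six rules from commit 7d5c82e0e2 in priority order.

    category   -- category already chosen by the rule engine (assumed input)
    key_vars   -- names of the KEY variables found in the program
    file_count -- number of files the program uses
    """
    # Rule 1: two-stage programs are already settled by the rule engine.
    if category == "二段階":
        return "二段階"
    # Rule 2: 3 files + WS-SAVE-KEY (assumption: "3 files" means >= 3).
    if file_count >= 3 and "WS-SAVE-KEY" in key_vars:
        return "M:N→MxN"
    # Rule 3: a previous-key variable indicates the mixed type.
    if "WS-PREV-KEY" in key_vars:
        return "混合"
    # Rule 4: master key and transaction key together indicate 1:N.
    if {"WS-MAST-KEY", "WS-TRAN-KEY"} <= key_vars:
        return "1:N"
    # Rule 5: at least three KEY variables across at least two files.
    if len(key_vars) >= 3 and file_count >= 2:
        return "M:N"
    # Rule 6: everything else is treated as 1:1.
    return "1:1"
```

Because the rules are checked top to bottom, a program with both WS-PREV-KEY and WS-MAST-KEY/WS-TRAN-KEY would resolve to 混合 in this sketch; whether the real implementation orders ties the same way is not stated in the log.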