cobol-java-v3

Author	SHA1	Message	Date
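NB-076	cdba324b5a	fix: HINA all-type defect fixes: 3 real defects in SORT/CSV/ALT. Defects found by adversarial all-type testing, and their fixes: Defect 1: SORT/MERGE L1 keyword too strict (missed detections) - Old: 'SORT ON KEY' / 'MERGE ON KEY' (exact strings) - Real COBOL usage: SORT WORK-FILE ON ASCENDING KEY ... - New: regex SORT(?:\s+\S+)?\s+ON\s+(?:ASCENDING|DESCENDING)?KEY Defect 2: CSV false positive (STRING/INSPECT triggered even for non-CSV code) - Old: has_string=True -> CSV合并 - New: requires has_csv_merge (STRING + comma delimiter) - Plain string concatenation no longer triggers the CSV classification Defect 3: ALTERNATE RECORD KEY overridden by ORGANIZATION IS - Old: file organization listed before alternate index (at equal confidence, the earlier entry wins) - New: alternate index listed first (the more specific classification takes priority) Regression: 767 passed (0 new failures)	2026-06-21 15:51:30 +08:00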
NB-076	33762ca959	fix: adversarial testing: 4 false positive/negative fixes + comment stripping. COBOL migration expert adversarial testing found 4 real defects: FIX 1: Comment-stripping in detect_keyword() (FP-2) - Remove *> inline comments and comment lines before keyword matching - Prevents 「マッチング」 from triggering on WS-KEY in comments FIX 2: KEY comparison context validation (FP-1, FP-6) - Add _matches_key_comparison(): requires the WS-KEY variable to appear NEAR an actual comparison operator (= < >), not just in a PIC/VALUE declaration - Same check in _path_rule_engine features via has_key_var injection - Fix regex bug: [=<>\s] vs [=<>]: \s matched the whitespace after a PIC declaration FIX 3: Old-school naming support (FN-1) - Add L1 keyword r'[A-Z]\d{0,2}-\w*KEY' with 0.55 confidence - Matches K01-KEY, KS-KEY etc. (non-WS- prefix naming convention) FIX 4: mn_output_mode over-matching (FP-6) - Require IF branches + KEY evidence before returning M:N for file>=3 - matching_vs_keybreak rule 3 now requires has_key_var New tests: test_adversarial.py: 8 parametrized adversarial tests Regression: 755 passed (0 new failures)	2026-06-21 15:16:41 +08:00
NB-076	958b12e9a9	fix: confusion group confidence calibration — false positive detection inflation Issues found through matching program classification analysis: 1. dedup_vs_nodedup: 0.85→0.50 for negative detection (no WS-PREV-KEY is not strong evidence for '含まず') 2. validation_vs_keybreak: 0.80→0.55 for has_counter (counter is a generic pattern, not specific to key-break) 3. simple_vs_two_stage: 0.80→0.50 for non-open-close-open pattern (sequential OPEN is the default for most programs) Result: matching programs now correctly classified: - MT01-03/18/20 → マッチング ✅ (was 項目チェック) - MT16-17 → 二段階マッチング ✅ (unchanged) - MT32 → 項目チェック(重複含む) ✅ (correct: has WS-PREV-KEY) - VL01 → 項目チェック(重複含む) ✅ (correct) - CSV → CSV合并 ✅ (correct) Regression: 745 passed (3 test expectation bounds updated)	2026-06-21 13:17:31 +08:00
NB-076	a6c454692a	fix: resolve 3 MEDIUM code review findings M1: Cache confusion-pair confidences in Path B (eliminate redundant resolve_confusion_pair re-calls in _path_rule_engine) M2: Resolve contradictions in Path C instead of hardcoding resolved_count=0 in _path_llm_assisted M4: Add DIVIDE_25 to contradiction pair coverage (50-25, 100-25) and update test_contradiction_pairs_defined to verify all 3 variants	2026-06-21 11:25:59 +08:00
hangshuo652	bc1d56d1a4	feat: Phase 2 complete — 13 Phases of COBOL type classification and test benchmark P0.6: gcov infrastructure P1: extract_structure output expansion (11 new feature fields) P2: Confusion group rule engine (8 pairs + contradiction + backtrack) P3: 4-factor confidence calculation + quality gate update P4: 33+2 COBOL program type test samples (22 files, 7 categories) P5: parametrized/ test data generation engine P6: japanese_data.py lookup tables P7-10: Type-specific test suites (~159 parametrized tests) P11: Full classification pipeline (classify_program) + orchestrator integration P12: Documentation (module-interfaces, test-plan v3.0, coverage-matrix) Architecture decisions: - classification_pipeline/ merged to hina/pipeline/ - parametrized/ as independent module - japanese_data.py as root-level file - hina/__all__ only exports classify_program() Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-19 23:51:55 +08:00
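
The SORT/MERGE keyword fix and the comment-stripping fix described in the two most recent commits can be illustrated with a minimal sketch. This is not the repository's code: `SORT_MERGE_KEY`, `strip_comments`, and `has_sort_merge_key` are hypothetical names, the source is assumed to be fixed-format COBOL (comment indicator in column 7), and the pattern adds an explicit whitespace gap before `KEY` so that `ON ASCENDING KEY` matches.

```python
import re

# Hypothetical pattern, modelled on the regex quoted in the commit message.
# Assumption: whitespace is required between ASCENDING/DESCENDING and KEY.
SORT_MERGE_KEY = re.compile(
    r"(?<![\w-])(?:SORT|MERGE)(?:\s+\S+)?\s+ON\s+"
    r"(?:(?:ASCENDING|DESCENDING)\s+)?KEY\b",
    re.IGNORECASE,
)


def strip_comments(source: str) -> str:
    """Drop fixed-format comment lines and '*>' inline comments.

    Simplification: a '*>' inside a string literal is also cut.
    """
    kept = []
    for line in source.splitlines():
        # Fixed-format COBOL: '*' or '/' in column 7 marks a comment line.
        if len(line) > 6 and line[6] in "*/":
            continue
        # Everything from '*>' to the end of the line is an inline comment.
        kept.append(line.split("*>", 1)[0])
    return "\n".join(kept)


def has_sort_merge_key(source: str) -> bool:
    """Return True if a SORT/MERGE ... ON [ASCENDING|DESCENDING] KEY clause
    appears outside comments."""
    return SORT_MERGE_KEY.search(strip_comments(source)) is not None
```

With this sketch, `SORT WORK-FILE ON ASCENDING KEY WS-KEY` is detected, while the same text on a comment line or after `*>` is ignored.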

5 Commits
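
The KEY comparison context check from commit 33762ca959 (a key variable counts as evidence only when it sits next to a comparison operator, not when it merely appears in a PIC/VALUE declaration) might look roughly like the following. The names `KEY_NAME`, `KEY_COMPARISON`, and `matches_key_comparison` are assumptions for illustration, as is the exact shape of the pattern; only the symbolic operators `=`, `<`, `>` named in the commit are handled.

```python
import re

# Hypothetical key-name pattern: WS-KEY, WS-PREV-KEY, K01-KEY, KS-KEY, ...
KEY_NAME = r"(?<![\w-])[A-Z][A-Z0-9]*-(?:[A-Z0-9]+-)*KEY(?![\w-])"

# A key name directly before or after a comparison operator.
# The character class is [=<>] on purpose: including \s (the bug noted in the
# commit) would let the whitespace after a PIC declaration count as an operator.
KEY_COMPARISON = re.compile(
    rf"(?:{KEY_NAME}\s*(?:NOT\s+)?[=<>]|[=<>]\s*{KEY_NAME})"
)


def matches_key_comparison(source: str) -> bool:
    """Return True if a key-like variable is adjacent to =, < or >."""
    return KEY_COMPARISON.search(source.upper()) is not None
```

Under these assumptions, `IF WS-KEY = IN-KEY` and `IF MASTER-KEY NOT = TRANS-KEY` count as key comparisons, while `05 WS-KEY PIC X(10) VALUE SPACES.` does not.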