cobol-java-v3

Author	SHA1	Message	Date
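NB-076	875c593d85	fix: fundamental improvement to structure detection (matching detection that does not depend on variable names) Root problems found through thorough verification by COBOL engineers, and their fixes: Problem 1: Structure-detection signals depended too heavily on variable naming conventions - EOF only → WS-E1/WS-END-1/FE-1 are also detected - Only with INTO → READ AT END alone is also detected - IF comparisons required WS- or a hyphen → any name is detected - Only multiple files on one OPEN line → multi-line OPENs are also detected Problem 2: mn_output_mode misclassified programs with 2 files and 4 branches as M:N - Threshold raised to select>=3 or (select>=2 and branches>=4) - Standard two-file matching programs are no longer misclassified Problem 3: has_cross_file_cmp was missing - Inject comparison information such as IF K1 = K2 into the rule engine - Comparisons against numeric literals are excluded (e.g. IF WS-COUNT > 0) Effect: all 6 different coding styles are consistently classified as matching Regression: 767 passed (0 new)	2026-06-21 16:27:17 +08:00
NB-076	4be2aae66d	fix: production-grade COBOL program parsing (COPY + OCCURS TO + FD) Production-program parsing defects found by adversarial testing, and their fixes: Defect 1: COPY statements were never preprocessed (18-month-old bug) - resolve_copybooks() was called in the main() CLI but never on the extract_structure() path - Fix: call resolve_copybooks() at the start of preprocess() - Unresolvable COPY lines are removed (so that Lark does not hit an unrecognized directive inside an FD block) Defect 2: The fd rule in the Lark grammar required data_item+ (at least one record) - In production programs an FD can bring in its record definition via COPY - Once the COPY was removed, the FD had no data_item and Lark crashed - Fix: change fd to data_item* (zero or more) Defect 3: OCCURS 1 TO 100 TIMES (variable-range table) - The grammar only supported OCCURS INT TIMES, not OCCURS 1 TO 100 TIMES - Fix: add an optional 'TO' INT part to occurs_clause Effect: 2 of the 4 production programs now parse successfully (CRDVAL, GENDATA) - The remaining 2 (CRDCALC, CRDRPT) are still unfixed because of a fixed-format continuation-line limitation Full regression: 767 passed (0 new failures)	2026-06-21 16:13:58 +08:00
NB-076	cdba324b5a	fix: HINA all-type defect fixes (3 real defects in SORT/CSV/ALT) Defects found by adversarial all-type testing, and their fixes: Defect 1: SORT/MERGE L1 keyword was too strict (missed detections) - Old: 'SORT ON KEY' / 'MERGE ON KEY' (exact strings) - How it is actually written in COBOL: SORT WORK-FILE ON ASCENDING KEY ... - New: regex SORT(?:\s+\S+)?\s+ON\s+(?:ASCENDING\|DESCENDING)?KEY Defect 2: CSV false positives (non-CSV STRING/INSPECT also triggered) - Old: has_string=True -> CSV合并 - New: require has_csv_merge (STRING + comma delimiter) - Plain string concatenation no longer triggers the CSV classification Defect 3: ALTERNATE RECORD KEY was overridden by ORGANIZATION IS - Old: the file-organization category (文件编成) came before the alternate index (at equal confidence the earlier one wins) - New: the alternate index is placed first (the more specific classification takes priority) Regression: 767 passed (0 new failures)	2026-06-21 15:51:30 +08:00
NB-076	4b22c3754e	fix: unhyphenated KEY variables + COBOL expert top-10 attack-surface tests Findings from a COBOL expert adversarial review: - WSKEY1/WSKEY2 (no hyphen) in old-style COBOL were not detected by the L1 keywords - Structural detection signals 4 and 5 had incomplete coverage Fixes: - L1 adds re:WS[A-Z0-9]KEY[A-Z0-9] to cover unhyphenated KEY naming - _matches_key_comparison extended to support unhyphenated variables - has_key_var injection extended to support unhyphenated names - Structural detection signal 4 adds a WS\w+ comparison pattern - Structural detection signal 5 adds support for two separate OPEN statements New tests: - test_cobol_expert_attacks: 4 inline attack tests (multi-line AT END, unhyphenated WSKEY, GO TO style, NOT= comparison) - test-adversarial: 8 sample-file attack tests Full regression: 767 passed (+3 new, 0 failures)	2026-06-21 15:35:52 +08:00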
NB-076	da5d1058e7	feat: structural matching detection — no KEY variable needed Add _detect_matching_structure(): detection based on control flow pattern, not variable naming conventions. Uses 5 structural signals: 1. READ + AT END + EOF pattern 2. PERFORM UNTIL with EOF condition 3. ELSE body with conditional READ (matching core) 4. IF comparing hyphenated fields (cross-file comparison) 5. Multi-file OPEN INPUT 5/5 signals → 0.55, 4/5 → 0.50, 3/5 → 0.40. Real-world impact: matching programs with key fields named CUST-CODE and ORDR-CODE (no '-KEY' in name) are now correctly detected. Also: - Rule engine type priority: main types (マッチング etc.) override secondary types (M:N, DIVIDE) when keyword confidence is low - has_structural_match injected into features so rule engine can use it - matching_vs_keybreak accepts equality IFs as matching evidence - New test: test_structural_matching_no_keyword() Regression: 764 passed (0 new failures).	2026-06-21 15:28:32 +08:00
NB-076	33762ca959	fix: adversarial testing — 4 false positive/negative fixes + comment stripping COBOL migration expert adversarial testing found 4 real defects: FIX 1: Comment-stripping in detect_keyword() (FP-2) - Remove > inline comments and comment lines before keyword matching - Prevents 「マッチング」 from triggering on WS-KEY in comments FIX 2: KEY comparison context validation (FP-1, FP-6) - Add _matches_key_comparison() — requires WS-KEY variable to appear NEAR an actual comparison operator (= < >), not just as PIC/VALUE decl - Same check in _path_rule_engine features via has_key_var injection - Fix regex bug: [=<>\s] vs [=<>] — \s matched whitespace after PIC decl FIX 3: Old-school naming support (FN-1) - Add L1 keyword r'[A-Z]\d{0,2}-\w*KEY' with 0.55 confidence - Matches K01-KEY, KS-KEY etc. (non-WS- prefix naming convention) FIX 4: mn_output_mode over-matching (FP-6) - Require IF branches + KEY evidence before returning M:N for file>=3 - matching_vs_keybreak rule 3 now requires has_key_var New tests: test_adversarial.py — 8 parametrized adversarial tests Regression: 755 passed (0 new failures)	2026-06-21 15:16:41 +08:00
NB-076	a5939e6722	fix: subtype resolver + comprehensive matching program test Fix 4 remaining defects found by adversarial testing: 1. MT03 N:1 → subtype corrected to N:1 (key suffix -M/-T heuristic) 2. MT32 混合 → subtype added (項目チェック programs with WS-PREV-KEY) 3. MT33 混合异键 → WS-ALT-KEY detection → 混合(异键) 4. MT18/MT19 → subtype M:N (correct: static cannot distinguish M:N→M vs M:N→N) Also expand subtype resolver scope: now also processes 項目チェック classified programs with matching-like characteristics (WS-PREV-KEY), not just マッチング. New test: test_matching_programs.py — 10 parametrized tests covering all 4 dimensions (category, subtype, branches, files) for every matching program. Known limitation documented: MT18 vs MT19 requires runtime data for M:N→M vs M:N→N distinction. Regression: 755 passed (10 new, 0 failures).	2026-06-21 13:40:58 +08:00
NB-076	6b3f526b80	feat: agent-driven matching subtype discrimination Refactor _resolve_matching_subtype to use an LLM agent for ambiguous cases instead of pure static rules: Architecture (3 layers): 1. Static deterministic rules: M:N→MxN, 1:N (WS-MAST/TRAN-KEY), 二段階, 混合 — high confidence, no LLM needed 2. LLM agent: ambiguous cases (N:1 vs 1:1, M:N→M vs M:N→N) - _MATCHING_SUBTYPE_AGENT_PROMPT with 5 subtypes - Calls existing hina.hina_agent._parse_llm_response for parsing - Minimum confidence threshold 0.4 to gate low-quality LLM output 3. Fallback: conservative defaults (M:N or 1:1) when LLM unavailable This follows the original architecture design: agent handles the hard classification problems that static analysis alone can't resolve. Regression: 745 passed (unchanged).	2026-06-21 13:36:57 +08:00
NB-076	7d5c82e0e2	feat: matching program subtype discrimination (1:1/1:N/M:N/MxN) Add _resolve_matching_subtype post-processing step in classify_program() that distinguishes matching program subtypes based on key variable naming patterns and file/structural features: Rules (in priority order): 1. 二段階 → 二段階 (already handled by rule engine) 2. 3 files + WS-SAVE-KEY → M:N→MxN (MT20) 3. WS-PREV-KEY present → 混合 (already handled, MT32) 4. WS-MAST-KEY + WS-TRAN-KEY → 1:N (MT02) 5. >=3 KEY vars + >=2 files → M:N (MT33) 6. Otherwise → 1:1 (MT01, MT03, MT18, MT19) Results: MT01→1:1, MT02→1:N, MT03→1:1, MT16/17→二段階, MT18/19→1:1, MT20→M:N→MxN, MT33→M:N Also fix double-backslash regex bug in classifier.py and pipeline.py (r'[-\w]' should be r'[\w-]' for word character class). Regression: 745 passed (unchanged).	2026-06-21 13:33:25 +08:00
NB-076	65e9919933	feat: matching program full recognition — L1 regex keyword + confidence consensus Three-part fix for matching program classification: 1. L1 regex keyword WS-[-\w]*KEY (confidence 0.65): - Captures WS-KEY, WS-MAST-KEY, WS-TRAN-KEY, WS-PREV-KEY etc. - Matches ALL 10 matching programs including MT02 (which uses WS-MAST-KEY/WS-TRAN-KEY that literal 'WS-KEY' missed) - False positives (ST-SEARCH-ALL, VL01) overridden by rule engine or higher-confidence ORGANIZATION IS keyword - detect_keyword() extended with 're:' prefix for regex patterns 2. Consensus bonus in compute_confidence_v2: - When L1 keyword category matches rule engine's final category, context_factor boosted by +0.15 - Pushes matching programs from manual (0.50-0.69) toward review (0.70-0.89) range 3. Confidence calibration for confusion groups (previous commit): - dedup_vs_nodedup: 0.85→0.50 for negative detection - validation_vs_keybreak: 0.80→0.55 for has_counter - simple_vs_two_stage: 0.80→0.50 for sequential OPEN Results - matching programs: MT01: 0.38→0.75, MT02: 0.30→0.60, MT03: 0.30→0.60, MT16: 0.45→0.81, MT17: 0.36→0.65, MT18: 0.60→0.60, MT19: 0.30→0.60, MT20: 0.30→0.65, MT33: 0.30→0.60 All now rule_engine (not fallback), no false negatives. Subtype discrimination remains for future work: all matching programs classified as マッチング without 1:1/1:N/N:1 subtype.	2026-06-21 13:25:39 +08:00
NB-076	958b12e9a9	fix: confusion group confidence calibration — false positive detection inflation Issues found through matching program classification analysis: 1. dedup_vs_nodedup: 0.85→0.50 for negative detection (no WS-PREV-KEY is not strong evidence for '含まず') 2. validation_vs_keybreak: 0.80→0.55 for has_counter (counter is a generic pattern, not specific to key-break) 3. simple_vs_two_stage: 0.80→0.50 for non-open-close-open pattern (sequential OPEN is the default for most programs) Result: matching programs now correctly classified: - MT01-03/18/20 → マッチング ✅ (was 項目チェック) - MT16-17 → 二段階マッチング ✅ (unchanged) - MT32 → 項目チェック(重複含む) ✅ (correct: has WS-PREV-KEY) - VL01 → 項目チェック(重複含む) ✅ (correct) - CSV → CSV合并 ✅ (correct) Regression: 745 passed (3 test expectation bounds updated)	2026-06-21 13:17:31 +08:00
NB-076	0b0a013f51	fix: 3 critical parsing bugs found through statement benchmark testing Bug 1: ELSE IF breaks IF false_seq parsing (core.py) - _parse_if checked self.clean() == 'ELSE' which fails on 'ELSE IF ...' - Fix: use startswith('ELSE'), reinsert IF portion for recursive parse - Impact: ALL ELSE IF chains were silently dropped (huge branch loss) Bug 2: READ skip loop greedily consumes subsequent statements (core.py) - READ's AT END / NOT AT END skip loop used bare advance() with no statement boundary detection - Fix: add _stmt_boundary regex that stops on IF/PERFORM/READ/etc. - Impact: everything after first READ was consumed as 'AT END' lines Bug 3: _walk() in extract_structure doesn't descend into BrPerform (__init__.py) - Branch counting _walk() only handled BrIf/BrEval/BrSeq - IF statements inside PERFORM bodies were never counted - Fix: add BrPerform.body_seq and BrSearch descent Combined impact: matching programs (MT01-33) now correctly report their branches instead of 0. Full regression: 749 passed (unchanged).	2026-06-21 12:52:04 +08:00
NB-076	dbee3b7251	fix: Lark grammar + parse_file_section SD/ASCENDING KEY support Bug fixes found through statement benchmark testing: 1. grammar.lark: Add ASCENDING/DESCENDING KEY IS + INDEXED BY to occurs_clause — fixes HINA024 (SEARCH ALL) parsing crash 2. grammar.lark: Add SD (Sort Description) entry type to file_section — fixes HINA034 (SORT), ST01, ST02 parsing crashes 3. read.py parse_file_section(): Handle SD blocks alongside FD blocks — enables SORT/MERGE file structure extraction 4 previously crashing files now parse successfully: - HINA024.cbl (SEARCH ALL): paras=3, files=0 - HINA034.cbl (SORT): paras=1, files=3 - ST01_SORT.cbl: paras=2, files=3 - ST02_MERGE.cbl: paras=1, files=4 Regression: 749 passed (unchanged — classify_program internally caught the crashes, so tests already 'passed'; real improvement is in data quality: structure extraction now works for these programs)	2026-06-21 12:21:36 +08:00
NB-076	d12a305dc4	test: add L1 data generation + L2 classifier validation (58 tests) Phase C-D complete: - test_l1_data_generation.py — 8 tests verifying generate_data across all P0 groups - test_l2_classifier.py — 16 existing + 34 P0 classification verification tests - hina/pipeline/__init__.py — export classify_program for cleaner imports Key findings: - Classifier correctly detects: CALL→子程序调用, CICS→online, DB→DB操作, ORGANIZATION IS→文件编成, DIVIDE→DIVIDE_50.0, ASCII/EBCDIC→编码转换 (keyword match) - Rule engine provides baseline 項目チェック(重複含まず) for programs without L1 keyword matches - SD keyword (SORT/MERGE sort-file) breaks Lark parser (known limitation) - Full regression: 749 passed (0 new failures)	2026-06-21 12:16:12 +08:00
NB-076	fbaad010ab	test: add L0 statement benchmark tests (34 parametrized tests) 6 test files covering: - test_arithmetic_statements (9 samples) - test_control_statements (6 samples) - test_file_statements (6 samples) - test_inspect_statements (3 samples) - test_move_statements (5 samples) - test_perform_statements (3 samples) - test_search_statements (2 samples) All 34/34 pass. Full regression: 691 passed (0 new failures).	2026-06-21 12:05:07 +08:00
NB-076	8c1f9114f6	feat: add COBOL statement benchmark plan and 34 P0 sample programs - docs/cobol-statement-benchmark-plan.md — full coverage matrix and gap analysis - 34 P0 COBOL samples: arithmetic(9), move(5), file(6), control(6), inspect(3), search(2), perform(3) - test-data/validate_statements.py — automatic validation script - Validation: 34/34 samples pass preprocess + extract_structure	2026-06-21 12:02:25 +08:00
NB-076	a6c454692a	fix: resolve 3 MEDIUM code review findings M1: Cache confusion-pair confidences in Path B (eliminate redundant resolve_confusion_pair re-calls in _path_rule_engine) M2: Resolve contradictions in Path C instead of hardcoding resolved_count=0 in _path_llm_assisted M4: Add DIVIDE_25 to contradiction pair coverage (50-25, 100-25) and update test_contradiction_pairs_defined to verify all 3 variants	2026-06-21 11:25:59 +08:00
hangshuo652	bc1d56d1a4	feat: Phase 2 complete — 13 Phases of COBOL type classification and test benchmark P0.6: gcov infrastructure P1: extract_structure output expansion (11 new feature fields) P2: Confusion group rule engine (8 pairs + contradiction + backtrack) P3: 4-factor confidence calculation + quality gate update P4: 33+2 COBOL program type test samples (22 files, 7 categories) P5: parametrized/ test data generation engine P6: japanese_data.py lookup tables P7-10: Type-specific test suites (~159 parametrized tests) P11: Full classification pipeline (classify_program) + orchestrator integration P12: Documentation (module-interfaces, test-plan v3.0, coverage-matrix) Architecture decisions: - classification_pipeline/ merged to hina/pipeline/ - parametrized/ as independent module - japanese_data.py as root-level file - hina/__all__ only exports classify_program() Co-Authored-By: Claude <noreply@anthropic.com>	2026-06-19 23:51:55 +08:00
hangshuo652	63b5284715	fix: _parse_llm_response now handles empty/invalid JSON gracefully test: add gap coverage tests (hina_agent/JCL/quality gate edge cases)	2026-06-18 17:31:16 +08:00
hangshuo652	b5e76306c3	test: add AI Agent v6 node compliance validation (6 nodes, 24/24)	2026-06-18 17:27:19 +08:00
hangshuo652	e530f6980d	test: add deep validation suite (real COBOL/HINA/QG/retry/report/perf - 28/28)	2026-06-18 17:21:12 +08:00
hangshuo652	6ac9861c84	test: add master validation suite (Pipeline/HINA/Benchmark/QG/Retry/Report - 30/30)	2026-06-18 17:17:11 +08:00
hangshuo652	ecc5599b48	test: add platform user story tests (43/43, 4 categories)	2026-06-18 17:10:40 +08:00
hangshuo652	2662c6c0ac	test: add comprehensive test plan and auto test runner (20/20 passed, 100%)	2026-06-18 17:05:51 +08:00
hangshuo652	9ad0e88a1a	test: add HINA type-specific COBOL test data suite (10 programs, 8/10 pass)	2026-06-18 16:55:43 +08:00
hangshuo652	2e64f208ea	fix: P1 - complete_tests now feeds DataWriter; P2 - loop syncs complete_tests; P5 - machine_json gets coverage fields	2026-06-18 16:47:21 +08:00
hangshuo652	c93104e6bf	feat: Phase 3+4 - gcov support + enhanced report	2026-06-18 16:31:54 +08:00
hangshuo652	e2486db510	fix: 3 issues found during real COBOL validation	2026-06-18 16:26:44 +08:00
hangshuo652	de506d9c31	feat: Phase 2 - HINA Agent + Strategy Agent + classifier	2026-06-18 16:10:38 +08:00
hangshuo652	c021dfe01e	feat: Phase 1 - orchestrator quality gate loop + hina/gate + main CLI args	2026-06-18 16:02:38 +08:00
hangshuo652	097530b036	feat: Phase 1 - cobol_testgen API + quality fields + retry handler	2026-06-18 15:47:35 +08:00
hangshuo652	7fcdb41a85	init: cobol-java migration verification platform v3 (42 tests, JCL module)	2026-05-27 08:42:41 +08:00
hangshuo652	faeedbc77b	test: add edge case tests	2026-05-24 13:01:31 +08:00
hangshuo652	331b38eac1	feat: add web layer (FastAPI + worker)	2026-05-24 12:52:20 +08:00
hangshuo652	818e81269c	v3: generated by gstack-code-gen	2026-05-24 12:36:44 +08:00

35 Commits
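
Commit da5d1058e7 describes scoring a program on five structural signals (5/5 → 0.55, 4/5 → 0.50, 3/5 → 0.40) instead of relying on variable names. The following is only a minimal sketch of that idea: the regexes, the name `structural_confidence`, and the 0.0 result below three signals are assumptions made for illustration, not the repository's actual `_detect_matching_structure()` implementation.

```python
import re

# Hypothetical, simplified patterns for the five signals named in the commit
# message. They are rough heuristics, not the project's real detection rules.
SIGNAL_PATTERNS = {
    # 1. READ <file> ... AT END within one sentence
    "read_at_end": re.compile(r"\bREAD\s+[A-Z0-9-]+[^.]*?\bAT\s+END\b"),
    # 2. PERFORM UNTIL loop (the EOF condition itself is not checked here)
    "perform_until": re.compile(r"\bPERFORM\s+UNTIL\b"),
    # 3. a READ somewhere after an ELSE within the same sentence
    "else_read": re.compile(r"\bELSE\b[^.]*?\bREAD\b"),
    # 4. IF comparing two hyphenated fields, whatever they are called
    "field_compare": re.compile(
        r"\bIF\s+[A-Z0-9]+-[A-Z0-9-]+\s*[=<>]\s*[A-Z0-9]+-[A-Z0-9-]+"
    ),
    # 5a. one OPEN INPUT naming two files on the same statement
    "multi_open": re.compile(
        r"\bOPEN\s+INPUT\s+[A-Z0-9-]+\s+(?!OUTPUT\b|I-O\b|EXTEND\b)[A-Z0-9-]+"
    ),
}

# Signal count -> confidence, as listed in the commit message.
SCORES = {5: 0.55, 4: 0.50, 3: 0.40}


def structural_confidence(source: str) -> float:
    """Count matched signals and map the count to a confidence value."""
    text = source.upper()
    hits = {name: bool(p.search(text)) for name, p in SIGNAL_PATTERNS.items()}
    # 5b. two separate OPEN INPUT statements also count as multi-file input
    if len(re.findall(r"\bOPEN\s+INPUT\b", text)) >= 2:
        hits["multi_open"] = True
    # Assumption: fewer than three signals yields no structural evidence.
    return SCORES.get(sum(hits.values()), 0.0)


MATCHING_SAMPLE = """
    OPEN INPUT MAST-FILE TRAN-FILE.
    READ MAST-FILE AT END MOVE 'Y' TO MAST-EOF.
    READ TRAN-FILE AT END MOVE 'Y' TO TRAN-EOF.
    PERFORM UNTIL MAST-EOF = 'Y' OR TRAN-EOF = 'Y'
      IF CUST-CODE = ORDR-CODE
        PERFORM WRITE-MATCH
      ELSE
        IF CUST-CODE < ORDR-CODE
          READ MAST-FILE AT END MOVE 'Y' TO MAST-EOF
          END-READ
        ELSE
          READ TRAN-FILE AT END MOVE 'Y' TO TRAN-EOF
          END-READ
        END-IF
      END-IF
    END-PERFORM.
"""

SINGLE_FILE_SAMPLE = """
    OPEN INPUT IN-FILE.
    PERFORM UNTIL WS-EOF = 'Y'
      READ IN-FILE AT END MOVE 'Y' TO WS-EOF
      END-READ
    END-PERFORM.
"""

print(structural_confidence(MATCHING_SAMPLE))
print(structural_confidence(SINGLE_FILE_SAMPLE))
```

The matching sample uses CUST-CODE and ORDR-CODE as keys, the naming case the commit message calls out, and none of the patterns look for "KEY" in a variable name. The single-file sample triggers only the first two signals and therefore falls below the three-signal floor.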