feat: Phase 2 complete (13 phases of COBOL program-type classification and test benchmarking)

P0.6: gcov infrastructure
P1: extract_structure output expansion (11 new feature fields)
P2: Confusion-group rule engine (8 confusable pairs, plus contradiction handling and backtracking)
P3: 4-factor confidence calculation + quality gate update
P4: 33+2 COBOL program type test samples (22 files, 7 categories)
P5: parametrized/ test data generation engine
P6: japanese_data.py lookup tables
P7-10: Type-specific test suites (~159 parametrized tests)
P11: Full classification pipeline (classify_program) + orchestrator integration
P12: Documentation (module-interfaces, test-plan v3.0, coverage-matrix)

Architecture decisions:
- classification_pipeline/ merged into hina/pipeline/
- parametrized/ as independent module
- japanese_data.py as root-level file
- hina.__all__ exports only classify_program()
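As a rough illustration of the last decision, the sketch below builds an in-memory stand-in for `hina` and shows that a star-import picks up only the names listed in `__all__`. Only the `__all__` line reflects the commit message; the function bodies and the helper `build_features` are hypothetical placeholders, not the project's code.

```python
import sys
import types

# Hypothetical stand-in for hina/__init__.py. Only the __all__ line mirrors
# the commit message; the function bodies are placeholders.
HINA_INIT_SOURCE = '''
def classify_program(source):
    # Placeholder: the real pipeline lives under hina/pipeline/.
    return "unknown"

def build_features(source):
    # Hypothetical internal helper that should not be part of the public API.
    return {}

__all__ = ["classify_program"]
'''

# Register the stand-in module so "from hina import *" can find it.
hina = types.ModuleType("hina")
exec(HINA_INIT_SOURCE, hina.__dict__)
sys.modules["hina"] = hina

# A star-import copies only the names listed in __all__.
namespace = {}
exec("from hina import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['classify_program']

# __all__ does not hide attributes: explicit access still works.
print(hasattr(hina, "build_features"))  # True
```

Because `__all__` governs only `from hina import *` (and tools that honor it), internal names stay reachable through explicit imports or attribute access; it defines the public surface by convention rather than enforcing access control.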

Co-Authored-By: Claude <noreply@anthropic.com>
hangshuo652
2026-06-19 23:51:55 +08:00
parent 63b5284715
commit bc1d56d1a4
129 changed files with 19378 additions and 261 deletions
+6 -5
@@ -7,7 +7,8 @@ logger = logging.getLogger(__name__)
 def collect_gcov(cobol_src: Path, work_dir: Path) -> dict:
     try:
-        gcda_files = list(work_dir.glob("*.gcda"))
+        cd = str(work_dir)
+        gcda_files = list(Path(cd).glob("*.gcda"))
         if not gcda_files:
             logger.warning("[gcov] No .gcda files found; instrumented compilation may not be enabled")
             return {"available": False, "reason": "no_gcda_files"}
@@ -15,16 +16,16 @@ def collect_gcov(cobol_src: Path, work_dir: Path) -> dict:
         result = subprocess.run(
             ["gcov", cobol_src.name],
             capture_output=True, text=True, timeout=30,
-            cwd=work_dir,
+            cwd=cd,
         )
         if result.returncode != 0:
             logger.warning(f"[gcov] gcov execution failed: {result.stderr[:200]}")
             return {"available": False, "reason": "gcov_failed"}
-        gcov_file = work_dir / f"{cobol_src.stem}.cbl.gcov"
+        gcov_file = Path(cd) / f"{cobol_src.stem}.cbl.gcov"
         if not gcov_file.exists():
-            gcov_file = work_dir / f"{cobol_src.stem}.gcov"
+            gcov_file = Path(cd) / f"{cobol_src.stem}.gcov"
         if not gcov_file.exists():
             logger.warning("[gcov] .gcov file was not generated")
@@ -32,7 +33,7 @@ def collect_gcov(cobol_src: Path, work_dir: Path) -> dict:
         total_lines = 0
         executed_lines = 0
-        with open(gcov_file) as f:
+        with open(str(gcov_file), encoding="utf-8", errors="replace") as f:
             for line in f:
                 stripped = line.strip()
                 if stripped and not stripped.startswith("-"):
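The hunk stops in the middle of the line-counting loop. As a sketch of what that step computes, assuming gcov's standard text layout of `<count>:<line number>:<source>`, the functions below tally executable versus executed lines. The names `summarize_gcov_lines` and `summarize_gcov_file` and the `coverage_pct` key are hypothetical, not the project's code.

```python
from pathlib import Path


def summarize_gcov_lines(lines):
    """Count executable and executed lines in gcov's text (.gcov) output.

    Assumes the usual "<count>:<line number>:<source>" layout, where the count
    is "-" for non-executable lines, "#####" or "=====" for executable lines
    that never ran, and a number (optionally suffixed with "*") otherwise.
    """
    total_lines = 0
    executed_lines = 0
    for line in lines:
        parts = line.split(":", 2)
        if len(parts) < 3:
            # Skip lines without the count:lineno:source layout, such as
            # branch/call summaries that gcov adds with options like -b.
            continue
        count, lineno = parts[0].strip(), parts[1].strip()
        if count == "-" or not lineno.isdigit():
            # Non-executable source lines and the "-: 0:Source:" header lines.
            continue
        total_lines += 1
        if count.rstrip("*").isdigit():
            executed_lines += 1
    coverage_pct = round(executed_lines / total_lines * 100, 2) if total_lines else 0.0
    return {
        "total_lines": total_lines,
        "executed_lines": executed_lines,
        "coverage_pct": coverage_pct,
    }


def summarize_gcov_file(gcov_file: Path):
    # Like the committed code, decode as UTF-8 and replace undecodable bytes
    # so that non-UTF-8 text in the listing cannot raise UnicodeDecodeError.
    with open(str(gcov_file), encoding="utf-8", errors="replace") as f:
        return summarize_gcov_lines(f)
```

Unlike a bare `startswith("-")` check, this sketch also requires a numeric line-number field, so summary lines without the three-field layout are never counted as source lines.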