feat: complete Phase 2 (13 phases of COBOL program-type classification and test benchmark)

P0.6: gcov infrastructure
P1: extract_structure output expansion (11 new feature fields)
P2: Confusion-group rule engine (8 confusion pairs, contradiction handling, backtracking)
P3: 4-factor confidence calculation + quality gate update
P4: 33+2 COBOL program type test samples (22 files, 7 categories)
P5: parametrized/ test data generation engine
P6: japanese_data.py lookup tables
P7-10: Type-specific test suites (~159 parametrized tests)
P11: Full classification pipeline (classify_program) + orchestrator integration
P12: Documentation (module-interfaces, test-plan v3.0, coverage-matrix)
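The commit does not say what the four confidence factors in P3 are, how they are weighted, or what the quality gate checks. As a minimal sketch only, assuming a weighted average of four scores in [0, 1] followed by a single threshold gate, the calculation could look like this. The factor names, weights, the `Classification` class, the `passes_quality_gate` helper, and the 0.7 threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical factor names and weights: the commit only says there are four
# factors; it does not name them or say how they are combined.
WEIGHTS = {
    "rule_match": 0.4,        # how strongly the type rules matched
    "feature_coverage": 0.3,  # share of expected structural features found
    "no_contradiction": 0.2,  # 1.0 if no contradiction rule fired
    "margin": 0.1,            # gap to the runner-up type in its confusion pair
}


@dataclass
class Classification:
    program_type: str
    factors: dict[str, float]

    def confidence(self) -> float:
        """Weighted average of the factor scores, each clamped to [0, 1].

        Missing factors count as 0.0.
        """
        total = sum(WEIGHTS.values())
        score = sum(
            weight * min(max(self.factors.get(name, 0.0), 0.0), 1.0)
            for name, weight in WEIGHTS.items()
        )
        return score / total


def passes_quality_gate(result: Classification, threshold: float = 0.7) -> bool:
    """Hypothetical gate: accept the classification only at or above a threshold."""
    return result.confidence() >= threshold
```

A weighted average keeps the score in [0, 1] regardless of how the weights are tuned, which makes a single fixed gate threshold meaningful; the real implementation may combine its factors differently.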

Architecture decisions:
- classification_pipeline/ merged into hina/pipeline/
- parametrized/ as independent module
- japanese_data.py as root-level file
- hina's __all__ exports only classify_program()
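To illustrate what restricting `hina`'s `__all__` does (standard Python behavior): `__all__` controls which names `from hina import *` binds and documents the public surface, but names left out of it stay reachable by explicit attribute access. The sketch fakes a `hina` module with `types.ModuleType` so it runs on its own; the `classify_program` body and signature, `_internal_helper`, and the `pipeline` stand-in are placeholders, not the project's code.

```python
import sys
import types

# Throwaway stand-in for the real `hina` package, registered in sys.modules
# so that the import statement below can find it.
hina = types.ModuleType("hina")


def classify_program(source: str) -> str:
    """Placeholder with the public entry point's name; not the real logic."""
    return "unknown"


def _internal_helper() -> None:  # hypothetical internal name
    pass


hina.classify_program = classify_program
hina._internal_helper = _internal_helper
# Subpackage stand-in: reachable as an attribute, but not star-imported.
hina.pipeline = types.ModuleType("hina.pipeline")
hina.__all__ = ["classify_program"]
sys.modules["hina"] = hina

namespace: dict = {}
exec("from hina import *", namespace)
print(sorted(k for k in namespace if not k.startswith("__")))
```

Running it prints `['classify_program']`, while `hina._internal_helper` and `hina.pipeline` remain accessible to code that names them explicitly.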

Co-Authored-By: Claude <noreply@anthropic.com>
hangshuo652
2026-06-19 23:51:55 +08:00
parent 63b5284715
commit bc1d56d1a4
129 changed files with 19378 additions and 261 deletions
@@ -1,3 +1,29 @@
"""COBOL 数据模型 — 所有模块共享的契约
本包定义了全系统共用的数据类。所有模块的输入/输出必须使用这些类。
修改本包需通知所有开发者。
导入方式:
from data import Field, FieldTree # 字段树
from data import TestCase, TestSuite, SparkConfig # 测试数据
from data import FieldResult, VerificationRun # 对比结果
"""
from __future__ import annotations
from .field_tree import Field, FieldTree
from .test_case import TestCase, TestSuite, SparkConfig
from .diff_result import FieldResult, VerificationRun
__all__ = [
    # ═══ Field tree: shared by cobol_testgen / comparator / agents ═══
    "Field",            # dataclass: a single field definition
    "FieldTree",        # dataclass: COPYBOOK field tree
    # ═══ Test data: shared by cobol_testgen / runners ═══
    "TestCase",         # dataclass: a single test case
    "TestSuite",        # dataclass: test suite (including Spark config)
    "SparkConfig",      # dataclass: Spark runtime parameters
    # ═══ Comparison results: shared by comparator / report / orchestrator ═══
    "FieldResult",      # dataclass: comparison result for a single field
    "VerificationRun",  # dataclass: complete result of one pipeline run
]
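Every name in the `data` package's `__all__` is annotated as a dataclass, and the package is described as a contract shared by all modules, so a small contract check could guard that promise. This is a hypothetical sketch: `check_exports_are_dataclasses` is not part of the project, and the stand-in `Field` class and fake module exist only so the example runs on its own.

```python
from __future__ import annotations

import dataclasses
import types


def check_exports_are_dataclasses(module: types.ModuleType) -> list[str]:
    """Return the names in module.__all__ that are missing or not dataclass types.

    An empty list means every exported name is a dataclass class.
    """
    problems = []
    for name in getattr(module, "__all__", []):
        obj = getattr(module, name, None)
        # is_dataclass() is also True for instances, so require a class here.
        if not (isinstance(obj, type) and dataclasses.is_dataclass(obj)):
            problems.append(name)
    return problems


@dataclasses.dataclass
class Field:  # stand-in; the real field definition is not shown in this diff
    name: str


fake = types.ModuleType("data")
fake.Field = Field
fake.FieldTree = dict  # deliberately not a dataclass
fake.__all__ = ["Field", "FieldTree", "TestCase"]  # TestCase missing on purpose
print(check_exports_are_dataclasses(fake))  # ['FieldTree', 'TestCase']
```

Against the real package, a test might run `import data` and then assert that `check_exports_are_dataclasses(data) == []`, so that an accidental non-dataclass export fails fast.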