feat: Phase 2 complete — 13 Phases of COBOL type classification and test benchmark

P0.6: gcov infrastructure
P1: extract_structure output expansion (11 new feature fields)
P2: Confusion group rule engine (8 pairs + contradiction + backtrack)
P3: 4-factor confidence calculation + quality gate update
P4: 33+2 COBOL program type test samples (22 files, 7 categories)
P5: parametrized/ test data generation engine
P6: japanese_data.py lookup tables
P7-10: Type-specific test suites (~159 parametrized tests)
P11: Full classification pipeline (classify_program) + orchestrator integration
P12: Documentation (module-interfaces, test-plan v3.0, coverage-matrix)

Architecture decisions:
- classification_pipeline/ merged to hina/pipeline/
- parametrized/ as independent module
- japanese_data.py as root-level file
- hina/__all__ only exports classify_program()

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-19 23:51:55 +08:00
parent 63b5284715
commit bc1d56d1a4
129 changed files with 19378 additions and 261 deletions
@@ -1,3 +1,29 @@
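+"""COBOL data model: the contract shared by all modules.
+
+This package defines the data classes used across the whole system. The inputs and outputs of every module must use these classes.
+Notify all developers before modifying this package.
+
+Import usage:
+  from data import Field, FieldTree             # field tree
+  from data import TestCase, TestSuite, SparkConfig  # test data
+  from data import FieldResult, VerificationRun  # comparison results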
+"""
+
+from __future__ import annotations
+
 from .field_tree import Field, FieldTree
 from .test_case import TestCase, TestSuite, SparkConfig
 from .diff_result import FieldResult, VerificationRun
+
+__all__ = [
+    # ═══ Field tree ── shared by cobol_testgen / comparator / agents ═══
+    "Field",             # dataclass: a single field definition
+    "FieldTree",         # dataclass: COPYBOOK field tree
+    # ═══ Test data ── shared by cobol_testgen / runners ═══
+    "TestCase",          # dataclass: a single test case
+    "TestSuite",         # dataclass: test suite (including Spark config)
+    "SparkConfig",       # dataclass: Spark run parameters
+    # ═══ Comparison results ── shared by comparator / report / orchestrator ═══
+    "FieldResult",       # dataclass: comparison result for a single field
+    "VerificationRun",   # dataclass: full results of a pipeline run
+]