feat: Phase 2 complete — 13 Phases of COBOL type classification and test benchmark

P0.6: gcov infrastructure
P1: extract_structure output expansion (11 new feature fields)
P2: Confusion group rule engine (8 pairs + contradiction + backtrack)
P3: 4-factor confidence calculation + quality gate update
P4: 33+2 COBOL program type test samples (22 files, 7 categories)
P5: parametrized/ test data generation engine
P6: japanese_data.py lookup tables
P7-10: Type-specific test suites (~159 parametrized tests)
P11: Full classification pipeline (classify_program) + orchestrator integration
P12: Documentation (module-interfaces, test-plan v3.0, coverage-matrix)

Architecture decisions:
- classification_pipeline/ merged to hina/pipeline/
- parametrized/ as independent module
- japanese_data.py as root-level file
- hina/__all__ only exports classify_program()

Co-Authored-By: Claude <noreply@anthropic.com>
hangshuo652
2026-06-19 23:51:55 +08:00
parent 63b5284715
commit bc1d56d1a4
129 changed files with 19378 additions and 261 deletions
@@ -1,3 +1,10 @@
"""Test data models: test cases, test suites, and Spark configuration.

Usage example:
    tc = TestCase(id="TC-001", fields={"TX-AMOUNT": 1500000})
    suite = TestSuite(test_cases=[tc], spark_config=SparkConfig(num_records=1000))
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional
@@ -5,6 +12,14 @@ from typing import Optional
@dataclass
class SparkConfig:
    """Configuration for Spark test data generation.

    ────────── Field descriptions ──────────
    num_records: number of records to generate
    replication: replication strategy, "key_varied" or "exact_copy"
    key_field:   name of the key field (used by key_varied)
    edge_cases:  edge cases: ["null","max","min","empty"]
    """
    num_records: int = 100
    replication: str = "key_varied"
    key_field: str = ""
@@ -13,6 +28,13 @@ class SparkConfig:
@dataclass
class TestCase:
    """A single test case: one combination of field values to verify.

    ────────── Field descriptions ──────────
    id:               test case ID (e.g. "TC-001")
    fields:           {field name: value}
    coverage_targets: list of decision point IDs covered by this case
    """
    id: str
    fields: dict = field(default_factory=dict)
    coverage_targets: list[str] = field(default_factory=list)
@@ -20,6 +42,13 @@ class TestCase:
@dataclass
class TestSuite:
    """A test suite: multiple test cases plus an optional Spark configuration.

    ────────── Field descriptions ──────────
    schema:       optional field schema
    test_cases:   list of test cases
    spark_config: None means non-Spark mode
    """
    schema: Optional[dict] = None
    test_cases: list[TestCase] = field(default_factory=list)
    spark_config: Optional[SparkConfig] = None
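
The three dataclasses above can be exercised with a minimal, self-contained sketch. It restates only the fields visible in the diff. The `edge_cases` field is documented in the docstring but its definition falls outside the hunk, so its type and default here are assumptions, and `is_spark_mode` is a hypothetical helper that is not part of the commit.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SparkConfig:
    num_records: int = 100
    replication: str = "key_varied"  # or "exact_copy"
    key_field: str = ""
    # Assumption: not shown in the diff hunk; type and default are guesses.
    edge_cases: list[str] = field(default_factory=list)


@dataclass
class TestCase:
    id: str
    fields: dict = field(default_factory=dict)
    coverage_targets: list[str] = field(default_factory=list)


@dataclass
class TestSuite:
    schema: Optional[dict] = None
    test_cases: list[TestCase] = field(default_factory=list)
    spark_config: Optional[SparkConfig] = None


def is_spark_mode(suite: TestSuite) -> bool:
    """Hypothetical helper: a suite without spark_config is in non-Spark mode."""
    return suite.spark_config is not None


# Same construction as the usage example in the module docstring.
tc = TestCase(id="TC-001", fields={"TX-AMOUNT": 1500000})
spark_suite = TestSuite(test_cases=[tc], spark_config=SparkConfig(num_records=1000))
plain_suite = TestSuite(test_cases=[tc])
```

Using `field(default_factory=...)` for the `dict` and `list` defaults gives each instance its own container, so mutating one test case's `fields` does not leak into another.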