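# Integration and Revision Plan: Agent System × cobol-java-v3

> **Goal**: Strengthen cobol-java-v3 with the Agent system's three layers of defense (assertion gate + asynchronous polling + layered retry) without breaking the existing 42/42 test suite.
>
> **Principle**: Add, don't rewrite. Leave the internal logic of the existing `orchestrator.py`, `runners/`, and `comparator/` untouched and add new layers around them.
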
---

## 0. Current State

### Existing capabilities

```
cobol_testgen (no LLM, purely rule-based)
  COBOL source → parse → field/branch tree → path enumeration → test data JSON → coverage HTML

orchestrator.py (LLM-driven)
  COPYBOOK → Agent1Parser → FieldTree → Agent2Data → TestSuite → DataWriter
    → CobolRunner → Compile + Run
    → JavaRunner → Compile + Run
    → Comparator → field-level comparison → Agent3Diagnostic → ReportGenerator

agents/
  Agent1Parser: LLM parses the COPYBOOK → FieldTree
  Agent2Data: LLM generates test cases (two strategies: boundary / branch)
  Agent3Diagnostic: LLM analyzes why fields mismatch

runners/
  CobolRunner: compile + run COBOL (GnuCOBOL)
  NativeJavaRunner: compile + run Java
  SparkJavaRunner: compile + run Spark Java
```

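### Missing capabilities

| Gap | Impact | Priority |
|:----|:----|:------:|
| **No HINA type awareness**: every program gets its test data from the same "boundary/branch" strategies, with no distinction between matching, key-break, and validation programs | Test data does not cover the boundaries specific to each type | 🔴 |
| **cobol_testgen coverage is not integrated into the pipeline**: `coverage.py` generates an HTML report but is never called by `orchestrator.py` | A pipeline run ends without a branch-coverage figure | 🔴 |
| **No assertion quality gate**: test cases generated by Agent2Data are executed directly, without checking whether they cover every decision point | Branches may be missed | 🟡 |
| **No layered retry**: a compile failure or runtime exception goes straight to BLOCKED/ERROR with no attempt at repair | Build-environment problems cause spurious failures | 🟡 |
| **Agent2Data ignores cobol_testgen's analysis**: cobol_testgen has already parsed the branch tree and paths, yet Agent2Data designs its data from scratch through the LLM | Wasted LLM cost, lower accuracy | 🟡 |
| **No assertion quality score in the report**: only a mismatch count, no quantitative measure of test data quality | Report is incomplete | 🟢 |
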
---

## 1. Integrated Architecture

```
                            ┌───────────────────────────────────────────┐
                            │       Agent enhancement layer (new)       │
                            │                                           │
┌──────┐    ┌───────────────▼────────────────┐                          │
│COBOL │───►│ HINA classifier                │                          │
│source│    │ auto-detects the program type  │                          │
└──────┘    │ → matching/key-break/validation│                          │
            └───────────────┬────────────────┘                          │
                            │                                           │
            ┌───────────────▼────────────────┐                          │
            │ Test strategy selection        │                          │
            │ picks a template by type       │                          │
            │ weighting: boundary values,    │                          │
            │            full branch coverage│                          │
            └───────────────┬────────────────┘                          │
                            │                                           │
            ┌───────────────▼────────────────┐                          │
            │ Assertion quality gate         │ ← new core component     │
            │ checks:                        │                          │
            │ - all decision points covered? │                          │
            │ - MC/DC satisfied?             │                          │
            │ - type-specific boundaries?    │                          │
            │ fail → send back to regenerate │                          │
            └───────────────┬────────────────┘                          │
                            │ pass                                      │
            ┌───────────────▼────────────────┐                          │
            │ Existing orchestrator.py       │                          │
            │ (internals unchanged)          │                          │
            │                                │                          │
            │ Agent1Parser → Agent2Data      │                          │
            │ → DataWriter → Runners →       │                          │
            │ Comparator → ReportGenerator   │                          │
            └───────────────┬────────────────┘                          │
                            │                                           │
            ┌───────────────▼────────────────┐                          │
            │ Coverage collector             │ ← new (uses coverage.py) │
            │ reads GCOV output or           │                          │
            │ cobol_testgen statistics       │                          │
            └───────────────┬────────────────┘                          │
                            │                                           │
            ┌───────────────▼────────────────┐                          │
            │ Report enhancer                │ ← new                    │
            │ merges: field comparison +     │                          │
            │         coverage +             │                          │
            │         assertion quality score│                          │
            └────────────────────────────────┘                          │

     ┌───────────────────────────────────────────────────┐
 ┌──►│ Layered retry (orchestrated outside the pipeline) │
 │   │ heal_retry:   fix a known pattern, then retry     │
 │   │ simple_retry: retry on environmental causes       │
 │   └───────────────────────────────────────────────────┘
 │  send back
 │  attempt 1 → attempt 2 → attempt 3 → FATAL
 │
 └── controlled by the caller of run_pipeline
```

---

## 2. New Modules (additions only)

### 2.1 `quality/hina_classifier.py`: HINA type classifier (new)

**Purpose**: Before the orchestrator is called, statically analyze the COBOL source and determine the program type.

**Implementation**: Extract features from cobol_testgen's parse result and match them against the HINA classification rules.

```python
# Input:  COBOL source, passed in as the PROCEDURE DIVISION text
# Output: HINA type (one of 9 categories) + confidence + key features

def classify(proc_division_text: str) -> dict:
    """
    Classification criteria:
    - MATCHING paragraph + 2 or more INPUT FDs → マッチング系 (matching)
    - KEY-BREAK / BREAK paragraph → キーブレイク系 (key break)
    - EVALUATE / nested IF → 条件分岐系 (conditional branching)
    - GETPUT / WRITE FROM → 編集処理系 (editing)
    - EXEC SQL → DB系 (database)
    - split by a constant 25/50/100 → データ分割系 (data splitting)
    - NOT NUMERIC / NOT ALPHABETIC → 項目チェック系 (field validation)
    - SEARCH / SEARCH ALL → 内部処理系 (internal processing)
    - EXEC CICS → オンライン系 (online)
    """
    ...

    return {
        "category": "マッチング",
        "subtype": "1:N",  # or "general"
        "confidence": 0.95,
        "features": ["MATCHING paragraph", "2 INPUT files"],
        "description": "1:N マッチング + キーブレイク処理",  # 1:N matching + key-break processing
    }
```

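The criteria listed in the docstring can be sketched as an ordered rule table. This is a minimal illustration, not the project's classifier: the regular expressions, the "first rule wins" ordering, the `classify_sketch` name, and the confidence formula are all assumptions, and the FD-count and constant-divisor criteria are left out because they need more than pattern matching.

```python
import re

# Hypothetical rule table: (category, regex, feature label).
# Order matters: the first matching rule decides the category.
RULES = [
    ("オンライン", r"\bEXEC\s+CICS\b", "EXEC CICS"),
    ("DB", r"\bEXEC\s+SQL\b", "EXEC SQL"),
    ("マッチング", r"^\s*[\w-]*MATCHING[\w-]*\s*\.", "MATCHING paragraph"),
    ("キーブレイク", r"^\s*[\w-]*BREAK[\w-]*\s*\.", "BREAK paragraph"),
    ("項目チェック", r"\bNOT\s+(NUMERIC|ALPHABETIC)\b", "NOT NUMERIC/ALPHABETIC"),
    ("内部処理", r"\bSEARCH(\s+ALL)?\b", "SEARCH"),
    ("編集処理", r"\bWRITE\s+[\w-]+\s+FROM\b", "WRITE FROM"),
    ("条件分岐", r"\bEVALUATE\b", "EVALUATE"),
]


def classify_sketch(proc_division_text: str) -> dict:
    """Return the first matching category plus every feature that was seen."""
    flags = re.IGNORECASE | re.MULTILINE
    hits = [(cat, label) for cat, pattern, label in RULES
            if re.search(pattern, proc_division_text, flags)]
    if not hits:
        return {"category": "unknown", "confidence": 0.0, "features": []}
    distinct_categories = {cat for cat, _ in hits}
    return {
        "category": hits[0][0],
        # Placeholder heuristic: confidence drops as more categories compete
        "confidence": round(1.0 / len(distinct_categories), 2),
        "features": [label for _, label in hits],
    }
```

A program that matches several rules keeps all of its features in the result, so a later stage can still see, for example, that a matching program also contains `EVALUATE` branches.
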
### 2.2 `quality/strategy_selector.py`: test strategy selector (new)

**Purpose**: Select or combine test strategy parameters according to the HINA type.

```python
STRATEGY_TEMPLATES = {
    "マッチング": {  # matching
        "coverage": "boundary",  # default coverage strategy
        "requires_match_matrix": True,  # needs cross-match matrix data
        "min_data_pairs": (3, 3),  # 3 records in file A, 3 in file B
        "special_boundaries": [
            "一方/両方のファイルが空",  # one or both files empty
            "キー完全一致 / 不一致 / 空キー",  # key fully matches / mismatches / empty key
            "M×N の桁あふれ(>99999件)",  # M×N digit overflow (> 99,999 records)
        ],
    },
    "キーブレイク": {  # key break
        "coverage": "branch",
        "requires_break_sequence": True,  # needs a sequence of changing key values
        "min_sequences": 3,  # at least 3 distinct key groups
        "special_boundaries": [
            "単一キーのみ(中断なし)",  # a single key only (no break)
            "キー切れ直後の集計値リセット",  # totals reset right after a key break
            "ファイル終了時の最終出力",  # final output at end of file
        ],
    },
    "条件分岐": {  # conditional branching
        "coverage": "branch",
        "require_mcdc": True,  # MC/DC coverage is mandatory
        "require_100pct_branch": True,  # branch coverage must be 100%
    },
    "データ分割": {  # data splitting
        "coverage": "boundary",
        "divisor": None,  # extracted from the source at run time (25/50/100)
        "boundary_pattern": [
            "0件", "1件",  # 0 records, 1 record
            "N-1件", "N件", "N+1件",  # N = split size
            "2N-1件", "2N件", "2N+1件",
        ],
    },
    "項目チェック": {  # field validation
        "coverage": "boundary",
        "require_data_matrix": True,  # needs a test data matrix
    },
    # ... remaining types
}
```

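As a rough sketch of how a template could be resolved at run time, the snippet below looks up a template by category and, for the data-splitting type, expands the `N-1 / N / N+1` pattern into concrete record counts once the divisor is known. The `TEMPLATES` subset, `DEFAULT_STRATEGY`, `boundary_counts`, `select_strategy`, and the `record_counts` key are hypothetical names introduced for illustration.

```python
import copy
from typing import Optional

# Trimmed, hypothetical copy of two template entries
TEMPLATES = {
    "条件分岐": {"coverage": "branch", "require_mcdc": True},
    "データ分割": {"coverage": "boundary", "divisor": None},
}
DEFAULT_STRATEGY = {"coverage": "boundary"}  # assumed fallback for unknown types


def boundary_counts(divisor: int) -> list:
    """Record counts for 0, 1, N-1, N, N+1, 2N-1, 2N, 2N+1 (N = divisor)."""
    n = divisor
    return [0, 1, n - 1, n, n + 1, 2 * n - 1, 2 * n, 2 * n + 1]


def select_strategy(hina_type: dict, divisor: Optional[int] = None) -> dict:
    """Return a private copy of the template for the detected category."""
    template = TEMPLATES.get(hina_type.get("category"), DEFAULT_STRATEGY)
    strategy = copy.deepcopy(template)  # never mutate the shared template
    if "divisor" in strategy and divisor is not None:
        strategy["divisor"] = divisor
        strategy["record_counts"] = boundary_counts(divisor)
    return strategy
```

For a divisor of 50 this yields record counts of 0, 1, 49, 50, 51, 99, 100, and 101, which are the file sizes a data generator would need to produce.
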
### 2.3 `quality/assertion_gate.py`: assertion quality gate (new core component)

**Purpose**: Check whether the test data set generated by Agent2Data (or by cobol_testgen) meets the quality requirements.

```python
def check_test_suite(suite: "TestSuite",
                     decision_points: list,
                     hina_type: dict,
                     fields: list) -> dict:
    """
    Checks:
    1. Decision-point coverage: every BrIf/BrEval is covered by at least one test case
    2. MC/DC coverage (条件分岐系 only): every leaf condition has independent-effect evidence
    3. Type-specific boundaries: the special boundaries for the HINA type are covered
    4. Field-role coverage: every input field has at least one non-empty value
    5. 88-level coverage: every 88-level value is used at least once

    Returns:
    {
        "passed": True/False,
        "score": 0.92,
        "checks": {
            "decision_coverage": {"passed": True, "rate": 0.95, "missing": [...]},
            "mcdc_adequacy": {"passed": True, "pairs_found": 8, "pairs_expected": 10},
            "hina_boundary": {"passed": True, "covered": [...], "missing": [...]},
            "field_roles": {"passed": True, "uncovered_inputs": []},
            "level_88": {"passed": True, "uncovered_88s": []},
        },
        "suggestions": [
            "Missing a test case where only file A is empty",
            "88-level VALUE 'D' is not covered",
        ],
    }
    """
    ...

class QualityGate:
    """Quality gate, usable as a decorator or as a pipeline step."""

    def __init__(self, required_score: float = 0.8):
        self.required_score = required_score

    def evaluate(self, suite, coverage_result, hina_type, fields=None) -> dict:
        # Assumption: the decision points are the "details" list produced by
        # coverage_collector; `fields` is optional because the call site in
        # orchestrator.py does not pass it.
        result = check_test_suite(suite, coverage_result["details"],
                                  hina_type, fields or [])
        result["gate_passed"] = result["score"] >= self.required_score
        return result

    def check(self, suite) -> bool:
        """Quick sanity check: reject an empty suite or a test case with no fields."""
        if not suite.test_cases:
            return False
        for tc in suite.test_cases:
            if not tc.fields:
                return False
        return True
```

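Check 1 (decision-point coverage) could look roughly like the sketch below. It assumes each test case is annotated with the branches it is expected to take, for example from cobol_testgen's path enumeration; the `DecisionPoint` and `Case` classes and the `path` attribute are hypothetical stand-ins, not the project's data model.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionPoint:  # hypothetical stand-in for a BrIf/BrEval node
    id: int
    branch_names: list


@dataclass
class Case:  # hypothetical test case annotated with the branches it takes
    name: str
    path: list = field(default_factory=list)  # [(decision_id, branch_name), ...]


def decision_coverage(cases, decision_points, min_rate: float = 1.0) -> dict:
    """Compare the branches the cases take against every branch that exists."""
    wanted = {(dp.id, b) for dp in decision_points for b in dp.branch_names}
    taken = {step for case in cases for step in case.path} & wanted
    missing = sorted(wanted - taken)
    rate = len(taken) / len(wanted) if wanted else 1.0
    return {"passed": rate >= min_rate, "rate": rate, "missing": missing}
```

The `missing` list doubles as input for the gate's suggestions: each uncovered `(decision_id, branch_name)` pair names exactly one test case that still has to be generated.
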
### 2.4 `quality/coverage_collector.py`: coverage collector (new)

**Purpose**: Connect cobol_testgen's `coverage.py` to the pipeline and collect branch and paragraph coverage.

```python
from cobol_testgen.coverage import collect_decision_points
from cobol_testgen.read import extract_procedure_division
# NOTE: build_branch_tree must also be imported from cobol_testgen;
# its module path is not given in this plan.

def collect_coverage_from_cobol(cobol_source: str) -> dict:
    """Collect decision-point information from the COBOL source (before compilation)."""
    proc = extract_procedure_division(cobol_source)
    tree, _ = build_branch_tree(proc)
    points = collect_decision_points(tree)
    return {
        "total_decision_points": len(points),
        "by_kind": {"IF": ..., "EVALUATE": ..., "PERFORM": ...},
        "total_branches": sum(len(p.branch_names) for p in points),
        "details": points,
    }

def compute_coverage_gcov(gcov_report_path: str, decision_points: list) -> dict:
    """Parse the actual coverage from GCOV output."""
    # Read the .gcov file and mark which decision points were actually executed
    ...

    return {
        "statement_coverage": 0.92,
        "branch_coverage": 0.85,
        "paragraph_coverage": 1.0,
        "covered_decision_ids": [1, 2, 3, 5],
        "uncovered_decision_ids": [4],
    }
```

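The statement-coverage part of `compute_coverage_gcov` could start from a parser like the one below. It relies on the standard `.gcov` text layout (`count:lineno:source`, with `#####` for executable lines that never ran and `-` for non-executable lines). Whether the line numbers refer to the COBOL source or to the generated C depends on how the program was compiled, so the mapping from lines to decision points is deliberately left out; `parse_gcov_lines` is a hypothetical helper name.

```python
def parse_gcov_lines(gcov_text: str) -> dict:
    """Compute statement coverage from the text of one .gcov file."""
    executed, missed = [], []
    for raw in gcov_text.splitlines():
        parts = raw.split(":", 2)
        if len(parts) < 3:
            continue  # branch/call/function summary lines carry no line number
        count, lineno = parts[0].strip(), parts[1].strip()
        if not lineno.isdigit() or lineno == "0" or count == "-":
            continue  # header lines (line 0) and non-executable lines
        if count in ("#####", "====="):
            missed.append(int(lineno))
        elif count.rstrip("*").isdigit():
            executed.append(int(lineno))  # "N*" still means the line was reached
    total = len(executed) + len(missed)
    return {
        "statement_coverage": len(executed) / total if total else 0.0,
        "uncovered_lines": missed,
    }
```

Comparing `uncovered_lines` with the source lines of each decision point would then give the `covered_decision_ids` and `uncovered_decision_ids` lists, provided the decision points carry line numbers.
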
### 2.5 `quality/scorer.py`: report quality scorer (new)

**Purpose**: Produce a combined score to be included in the report.

```python
def compute_quality_score(
    compare_result,   # from comparator
    coverage_result,  # from coverage_collector
    gate_result,      # from assertion_gate
) -> dict:
    """
    Scoring dimensions:
    - Field consistency score: 80% (passed_match / total_fields)
    - Branch coverage: 60% (covered_branches / total_branches)
    - Assertion quality score: 90% (gate_score)
    Weighted total: 0.4 × field + 0.3 × coverage + 0.3 × assertion

    7 dimensions for the COBOL variant:
    1. Paragraph coverage × 20%
    2. Branch coverage × 20%
    3. Condition coverage (MC/DC) × 15%
    4. Data boundaries × 15%
    5. Field consistency (COBOL vs Java) × 15%
    6. File-status coverage × 10%
    7. 88-level coverage × 5%
    """
```

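To make the weighting concrete, here is a small worked sketch. It reads the 80% / 60% / 90% figures in the docstring as sample values, which is an assumption, and the dictionary keys for the seven COBOL dimensions are hypothetical names.

```python
def weighted_total(field: float, coverage: float, assertion: float) -> float:
    """Three-dimension total: 0.4 × field + 0.3 × coverage + 0.3 × assertion."""
    return 0.4 * field + 0.3 * coverage + 0.3 * assertion


# Hypothetical keys for the seven COBOL dimensions; the weights sum to 1.0
COBOL_WEIGHTS = {
    "paragraph_coverage": 0.20,
    "branch_coverage": 0.20,
    "condition_coverage_mcdc": 0.15,
    "data_boundaries": 0.15,
    "field_consistency": 0.15,
    "file_status_coverage": 0.10,
    "level_88_coverage": 0.05,
}


def cobol_score(dimensions: dict) -> float:
    """Seven-dimension total; a missing dimension counts as 0."""
    return sum(weight * dimensions.get(name, 0.0)
               for name, weight in COBOL_WEIGHTS.items())


sample_total = weighted_total(0.80, 0.60, 0.90)  # 0.32 + 0.18 + 0.27
```

With the sample values the three-dimension total comes to 0.77, just under the default gate threshold of 0.8, which shows how a weak branch-coverage figure can pull an otherwise good run below the line.
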
---

## 3. Changes to Existing Code (minimal intrusion)

### 3.1 Change to `orchestrator.py`

**One change only**: insert the quality gate right after `suite = Agent2Data(...)` in `run_pipeline()`.

```python
# Location: around line 43 of orchestrator.py
# Before:
suite = Agent2Data(llm).design(tree, cfg.coverage_default, cfg.runner_mode == "spark")

# After:
suite = Agent2Data(llm).design(tree, cfg.coverage_default, cfg.runner_mode == "spark")

# ── quality gate (new) ──
# coverage_data and hina_type come from coverage_collector and hina_classifier
gate = QualityGate(required_score=0.8)
gate_result = gate.evaluate(suite, coverage_data, hina_type)
# Always record the result so the report can show a pass as well as a failure
vr.debug["quality_gate"] = gate_result
vr.quality_gate_passed = gate_result["gate_passed"]
# A failed gate does not block the pipeline here; it is only reported
# ── end ──
```

**Principle**: The quality gate does not block the pipeline (the tests still run), but the report flags a warning. Blocking is a user-configurable option.

### 3.2 Changes to `config.py`

**New configuration options**:

```python
# quality gate
quality_gate_enabled: bool = True
quality_gate_min_score: float = 0.8
quality_gate_blocking: bool = False  # True = do not execute when the gate fails

# coverage
coverage_collect: bool = True
coverage_gcov_path: str = ""  # if empty, only cobol_testgen's static analysis is used

# hina
hina_classify: bool = True
hina_override: str = ""  # manually specify the HINA type
```

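How the three gate options might combine into a single decision can be sketched as follows. `QualityConfig` and `gate_decision` are hypothetical names used only for illustration; the real options live in `config.py` as shown above.

```python
from dataclasses import dataclass


@dataclass
class QualityConfig:  # hypothetical container for the three gate options
    quality_gate_enabled: bool = True
    quality_gate_min_score: float = 0.8
    quality_gate_blocking: bool = False


def gate_decision(cfg: QualityConfig, score: float) -> str:
    """Return "skip", "run", "warn", or "block" for a given gate score."""
    if not cfg.quality_gate_enabled:
        return "skip"   # gate disabled: execute without evaluating
    if score >= cfg.quality_gate_min_score:
        return "run"    # gate passed
    # Gate failed: block only when the user opted in
    return "block" if cfg.quality_gate_blocking else "warn"
```

With the defaults, a score of 0.5 yields `"warn"`, which matches the non-blocking behavior described for the orchestrator; setting `quality_gate_blocking=True` turns the same score into `"block"`.
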
### 3.3 Changes to `report/generator.py`

**Add a quality score section**:

```python
# Called from generate_html to add the quality score cards
def _quality_section(self, vr: VerificationRun) -> str:
    qg = vr.debug.get("quality_gate", {})
    if not qg:
        return ""
    score = qg.get("score", 0)
    color = "green" if score >= 0.8 else ("yellow" if score >= 0.6 else "red")

    # One card per check; checks that report no "rate" show PASS/WARN instead of 0%
    cards = []
    for name, check in qg.get("checks", {}).items():
        css = "pass" if check.get("passed") else "warn"
        value = f"{check['rate']:.0%}" if "rate" in check else css.upper()
        cards.append(f'<div class="card"><h3>{name}</h3>'
                     f'<div class="value {css}">{value}</div></div>')

    suggestions = qg.get("suggestions", [])
    suggestion_list = ("<ul>" + "".join(f"<li>{s}</li>" for s in suggestions) + "</ul>"
                       if suggestions else "")
    verdict = ('<p class="pass">✅ Quality gate passed</p>' if qg.get("gate_passed")
               else '<p class="warn">⚠️ Quality gate failed</p>')

    return f"""
    <div class="section">
        <h2>Test data quality score</h2>
        <div class="grid">
            <div class="card" style="border-color: {color}">
                <h3>Overall quality score</h3>
                <div class="value">{score:.0%}</div>
            </div>
            {''.join(cards)}
        </div>
        {suggestion_list}
        {verdict}
    </div>
    """
```

---

## 4. Layered Retry (pipeline orchestration layer)

Retry is not implemented inside the orchestrator; it is controlled by the **caller**.

```python
# Current invocation (main.py):
vr = run_pipeline(c, args.copybook, args.cobol_src, args.java_src, args.mapping)

# With retry enabled:
from quality.retry import RetryHandler

handler = RetryHandler(
    max_heal_retries=2,
    max_simple_retries=3,
    known_fixes={
        "BLOCKED": [
            (lambda v: "not found" in str(v.report_path),
             lambda: install_dependency()),  # fix, then retry
            (lambda v: "compile" in str(v.debug.get("cobol_build", {})).lower(),
             lambda: clean_and_rebuild()),  # clean and rebuild
        ],
        "MISMATCH": [
            (lambda v: v.fields_mismatched <= 2,
             lambda: regenerate_data()),  # adjust the data, then retry
        ],
    },
)

vr = handler.run(
    lambda: run_pipeline(c, args.copybook, args.cobol_src, args.java_src, args.mapping)
)
print(handler.summary())
```

```python
# quality/retry.py
class RetryHandler:
    def __init__(self, max_heal_retries=2, max_simple_retries=3, known_fixes=None):
        self.max_heal_retries = max_heal_retries
        self.max_simple_retries = max_simple_retries
        # status → list of (condition, fix) pairs, as passed in from main.py
        self.known_fixes = known_fixes or {}
        self.heal_count = 0
        self.simple_count = 0
        self.history = []

    def _max_total(self) -> int:
        # Combined budget shared by heal retries and simple retries
        return self.max_heal_retries + self.max_simple_retries

    def run(self, pipeline_fn, context: dict = None) -> "VerificationRun":
        vr = None
        while self.simple_count + self.heal_count < self._max_total():
            vr = pipeline_fn()
            self.history.append(vr)

            if vr.status == "PASS":
                return vr

            # Apply the first known fix whose condition matches;
            # if none matches, count a plain retry
            for condition, fix in self.known_fixes.get(vr.status, []):
                if condition(vr):
                    fix()
                    self.heal_count += 1
                    break
            else:
                self.simple_count += 1

        # Retry budget exhausted: mark the run FATAL
        vr.status = "FATAL"
        vr.exit_code = 4
        return vr

    def summary(self) -> str:
        return (f"runs={len(self.history)} heal_retries={self.heal_count} "
                f"simple_retries={self.simple_count}")
```

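The retry loop can be exercised without the real pipeline. The sketch below is a function-based variant that enforces the heal and simple budgets separately rather than as one combined total, which is a design choice of this sketch and not taken from the plan. `run_with_retry`, `fake_pipeline`, and the `state` dictionary are hypothetical, and `SimpleNamespace` stands in for `VerificationRun`.

```python
from types import SimpleNamespace


def run_with_retry(pipeline_fn, known_fixes, max_heal=2, max_simple=3):
    """Run pipeline_fn until PASS or until both retry budgets are spent."""
    heal = simple = 0
    history = []
    while True:
        vr = pipeline_fn()
        history.append(vr.status)
        if vr.status == "PASS":
            return vr, history
        # First known fix whose condition matches this result, if any
        fix = next((f for cond, f in known_fixes.get(vr.status, []) if cond(vr)), None)
        if fix is not None and heal < max_heal:
            fix()
            heal += 1       # heal retry: repair, then run again
        elif simple < max_simple:
            simple += 1     # simple retry: run again unchanged
        else:
            vr.status = "FATAL"
            return vr, history


# Stub pipeline: BLOCKED until the "dependency" is installed
state = {"deps_installed": False}


def fake_pipeline():
    return SimpleNamespace(status="PASS" if state["deps_installed"] else "BLOCKED")


fixes = {"BLOCKED": [(lambda vr: True, lambda: state.update(deps_installed=True))]}
result, history = run_with_retry(fake_pipeline, fixes)
```

Here the first run is BLOCKED, the heal action flips the stub's state, and the second run passes. A pipeline that always fails with no matching fix runs once plus `max_simple` retries before it is marked FATAL.
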
---

## 5. Mapping to the Current Pipeline

| Current pipeline step | Enhancement | New module |
|:---------------|:--------|:--------|
| Agent1Parser (LLM→FieldTree) | Unchanged | none |
| Agent2Data (LLM→TestSuite) | Quality gate inserted | `assertion_gate.py` |
| *No such step before* | HINA type detection → strategy selection | `hina_classifier.py`, `strategy_selector.py` |
| DataWriter | Unchanged | none |
| CobolRunner | Unchanged | none |
| JavaRunner | Unchanged | none |
| Comparator → align_records → compare_field | Unchanged | none |
| *No such step before* | Coverage collection (calls coverage.py) | `coverage_collector.py` |
| Agent3Diagnostic | Unchanged | none |
| ReportGenerator | Quality score cards added | modifies `generator.py` |
| *Calling layer* | Layered retry wrapper | `retry.py` |

---

## 6. Implementation Steps

### Step 1: Create the quality/ directory (foundation)

```
cobol-java-v3/
  quality/
    __init__.py
    hina_classifier.py      # HINA type classification
    strategy_selector.py    # strategy templates
    coverage_collector.py   # coverage collection
  tests/
    test_quality/
      test_hina_classifier.py
      test_strategy_selector.py
```

**Acceptance**: `python -m pytest tests/test_quality/` passes

### Step 2: Assertion quality gate (core)

```
quality/
  assertion_gate.py   # gate logic
  scorer.py           # scorer
```

**Acceptance**: given a TestSuite, the gate returns detailed check_results

### Step 3: Integrate into the orchestrator (minimal change)

Modify `orchestrator.py` (around line 43), `config.py`, and `report/generator.py`.

**Acceptance**: running `main.py` shows the quality score cards in the HTML report

### Step 4: Layered retry

```
quality/
  retry.py
```

Change how `main.py` invokes the pipeline.

**Acceptance**: a compile failure is retried automatically; if it still fails after 3 retries, the run is marked FATAL

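---

## 7. What Stays Unchanged (explicit boundaries)

| Component | Why it is left unchanged |
|:----|:-----------|
| `cobol_testgen/*` (5,000 lines) | Complete and self-contained; used only through its API |
| `runners/*` (compile + run) | Already stable; changes would be high-risk |
| `comparator/*` (field comparison) | Comparison logic is correct; only its results are consumed |
| `agents/agent1_parser.py` | COPYBOOK parsing is already stable |
| `data/*` (FieldTree, TestCase, etc.) | Data structure definitions are depended on in many places |
| `storage/*` | File storage logic |
| `web/*` | Front-end UI |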