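# COBOL Test Data Generator: Environment Setup and Usage Guide

## 1. System Overview

The COBOL Test Data Generator (cobol-java-v3) is a Python toolchain that parses COBOL programs, extracts their control-flow structure, generates test data covering every branch, and writes that data in a fixed flat file format for use with programs compiled and run under GnuCOBOL.

### Core Capabilities

| Capability | Description |
|------|------|
| Parse the COBOL DATA DIVISION | Lark grammar (Earley parser) → field definitions |
| Parse the COBOL PROCEDURE DIVISION | Line-level state machine → decision-point tree |
| Branch-coverage data generation | True/False paths for each decision point → records |
| Flat file output | COBOL fixed-length binary files |
| GnuCOBOL compile and run | Test data → compile with cobc → run and verify |

---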

## 2. Prerequisites

### 2.1 Hardware Requirements

| Item | Minimum | Recommended |
|------|------|------|
| CPU | 2 cores | 4+ cores |
| Memory | 4 GB | 8 GB |
| Disk | 500 MB | 2 GB |
| OS | Windows 10/11 64-bit | Windows 11 |

### 2.2 Software Requirements

| Software | Version | Purpose |
|------|------|------|
| **Python** | 3.12+ | Runs the test data generator |
| **GnuCOBOL (cobc)** | 3.2.0 | Compiles COBOL programs and verifies them at runtime |
| **Git** | Any | Fetches the code |

### 2.3 Python Dependencies

```
lark>=1.1.0      # Lark Earley parser (DATA DIVISION parsing)
```

Path handling uses `pathlib` from the Python standard library, so nothing extra needs to be installed for it.

Install command:
```bash
pip install lark
```

### 2.4 Installing GnuCOBOL

GnuCOBOL 3.2.0 (formerly OpenCOBOL) must be installed separately.

**Download**:
- GnuCOBOL 3.2 Windows binary package
- Recommended: the GC32-BDB-SP1 build (includes DB2/SQLite support)

**Verify after installation**:
```bash
cobc --version
# Example output: cobc (GnuCOBOL) 3.2.0
```

**Environment variables**:
```bash
# cobc must be on PATH
# Typical location: C:\GnuCOBOL\bin
# or your custom install path

# COB_LIBRARY_PATH tells the runtime where to find DLLs (subprograms compiled as SHARED)
# e.g.: set COB_LIBRARY_PATH=D:\cobol-java\cobol-tna-system\bin
```
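
The PATH and `COB_LIBRARY_PATH` requirements above can also be checked from Python before running anything. The following is a minimal sketch using only the standard library; the helper name `check_cobol_env` is hypothetical and is not part of `cobol_testgen`.

```python
import os
import shutil


def check_cobol_env(tool="cobc", env=None):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    problems = []
    # shutil.which searches the given PATH string the same way a shell would
    if shutil.which(tool, path=env.get("PATH", "")) is None:
        problems.append(f"{tool} not found on PATH")
    # COB_LIBRARY_PATH only matters for subprograms built as shared modules
    for entry in env.get("COB_LIBRARY_PATH", "").split(os.pathsep):
        if entry and not os.path.isdir(entry):
            problems.append(f"COB_LIBRARY_PATH entry does not exist: {entry}")
    return problems


for problem in check_cobol_env():
    print("WARNING:", problem)
```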

---

## 3. Environment Setup Steps

### 3.1 Install Python 3.12+

```bash
# Download: https://www.python.org/downloads/
# Check "Add Python to PATH" during installation
python --version
# Python 3.12.x

pip install lark
```

### 3.2 Install GnuCOBOL 3.2

1. Download the GC32-BDB-SP1 package
2. Extract it to `D:\360安全浏览器下载\GC32-BDB-SP1-rename-7z-to-exe\` (or another directory of your choice)
3. Add the `bin\` subdirectory to the system PATH
4. Verify:
```bash
cobc --version
# cobc (GnuCOBOL) 3.2.0
```

### 3.3 Clone the Code

```bash
# Clone under D:\cobol-java (create it first if needed); the rest of this guide assumes that location
cd D:\cobol-java
git clone https://gittea.dev/hangshuo652/cobol-java-v3.git
# Or pull from an existing checkout
cd D:\cobol-java\cobol-java-v3
git pull
```

### 3.4 Verify the Installation

```bash
cd D:\cobol-java\cobol-java-v3
python -c "from cobol_testgen import extract_structure; print('OK')"
# Output: OK
```
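
The one-line import check above stops at the first failure. A slightly fuller preflight can report every missing prerequisite at once. This is a minimal sketch using only the standard library; the function name `preflight` is hypothetical, and it only tests whether the modules can be located, not whether they work.

```python
import importlib.util
import sys


def preflight(min_version=(3, 12), modules=("lark", "cobol_testgen")):
    """Collect setup problems instead of stopping at the first one."""
    problems = []
    if sys.version_info < min_version:
        found = "%d.%d" % sys.version_info[:2]
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ required, found {found}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems


for problem in preflight():
    print("MISSING:", problem)
```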

---

## 4. Directory Structure

```
cobol-java-v3/
├── cobol_testgen/                     # Core code
│   ├── __init__.py                    # Public API (extract_structure, generate_data)
│   ├── read.py                        # Preprocessor + DATA DIVISION parsing
│   ├── core.py                        # Legacy PROCEDURE DIVISION parser (BrParser)
│   ├── cond.py                        # Condition parser
│   ├── coverage.py                    # Coverage statistics
│   ├── design_mcdc.py                 # Linear path enumeration (O(N) instead of O(2^N))
│   ├── pipeline_bridge.py             # Bridge layer between the old and new parsers
│   ├── procedure_parser.py            # New PROCEDURE DIVISION parser
│   ├── flatfile.py                    # Flat file writer
│   ├── design.py                      # Value generation + constraint application
│   ├── models.py                      # Data models (BrSeq, BrIf, BrEval...)
│   ├── grammar.lark                   # DATA DIVISION Lark grammar
│   └── procedure_grammar.lark         # PROCEDURE DIVISION Lark grammar (experimental)
├── test-data/                         # Test suite
│   ├── s15_coverage_verification.py   # Basic coverage verification (8 control structures)
│   ├── s19_final_bridge_test.py       # Bridge verification
│   ├── s21_cond_fix_verify.py         # Condition parsing verification
│   ├── s25_per_program_report.py      # Detailed per-program report
│   └── s26_regression_check.py        # Regression check
├── SETUP.md                           # This file
└── docs/                              # Design documents
```
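
The `design_mcdc.py` entry above mentions linear path enumeration in place of an exponential one. This guide does not show how that module works, so the following only illustrates the general idea under a simplifying assumption: N independent two-way decisions executed in sequence. Enumerating every True/False combination takes 2^N paths, but exercising both branches of every decision needs only N + 1: one baseline path plus one path per decision with that decision flipped. The function name `linear_paths` is hypothetical.

```python
def linear_paths(n_decisions):
    """Return n + 1 outcome tuples that together exercise both branches of each decision."""
    baseline = tuple(True for _ in range(n_decisions))
    paths = [baseline]
    for i in range(n_decisions):
        flipped = list(baseline)
        flipped[i] = False  # take the False branch of decision i only
        paths.append(tuple(flipped))
    return paths


paths = linear_paths(3)
# Every decision is seen with both outcomes, using 4 paths instead of 2**3 = 8
for i in range(3):
    assert {p[i] for p in paths} == {True, False}
print(len(paths))  # 4
```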

---

## 5. Running the Tests

### 5.1 Quick Verification (10 seconds)

```bash
cd D:\cobol-java\cobol-java-v3
python test-data/s15_coverage_verification.py
```

Expected output:
```
S15: 17 PASS / 0 FAIL
```

### 5.2 Full Coverage Report for All 43 Programs (2-3 minutes)

```bash
python test-data/s25_per_program_report.py
```

The output should end with:
```
100%: 43 programs
TOTAL 3178 3178 100%
```

### 5.3 Quick Regression Check (2 minutes)

```bash
python test-data/s26_regression_check.py
```

Expected output:
```
Total: 3178/3178 = 100.00%
ALL 43/43 AT 100% — NO REGRESSIONS
```
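
For use in an automated job, the summary line can be checked mechanically rather than by eye. A minimal sketch, assuming the `Total: covered/total = pct%` line format shown above; `parse_total` is a hypothetical helper and not part of the repository.

```python
import re

TOTAL_RE = re.compile(r"Total:\s*(\d+)/(\d+)\s*=\s*([\d.]+)%")


def parse_total(output):
    """Return (covered, total) from the regression summary, or None if the line is absent."""
    match = TOTAL_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "Total: 3178/3178 = 100.00%\n"
covered, total = parse_total(sample)
print(covered == total)  # True
```

Feeding it the captured standard output of `s26_regression_check.py` and failing the job when the two numbers differ gives the same signal as reading the last line manually.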

### 5.4 Specifying COPYBOOK Directories

If a COBOL program depends on COPYBOOKs, pass `copybook_dirs` when calling `generate_data`:

```python
from pathlib import Path

from cobol_testgen import extract_structure, generate_data

# read_text opens and closes the file in one step
src = Path("program.cbl").read_text(encoding="utf-8")
st = extract_structure(src)
recs = generate_data(src, st, copybook_dirs=["path/to/copybooks"])
```

---

## 6. Key APIs

### 6.1 extract_structure(cobol_source)

**Input**: COBOL program source text
**Returns**: dict containing the total branch count, the list of decision points, the branch tree object, and more

```python
st = extract_structure(src)
branches = st["total_branches"]       # total number of branches
dps = st["decision_points"]           # list of decision points
tree = st["branch_tree_obj"]          # branch tree object
```

### 6.2 generate_data(cobol_source, structure, copybook_dirs=None)

**Input**:
- `cobol_source`: raw COBOL program source (not preprocessed)
- `structure`: the dict returned by `extract_structure`
- `copybook_dirs`: list of COPYBOOK search paths (optional)

**Returns**: list[dict], where each record holds a value for every field

```python
recs = generate_data(src, st)
# Or with COPYBOOK directories
recs = generate_data(src, st, copybook_dirs=["./cpy", "../common/copybooks"])
```
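
Because `generate_data` returns a list of dicts, a record can be turned into a fixed-length line by padding each field to its declared width. The repository's `flatfile.py` is not reproduced in this guide, so this is only a sketch under assumptions: the `layout` list of (field name, width) pairs and the sample field names are hypothetical, every field is treated as alphanumeric (`PIC X`), and numeric or packed fields would need their own encoding.

```python
def to_fixed_width(record, layout, encoding="ascii"):
    """Serialize one record dict to bytes, padding or truncating each field to its width."""
    parts = []
    for name, width in layout:
        text = str(record.get(name, ""))
        # Alphanumeric fields are left-justified and space-padded
        parts.append(text[:width].ljust(width))
    return "".join(parts).encode(encoding)


layout = [("CUST-ID", 6), ("CUST-NAME", 10)]  # hypothetical layout
rec = {"CUST-ID": "A001", "CUST-NAME": "TANAKA"}
line = to_fixed_width(rec, layout)
print(line)       # b'A001  TANAKA    '
print(len(line))  # 16
```

Every record serialized with the same layout has the same byte length, which is the property a fixed-length COBOL record definition relies on.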

### 6.3 Coverage Data

After `generate_data` runs, the `structure` dict contains a `coverage` key:

```python
cov = st["coverage"]
total = cov["total"]          # total number of branches
covered = cov["covered"]      # number of branches covered
pct = cov["pct"]              # coverage percentage
dps = cov["decision_points"]  # per-decision-point details
```
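
The keys above are enough to build a one-line summary or a pass/fail gate. A minimal sketch, assuming `total` and `covered` are integers; the hand-built `cov` dict stands in for `st["coverage"]`, and `coverage_gate` is a hypothetical helper. The percentage is recomputed from the counts rather than read from `pct`, because this guide does not say whether `pct` is a fraction or a 0-100 value.

```python
def coverage_gate(cov, threshold=100.0):
    """Return (ok, message) for a coverage dict with 'total' and 'covered' keys."""
    total, covered = cov["total"], cov["covered"]
    # A program with no branches has nothing to miss, so treat it as fully covered
    pct = 100.0 if total == 0 else 100.0 * covered / total
    ok = pct >= threshold
    return ok, f"{covered}/{total} branches covered ({pct:.2f}%)"


cov = {"total": 3178, "covered": 3178}  # stand-in for st["coverage"]
print(coverage_gate(cov))  # (True, '3178/3178 branches covered (100.00%)')
```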

---

## 7. Runtime Requirements in Detail (Setup Checklist for Colleagues)

### Required

- [ ] Python 3.12+ is installed and on PATH
- [ ] `pip install lark` completed successfully
- [ ] GnuCOBOL (cobc) 3.2.0 is installed and on PATH
- [ ] `cobc --version` prints the expected output
- [ ] No firewall blocks git access to `gittea.dev`
- [ ] Drive `D:\` has at least 500 MB free

### If Compiling and Running with GnuCOBOL

- [ ] The `cobc` command is available (`which cobc` or `where cobc`)
- [ ] The subprogram DLL directory is listed in the `COB_LIBRARY_PATH` environment variable
- [ ] EXEC SQL requires SQLite3 support (included in the GC32-BDB-SP1 build)

### Troubleshooting

| Problem | Cause | Fix |
|------|------|------|
| `ModuleNotFoundError: No module named 'lark'` | Lark is missing | `pip install lark` |
| `cobc: command not found` | GnuCOBOL is not on PATH | Add `bin\` to PATH |
| `Errno 13 Permission denied` | File permissions | Run as administrator or change the file permissions |
| `gbk codec can't decode byte` | Encoding issue | Set `PYTHONIOENCODING=utf-8` |
| `name 'pp_str' is not defined` | Bug in the report script | Fixed; `git pull` the latest code |
| `EXEC SQL ... not supported` | DB2/SQLite required | Use the GC32-BDB-SP1 build of GnuCOBOL |

---

## 8. Benchmark Test Programs

The system includes two sets of benchmark test programs:

### Telecom Billing System (37 programs)

```
Path: D:\cobol-java\cobol-test-programs/
COPYBOOK: common/copybooks/
Types: Matching / KeyBreak / Division / CSV / Sort, etc.
```

### Time and Attendance Management System (6 programs)

```
Path: D:\cobol-java\cobol-tna-system/
COPYBOOK: cpy/
Subprograms: sub/*.cbl → bin/*.dll
Type: time and attendance management for Japanese companies (work-hour statistics)
EXEC SQL: ZAN06UPD requires SQLite3 support
```

---

## 9. Quick-Start Scripts

### Windows (batch)

```batch
@echo off
cd /d D:\cobol-java\cobol-java-v3
echo === COBOL Test Data Generator ===
echo [1/3] Checking dependencies...
python -c "import lark" 2>nul || pip install lark
echo [2/3] Running regression test...
python test-data\s15_coverage_verification.py
if %errorlevel% neq 0 (echo FAILED & exit /b 1)
echo [3/3] Running full coverage report...
set PYTHONIOENCODING=utf-8
python test-data\s25_per_program_report.py
echo === DONE ===
```

### Linux/macOS

```bash
#!/bin/bash
cd /path/to/cobol-java-v3 || exit 1
echo "=== COBOL Test Data Generator ==="
echo "[1/3] Checking dependencies..."
python3 -c "import lark" 2>/dev/null || pip3 install lark
echo "[2/3] Running regression test..."
if ! python3 test-data/s15_coverage_verification.py; then echo "FAILED"; exit 1; fi
echo "[3/3] Running full coverage report..."
PYTHONIOENCODING=utf-8 python3 test-data/s25_per_program_report.py
echo "=== DONE ==="
```
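
The two scripts above perform the same steps with platform-specific syntax. A single Python launcher can cover both platforms. This is a minimal sketch, not a file that exists in the repository: `run_steps` and `STEPS` are hypothetical names, the dependency check is left out, and the script paths assume the current directory is the repository root.

```python
import os
import subprocess
import sys


def run_steps(steps, cwd=None):
    """Run each (label, argv) step in order; stop and return False on the first failure."""
    # Force UTF-8 console I/O in the child processes, as the shell scripts do
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    for label, argv in steps:
        print(label)
        if subprocess.run(argv, cwd=cwd, env=env).returncode != 0:
            print("FAILED")
            return False
    return True


STEPS = [
    ("[1/2] Running regression test...",
     [sys.executable, "test-data/s15_coverage_verification.py"]),
    ("[2/2] Running full coverage report...",
     [sys.executable, "test-data/s25_per_program_report.py"]),
]
# From the repository root: sys.exit(0 if run_steps(STEPS) else 1)
```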

---

## 10. Version History

| Version | Date | Notes |
|:----:|:----:|------|
| v3.0 | 2026-06-25 | Current version. 100% branch coverage on 43/43 programs |
| v2.0 | 2026-06-20 | New PROCEDURE DIVISION parser + linear path enumeration |
| v1.0 | 2026-06-14 | Initial version, BrParser regex parser |
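
---

## Appendix: How the Coverage Data Is Verified

The system uses three layers of verification to ensure the coverage figures are genuine:

1. **S15 tests**: 8 hand-built COBOL fragments; for each decision point, the hand-counted number of branches is compared one by one against the number the system detects
2. **Every constraint is matched exactly through `_match_constraint`**: field names on both the constraint side and the parser side have their subscripts stripped before comparison
3. **All unconditional fallbacks have been removed**: there is no "mark everything as covered if any path reaches it" logic

```python
# The actual coverage logic of _mark_if in coverage.py (no fallback).
# Abridged excerpt: the code that binds c, simple, and inv_simple is omitted.
def _mark_if(dp, cons):
    # Mark a branch as covered only when the constraint-side field name
    # equals the parser-side field name (with defensive subscript stripping)
    if _match_constraint(c, simple):
        dp.active_branches.add('T' if c[3] else 'F')
    elif _match_constraint(c, inv_simple):
        dp.active_branches.add('F')
    # There is no else branch with an unconditional add
```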
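
Point 2 of the list says field names are compared after their subscripts are removed. The real `_match_constraint` is not reproduced in this guide, so the following only sketches that normalization step. `strip_subscript` and `same_field` are hypothetical names, and the regular expression assumes a subscript is written as a trailing parenthesized group such as `WS-AMT(IDX)`.

```python
import re

# A trailing "( ... )" group, e.g. the "(IDX)" in "WS-AMT(IDX)"
_SUBSCRIPT_RE = re.compile(r"\s*\([^()]*\)\s*$")


def strip_subscript(name):
    """Drop a trailing subscript and normalize case (COBOL names are case-insensitive)."""
    return _SUBSCRIPT_RE.sub("", name.strip()).upper()


def same_field(constraint_name, parsed_name):
    """Compare two field references after removing subscripts from both sides."""
    return strip_subscript(constraint_name) == strip_subscript(parsed_name)


print(same_field("WS-AMT(IDX)", "ws-amt"))  # True
print(same_field("WS-AMT(1)", "WS-AMT2"))   # False
```

Stripping on both sides means a constraint recorded against `WS-AMT(1)` still matches a condition parsed as `WS-AMT(IDX)`, while distinct names such as `WS-AMT2` never match by accident.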