fix: HINA all-type defect fixes (3 real defects: SORT/CSV/ALT)
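Defects found by adversarial all-type testing, and their fixes:

Defect 1: SORT/MERGE L1 keyword too strict (missed detections)
  - Old: 'SORT ON KEY' / 'MERGE ON KEY' (exact string match)
  - Real-world COBOL usage: SORT WORK-FILE ON ASCENDING KEY ...
  - New: regex SORT(?:\s+\S+)?\s+ON\s+(?:ASCENDING|DESCENDING)?KEY

Defect 2: CSV false positives (non-CSV STRING/INSPECT also triggered)
  - Old: has_string=True -> CSV merge (CSV合并)
  - New: requires has_csv_merge (STRING + comma delimiter)
  - Plain string concatenation no longer triggers the CSV classification

Defect 3: ALTERNATE RECORD KEY overridden by ORGANIZATION IS
  - Old: file organization was checked before alternate index (at equal confidence, the earlier rule wins)
  - New: alternate index is placed first (the more specific classification takes priority)

Regression: 767 passed (0 new failures)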
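Defect 1 can be illustrated with a small sketch. The names `QUOTED_RE`, `SORT_MERGE_KEY_RE`, and `has_sort_merge_key` are hypothetical and not taken from the HINA source. One caveat: the pattern as quoted in the message has no whitespace between the direction keyword and `KEY`, so taken literally it would not match `ON ASCENDING KEY`. The sketch assumes the intended pattern allows whitespace there and also covers `MERGE`.

```python
import re

# Pattern exactly as quoted in the commit message.
QUOTED_RE = re.compile(r"SORT(?:\s+\S+)?\s+ON\s+(?:ASCENDING|DESCENDING)?KEY")

# Assumed variant (hypothetical): allows whitespace after the optional
# direction keyword and accepts MERGE as well as SORT.
SORT_MERGE_KEY_RE = re.compile(
    r"\b(?:SORT|MERGE)(?:\s+\S+)?\s+ON\s+(?:(?:ASCENDING|DESCENDING)\s+)?KEY\b",
    re.IGNORECASE,
)


def has_sort_merge_key(line: str) -> bool:
    """Return True if the line looks like SORT/MERGE ... ON [direction] KEY."""
    return SORT_MERGE_KEY_RE.search(line) is not None


def old_exact_match(line: str) -> bool:
    """The pre-fix behaviour described in the message: exact substrings only."""
    return "SORT ON KEY" in line or "MERGE ON KEY" in line
```

With these definitions, `old_exact_match("SORT WORK-FILE ON ASCENDING KEY CUST-ID")` is false while `has_sort_merge_key` accepts the same line, which is the missed detection the commit describes.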
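Defect 3 comes down to rule order when two rules report the same confidence. The following is a minimal sketch, assuming a classifier in which the earlier rule wins a tie. The rule tables, labels, confidence values, and `classify` are all hypothetical and not taken from the HINA source.

```python
# Hypothetical rule tables: (substring to look for, label, confidence).
RULES_OLD = [
    ("ORGANIZATION IS", "file organization", 0.8),
    ("ALTERNATE RECORD KEY", "alternate index", 0.8),
]
# After the fix: the more specific rule is listed first.
RULES_NEW = [
    ("ALTERNATE RECORD KEY", "alternate index", 0.8),
    ("ORGANIZATION IS", "file organization", 0.8),
]


def classify(source: str, rules: list[tuple[str, str, float]]) -> str:
    """Return the label of the best-scoring rule; on ties the earlier rule wins."""
    best = None
    for needle, label, confidence in rules:
        # Strict '>' keeps the first rule when confidences are equal.
        if needle in source and (best is None or confidence > best[1]):
            best = (label, confidence)
    return best[0] if best else "unknown"


SELECT_CLAUSE = """
SELECT CUST-FILE ASSIGN TO "CUST.DAT"
    ORGANIZATION IS INDEXED
    RECORD KEY IS CUST-ID
    ALTERNATE RECORD KEY IS CUST-NAME.
"""
```

Under these assumptions, `classify(SELECT_CLAUSE, RULES_OLD)` yields the file-organization label and `classify(SELECT_CLAUSE, RULES_NEW)` yields the alternate-index label, since an indexed file with an alternate key contains both clauses.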
@@ -106,21 +106,33 @@ def resolve_csv_merge_vs_split(features: dict) -> dict:
     """Distinguish CSV merge from CSV split.

     Rules:
-    - STRING statement present → no line breaks (merge)
-    - INSPECT REPLACING present → line breaks (split)
+    - STRING present with comma delimiter → no line breaks (merge)
+    - INSPECT REPLACING involving commas/line breaks → line breaks (split)
+    Plain STRING concatenation / INSPECT counting does not trigger (prone to false positives).
     """
     has_string = features.get("has_string", False)
     has_inspect = features.get("has_inspect", False)
+    has_csv_merge = features.get("has_csv_merge", False)  # injected from source analysis
+    has_csv_split = features.get("has_csv_split", False)  # injected from source analysis
     evidence: list[str] = []

-    if has_string:
-        evidence.append("STRING statement present → CSV merge (no line breaks)")
+    if has_csv_merge:
+        evidence.append("STRING + comma delimiter → CSV merge (no line breaks)")
         return {"resolved_type": "CSV合并", "confidence": 0.85, "evidence": evidence}

-    if has_inspect:
-        evidence.append("INSPECT REPLACING present → CSV split (with line breaks)")
+    if has_csv_split:
+        evidence.append("INSPECT REPLACING with commas/line breaks → CSV split")
         return {"resolved_type": "CSV拆分", "confidence": 0.85, "evidence": evidence}

+    # Backward compatibility:
+    if has_string:
+        evidence.append("STRING present but no comma delimiter → not CSV (low confidence)")
+        return {"resolved_type": "unknown", "confidence": 0.0, "evidence": evidence}
+
+    if has_inspect:
+        evidence.append("INSPECT present but no commas/line breaks → not CSV (low confidence)")
+        return {"resolved_type": "unknown", "confidence": 0.0, "evidence": evidence}
+
     evidence.append("Neither STRING nor INSPECT REPLACING present")
     return {"resolved_type": "unknown", "confidence": 0.0, "evidence": evidence}

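The effect of the hunk above can be checked by running the post-commit branch logic on a few feature dictionaries. This is a compact restatement of the new side of the diff rather than the file itself: evidence strings are shortened, and only the four feature keys shown in the diff are assumed to exist.

```python
def resolve_csv_merge_vs_split(features: dict) -> dict:
    """Compact restatement of the post-commit logic (evidence text shortened)."""
    evidence: list[str] = []

    # Only the CSV-specific flags can produce a CSV classification.
    if features.get("has_csv_merge", False):
        evidence.append("STRING + comma delimiter")
        return {"resolved_type": "CSV合并", "confidence": 0.85, "evidence": evidence}
    if features.get("has_csv_split", False):
        evidence.append("INSPECT REPLACING with commas/line breaks")
        return {"resolved_type": "CSV拆分", "confidence": 0.85, "evidence": evidence}

    # The legacy flags now only explain why the result is "unknown".
    if features.get("has_string", False):
        evidence.append("STRING without comma delimiter")
    elif features.get("has_inspect", False):
        evidence.append("INSPECT without commas/line breaks")
    else:
        evidence.append("neither STRING nor INSPECT REPLACING")
    return {"resolved_type": "unknown", "confidence": 0.0, "evidence": evidence}


# Plain concatenation: classified as CSV merge before the fix, "unknown" after.
plain = resolve_csv_merge_vs_split({"has_string": True})
# STRING with a comma delimiter: still classified as CSV merge.
merged = resolve_csv_merge_vs_split({"has_string": True, "has_csv_merge": True})
```

Because the merge check comes first, a program that sets both `has_csv_merge` and `has_csv_split` is classified as a merge, which matches the ordering in the diff.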