fix: classification fix + grammar enhancements + 75/75 regression confirmed

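Classification fixes:
- The FILE-CONTROL keyword (0.99) was incorrectly overriding the matching-detection signal
- Give the matching-type rule engine higher priority so that matching-detection results take precedence
- Inject the has_matching_kw feature so that IF-less matching programs are also recognized

Grammar enhancements:
- Extend LEVEL to /[0-9]+/ to cover all COBOL level numbers
- Add HEX_STRING to support X'...' hexadecimal literals
- Strip commas from VALUE clauses during preprocessing (multi-value 88-level items)
- COPY regex now supports quoted names

Results: internal 75/75, external benchmark 54/58 (93%)

Co-Authored-By: Claude <noreply@anthropic.com>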
2026-06-22 13:18:07 +08:00
parent 3b150b6c54
commit bb4a7a2346
4 changed files with 46 additions and 15 deletions
@@ -38,6 +38,11 @@ def preprocess(source: str) -> str:
        source, flags=re.IGNORECASE | re.DOTALL
    )

+    # Strip commas from VALUE clauses (VALUE 'A', 'B', 'C' → VALUE 'A' 'B' 'C')
+    def _strip_value_commas(m):
+        return re.sub(r'\s*,\s*', ' ', m.group(0))
+    source = re.sub(r'VALUE\s+[^.\n]+', _strip_value_commas, source, flags=re.IGNORECASE)
+
    fixed = _is_fixed_format(source)
    lines = []
    for raw_line in source.splitlines():
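
Review note on the VALUE comma stripping in this hunk: the sketch below reproduces the two regexes from the added lines in a standalone function so the behavior can be checked in isolation. The function names `strip_value_commas` and `strip_value_commas_quote_aware` are hypothetical and not part of this commit. The second function is only an assumption about how commas inside quoted literals could be preserved; it is not what the commit does.

```python
import re


def strip_value_commas(source: str) -> str:
    """Same regexes as the hunk above: commas in a VALUE clause become spaces."""
    def _strip(m):
        return re.sub(r'\s*,\s*', ' ', m.group(0))
    return re.sub(r'VALUE\s+[^.\n]+', _strip, source, flags=re.IGNORECASE)


def strip_value_commas_quote_aware(source: str) -> str:
    """Hypothetical variant: leave quoted literals untouched, strip only bare commas."""
    # Alternation order matters: a quoted literal is consumed whole (group 1)
    # before the comma branch gets a chance to match inside it.
    token = re.compile(r"""('[^']*'|"[^"]*")|\s*,\s*""")

    def _strip(m):
        return token.sub(lambda t: t.group(1) if t.group(1) else ' ', m.group(0))
    return re.sub(r'VALUE\s+[^.\n]+', _strip, source, flags=re.IGNORECASE)


multi = "88 VALID-CODE VALUE 'A', 'B', 'C'."
embedded = "05 SEP VALUE 'A,B', 'C'."

print(strip_value_commas(multi))                 # 88 VALID-CODE VALUE 'A' 'B' 'C'.
print(strip_value_commas(embedded))              # 05 SEP VALUE 'A B' 'C'.
print(strip_value_commas_quote_aware(embedded))  # 05 SEP VALUE 'A,B' 'C'.
```

As the second call shows, the committed pattern also rewrites a comma that sits inside a quoted literal. Whether that matters depends on whether the preprocessed text is used only for parsing or also for recovering literal values.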