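fix: classification fixes + grammar enhancements + 75/75 regression confirmed

Classification fixes:
- The FILE-CONTROL keyword (0.99) was incorrectly overriding the matching-detection signal
- Give the matching-type rule engine higher priority so that matching-detection results take precedence
- Inject the has_matching_kw feature so that matching programs without IF can also be recognized

Grammar enhancements:
- LEVEL extended to /[0-9]+/ to cover all COBOL level numbers
- HEX_STRING added to support X'...' hexadecimal literals
- Preprocessing strips commas from VALUE clauses (multiple values on 88-level items)
- COPY regex supports names wrapped in quotes

Results: internal 75/75, external benchmark 54/58 (93%)

Co-Authored-By: Claude <noreply@anthropic.com>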
@@ -38,6 +38,11 @@ def preprocess(source: str) -> str:
         source, flags=re.IGNORECASE | re.DOTALL
     )
 
+    # Strip commas from VALUE clauses (VALUE 'A', 'B', 'C' → VALUE 'A' 'B' 'C')
+    def _strip_value_commas(m):
+        return re.sub(r'\s*,\s*', ' ', m.group(0))
+    source = re.sub(r'VALUE\s+[^.\n]+', _strip_value_commas, source, flags=re.IGNORECASE)
+
     fixed = _is_fixed_format(source)
     lines = []
     for raw_line in source.splitlines():
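Review note: the added step can be exercised in isolation. The sketch below reproduces the committed regex as `strip_value_commas_original` and contrasts it with a quote-aware variant. Both function names and the variant itself are hypothetical and not part of this commit. The variant assumes that commas and periods inside quoted literals should be left untouched, which the committed pattern does not guarantee: `[^.\n]+` stops at a period inside a literal, and the inner substitution also rewrites commas inside quotes.

```python
import re


def strip_value_commas_original(source: str) -> str:
    """Same logic as the lines added in this commit, wrapped in a function."""
    def _strip_value_commas(m):
        return re.sub(r'\s*,\s*', ' ', m.group(0))
    return re.sub(r'VALUE\s+[^.\n]+', _strip_value_commas, source,
                  flags=re.IGNORECASE)


# Hypothetical variant: a VALUE clause is a run of quoted literals and
# other non-period characters, so a '.' inside quotes does not end it.
_VALUE_CLAUSE = re.compile(
    r"VALUE\s+(?:'[^'\n]*'|\"[^\"\n]*\"|[^.\n'\"])+",
    re.IGNORECASE,
)
# Inside a clause, match either a whole quoted literal or a comma separator.
_TOKEN = re.compile(r"'[^'\n]*'|\"[^\"\n]*\"|\s*,\s*")


def strip_value_commas_quote_aware(source: str) -> str:
    """Replace comma separators with a space, keeping quoted literals intact."""
    def _fix_token(t):
        tok = t.group(0)
        # Quoted literals pass through unchanged; separators become a space.
        return tok if tok[0] in "'\"" else ' '

    def _fix_clause(m):
        return _TOKEN.sub(_fix_token, m.group(0))

    return _VALUE_CLAUSE.sub(_fix_clause, source)


if __name__ == "__main__":
    for line in ("88 VOWEL VALUE 'A', 'E', 'I'.",
                 "05 MSG PIC X(4) VALUE 'A, B'.",
                 "88 DOTS VALUE 'A.B', 'C'."):
        print(strip_value_commas_original(line))
        print(strip_value_commas_quote_aware(line))
```

Both versions agree on the plain 88-level case. They differ only when a literal itself contains a comma or a period, so whether the variant is worth adopting depends on whether such literals occur in the benchmark programs.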