"""Adversarial tests: false-positive / false-negative attacks on the COBOL matching classifier.

Attack surface designed by COBOL migration experts:
- FP: a non-matching program is misclassified as マッチング (matching)
- FN: a real matching program goes undetected
- Edge cases: keywords in comments, old-school naming, multi-file non-matching programs
- FN: variable names do not contain KEY, but the program is structurally a matching program
"""
import re
from pathlib import Path

import pytest

from cobol_testgen import extract_structure
from hina.pipeline import classify_program
from hina.classifier import detect_keyword

FIXTURES = Path(__file__).parents[3] / "test-data" / "cobol" / "adversarial"

# (filename, expect_matching, reason)
ADVERSARIAL_TESTS = [
    ("ADV-FALSE-KEY.cbl", False,
     "FP: WS-KEY variable but only simple ADD, should NOT trigger matching"),
    ("ADV-KEY-IN-COMMENT.cbl", False,
     "FP: KEY only in *> comments, should NOT trigger matching"),
    ("ADV-PREVKEY-FAKE.cbl", False,
     "FP: WS-PREV-KEY without matching logic, should NOT trigger"),
    ("ADV-OLD-SCHOOL.cbl", True,
     "FN: K01-KEY old-school naming, should detect matching"),
    ("ADV-TINY-MATCH.cbl", True,
     "FN: Minimal matching (1 file), should detect"),
    ("ADV-CALL-MATCH.cbl", False,
     "FP: CALL+WS-MAST-KEY, subprogram call should win"),
    ("ADV-ASCII-KEY.cbl", False,
     "FP: ASCII+WS-KEY, encoding conversion should win"),
    ("ADV-10FILES.cbl", False,
     "FP: 10 files no KEY comparison, should NOT trigger matching"),
]


def _is_matching(category):
    """Return True if the category label denotes a matching (or two-stage) program."""
    return "マッチング" in category or "二段階" in category


@pytest.mark.parametrize(
    "filename,expect_matching,reason",
    ADVERSARIAL_TESTS,
    ids=[t[0].replace('.cbl', '') for t in ADVERSARIAL_TESTS],
)
def test_adversarial(filename, expect_matching, reason):
    """Adversarial test: false positive / false negative check"""
    path = FIXTURES / filename
    assert path.exists(), f"Missing: {path}"
    src = path.read_text("utf-8")

    # The structure extractor must be able to parse every fixture.
    struct = extract_structure(src)
    assert struct is not None

    result = classify_program(src)
    assert result is not None
    assert result["confidence"] >= 0

    is_matching = _is_matching(result["category"])
    if expect_matching:
        assert is_matching, (
            f"{filename}: expected MATCHING but got '{result['category']}' "
            f"(conf={result['confidence']:.2f}). Reason: {reason}"
        )
    else:
        assert not is_matching, (
            f"{filename}: expected NON-MATCHING but got '{result['category']}' "
            f"(conf={result['confidence']:.2f}). Reason: {reason}"
        )

    # A matching verdict from the rule-engine fallback must be backed by
    # at least one detected keyword.
    kw = detect_keyword(src)
    if expect_matching:
        assert len(kw) >= 1 or result["method"] != "rule_engine_fallback", (
            f"{filename}: matching program with 0 keyword matches"
        )


def test_structural_matching_no_keyword():
    """FN: Matching program without KEY in variable names (CUST-CODE vs ORDR-CODE)

    Real-world COBOL matching programs often use -CODE or -ID instead of -KEY.
    Structural detection must catch these even without naming hints.
    """
    src = """
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REALMT.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT CUST-FILE ASSIGN TO 'CUST.DAT'.
           SELECT ORDR-FILE ASSIGN TO 'ORDR.DAT'.
       DATA DIVISION.
       FILE SECTION.
       FD CUST-FILE.
       01 CUST-REC.
           05 CUST-CODE PIC X(10).
           05 CUST-NAME PIC X(30).
       FD ORDR-FILE.
       01 ORDR-REC.
           05 ORDR-CODE PIC X(10).
           05 ORDR-AMT PIC 9(7)V99.
       WORKING-STORAGE SECTION.
       01 WS-CUST-CODE PIC X(10).
       01 WS-ORDR-CODE PIC X(10).
       01 WS-EOF1 PIC X VALUE 'N'.
       01 WS-EOF2 PIC X VALUE 'N'.
       PROCEDURE DIVISION.
       MAIN.
           OPEN INPUT CUST-FILE ORDR-FILE.
           READ CUST-FILE INTO CUST-REC
               AT END MOVE 'Y' TO WS-EOF1.
           READ ORDR-FILE INTO ORDR-REC
               AT END MOVE 'Y' TO WS-EOF2.
           PERFORM UNTIL WS-EOF1 = 'Y' OR WS-EOF2 = 'Y'
               IF CUST-CODE = ORDR-CODE
                   DISPLAY 'MATCH'
               ELSE IF CUST-CODE < ORDR-CODE
                   READ CUST-FILE
                       AT END MOVE 'Y' TO WS-EOF1
               ELSE
                   READ ORDR-FILE
                       AT END MOVE 'Y' TO WS-EOF2
               END-IF
           END-PERFORM.
           CLOSE CUST-FILE ORDR-FILE.
           STOP RUN.
    """
    result = classify_program(src)
    kw = detect_keyword(src)

    # Must have structural matching keyword
    assert any("structural" in k[2] for k in kw), (
        f"Expected structural matching keyword, got {kw}"
    )

    # Must be classified as matching
    assert _is_matching(result["category"]), (
        f"Expected matching, got '{result['category']}'"
    )

    # Confidence should be reasonable
    assert result["confidence"] > 0.30, (
        f"Confidence too low: {result['confidence']:.2f}"
    )