编程技术文章分享与教程

网站首页 > 技术文章 正文

HtmlSanitizer防XSS攻击基本用法(phpxss攻击防御)

hmc789 2024-11-15 19:33:48 技术文章 2 ℃

使用场景

HtmlSanitizer用于从可能导致XSS攻击的结构中清除HTML片段和文档。它使用AngleSharp来解析、操作和呈现HTML和CSS。

因为HtmlSanitizer基于强大的HTML解析器,它还可以保护您免受蓄意或意外的“标记中毒”,即一个片段中的无效HTML可能会损坏整个文档,导致布局或样式损坏。

为了方便不同的用例,HtmlSanitizer可以在几个级别进行自定义:

  • 通过属性AllowedTags配置允许的HTML标记。所有其他标签都将被剥离。
  • 通过属性AllowedAttributes配置允许的HTML属性。所有其他属性都将被剥离。
  • 通过属性AllowedCssProperties配置允许的CSS属性名称。所有其他样式都将被剥去。
  • 通过属性AllowedAtRules在规则处配置允许的CSS。所有其他at规则都将被取消。
  • 通过属性AllowedSchemes配置允许的URI方案。所有其他URI都将被剥离。
  • 通过属性UriAttributes配置包含URI的HTML属性(如“src”、“href”等)。
  • 提供一个将用于解析相对URI的基本URI。
  • 在移除标记、属性或样式之前,将引发可取消的事件。

用法

安装HtmlSanitizer NuGet包。然后:

using Ganss.Xss;
//创建HtmlSanitizer实例
var sanitizer =new HtmlSanitizer();
//配置允许的标签、属性等
sanitizer.AllowedTags.Add("a");
//清理html内容
string htmlContent = "<script>document.write('xss')</script><a href=''>你好</a><a href='#'>测试</a>";
var sanitizerHtml=sanitizer.Sanitize(htmlContent,"http://www.baidu.com");
//打印清洗后的html内容
Console.WriteLine(sanitizerHtml);

默认情况下允许的标记

a, abbr, acronym, address, area, article, aside, b, bdi, big, blockquote, body, br, button, caption, center, cite, code, col, colgroup, data, datalist, dd, del, details, dfn, dir, div, dl, dt, em, fieldset, figcaption, figure, font, footer, form, h1, h2, h3, h4, h5, h6, head, header, hr, html, i, img, input, ins, kbd, keygen, label, legend, li, main, map, mark, menu, menuitem, meter, nav, ol, optgroup, option, output, p, pre, progress, q, rp, rt, ruby, s, samp, section, select, small, span, strike, strong, sub, summary, sup, table, tbody, td, textarea, tfoot, th, thead, time, tr, tt, u, ul, var, wbr

默认情况下允许的属性

abbr, accept-charset, accept, accesskey, action, align, alt, autocomplete, autosave, axis, bgcolor, border, cellpadding, cellspacing, challenge, char, charoff, charset, checked, cite, clear, color, cols, colspan, compact, contenteditable, coords, datetime, dir, disabled, draggable, dropzone, enctype, for, frame, headers, height, high, href, hreflang, hspace, ismap, keytype, label, lang, list, longdesc, low, max, maxlength, media, method, min, multiple, name, nohref, noshade, novalidate, nowrap, open, optimum, pattern, placeholder, prompt, pubdate, radiogroup, readonly, rel, required, rev, reversed, rows, rowspan, rules, scope, selected, shape, size, span, spellcheck, src, start, step, style, summary, tabindex, target, title, type, usemap, valign, value, vspace, width, wrap

注意:为了防止类劫持和干扰要集成已清理片段的类,默认情况下不允许使用class属性。可以按如下方式添加:

var sanitizer = new HtmlSanitizer();
sanitizer.AllowedAttributes.Add("class");
var sanitized = sanitizer.Sanitize(html);

默认情况下允许的CSS属性

align-content, align-items, align-self, all, animation, animation-delay, animation-direction, animation-duration, animation-fill-mode, animation-iteration-count, animation-name, animation-play-state, animation-timing-function, backface-visibility, background, background-attachment, background-blend-mode, background-clip, background-color, background-image, background-origin, background-position, background-position-x, background-position-y, background-repeat, background-repeat-x, background-repeat-y, background-size, border, border-bottom, border-bottom-color, border-bottom-left-radius, border-bottom-right-radius, border-bottom-style, border-bottom-width, border-collapse, border-color, border-image, border-image-outset, border-image-repeat, border-image-slice, border-image-source, border-image-width, border-left, border-left-color, border-left-style, border-left-width, border-radius, border-right, border-right-color, border-right-style, border-right-width, border-spacing, border-style, border-top, border-top-color, border-top-left-radius, border-top-right-radius, border-top-style, border-top-width, border-width, bottom, box-decoration-break, box-shadow, box-sizing, break-after, break-before, break-inside, caption-side, caret-color, clear, clip, color, column-count, column-fill, column-gap, column-rule, column-rule-color, column-rule-style, column-rule-width, column-span, column-width, columns, content, counter-increment, counter-reset, cursor, direction, display, empty-cells, filter, flex, flex-basis, flex-direction, flex-flow, flex-grow, flex-shrink, flex-wrap, float, font, font-family, font-feature-settings, font-kerning, font-language-override, font-size, font-size-adjust, font-stretch, font-style, font-synthesis, font-variant, font-variant-alternates, font-variant-caps, font-variant-east-asian, font-variant-ligatures, font-variant-numeric, font-variant-position, font-weight, gap, grid, grid-area, grid-auto-columns, grid-auto-flow, grid-auto-rows, grid-column, grid-column-end, grid-column-gap, grid-column-start, grid-gap, grid-row, grid-row-end, grid-row-gap, grid-row-start, grid-template, grid-template-areas, grid-template-columns, grid-template-rows, hanging-punctuation, height, hyphens, image-rendering, isolation, justify-content, left, letter-spacing, line-break, line-height, list-style, list-style-image, list-style-position, list-style-type, margin, margin-bottom, margin-left, margin-right, margin-top, mask, mask-clip, mask-composite, mask-image, mask-mode, mask-origin, mask-position, mask-repeat, mask-size, mask-type, max-height, max-width, min-height, min-width, mix-blend-mode, object-fit, object-position, opacity, order, orphans, outline, outline-color, outline-offset, outline-style, outline-width, overflow, overflow-wrap, overflow-x, overflow-y, padding, padding-bottom, padding-left, padding-right, padding-top, page-break-after, page-break-before, page-break-inside, perspective, perspective-origin, pointer-events, position, quotes, resize, right, row-gap, scroll-behavior, tab-size, table-layout, text-align, text-align-last, text-combine-upright, text-decoration, text-decoration-color, text-decoration-line, text-decoration-skip, text-decoration-style, text-indent, text-justify, text-orientation, text-overflow, text-shadow, text-transform, text-underline-position, top, transform, transform-origin, transform-style, transition, transition-delay, transition-duration, transition-property, transition-timing-function, unicode-bidi, user-select, vertical-align, visibility, white-space, widows, width, word-break, word-spacing, word-wrap, writing-mode, z-index

默认情况下允许的规则处的CSS

namespace, style

style指的是@media等其他at规则中的样式声明。在允许其他类型的at规则的同时不允许@namespace可能会导致错误。@font face和@viewport中的属性声明不会被清除。

注意:默认情况下不允许使用样式标记。

默认情况下允许的URI方案

http,https

注意:默认情况下允许协议相关URL(例如//app)(其他相关URL也是如此)。

要允许mailto:链接:

sanitizer.AllowedSchemes.Add("mailto");

包含URI的默认属性

action, background, dynsrc, href, lowsrc, src

线程安全

Sanitize() 和SanitizeDocument()方法是线程安全的,即您可以在不同线程的单个共享实例上使用这些方法,前提是您不同时设置实例或静态属性。一个典型的用例是从一个线程准备一次HtmlSanitizer实例(即设置所需的属性,如AllowedTags等),然后从多个线程调用Sanitize()/SaniitizeDocument()。

文本内容不一定保持原样

请注意,由于输入由AngleSharp的HTML解析器解析,然后呈现出来,因此即使没有删除任何元素或属性,也不能期望文本内容与输入时完全相同。示例:

  • 4<5变为4<;5.
  • <SPAN>测试</p>变为<SPAN>测试<p></p></span>
  • 测试</span>变成测试</span>

另一方面,尽管解析器修复了一些损坏的HTML,但输出可能仍然包含无效的HTML。示例:

  • <div><li>测试</li></div>
  • <ul><br><li>测试</li></ul>
  • <h3><p>测试</p></h3>
标签列表
最新留言